R: Get the indices of matching characters in one string and apply them to another string

Posted by nbysray5 on 2023-05-26 in Other


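I have the dataframe below, where each row represents a change to a text. I then use the adist() function to extract whether each change is a match (M), insertion (I), substitution (S), or deletion (D).
I need to find all indices of the "I"s in the change column (shown here in the insertion_idx column). Using those indices, I need to extract the corresponding characters from current_text (shown here as insertion_chars).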

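library(tibble)

# insertion_idx and insertion_chars show the desired result.
# insertion_idx is a list column, because rows hold 0, 1, or 2 indices.
df <- tibble(current_text = c("A","AB","ABCD","ABZ"),
             previous_text = c("","A","AB","ABCD"),
             change = c("I","MI","MMII","MMSD"),
             insertion_idx = list(1, 2, c(3, 4), integer(0)),
             insertion_chars = c("A","B","CD",""))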

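I have tried splitting the strings and comparing their differences, but with real-world data this gets very messy. How can I accomplish the task above?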
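For context, the question mentions adist() but does not show the call that produced the change column. The following is a minimal sketch of one way such strings could be obtained, assuming the "trafos" attribute that adist() returns when counts = TRUE. The helper name pairwise_trafos is hypothetical, and the sample vectors are a subset of the question's data.

```r
# Hypothetical helper: one transformation string per (previous, current) pair.
# adist() describes how to turn its first argument into its second, so
# "I" marks a character that exists only in the current text.
pairwise_trafos <- function(previous, current) {
  mapply(function(p, cur) {
    d <- utils::adist(p, cur, counts = TRUE)
    drop(attr(d, "trafos"))  # 1 x 1 character matrix -> single string
  }, previous, current, USE.NAMES = FALSE)
}

previous_text <- c("A", "AB")
current_text  <- c("AB", "ABCD")
change <- pairwise_trafos(previous_text, current_text)
change
```

Calling adist() once per pair avoids building the full all-against-all distance matrix when only row-wise comparisons are needed.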

Source: https://stackoverflow.com/questions/76327866/get-character-indices-match-in-one-string-and-apply-to-another-string

2 Answers


p3rjfoxz1#

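Turning my comment about using gregexpr and regmatches into an answer.
If you are looking for alternatives, much of this process is very similar to what is covered in this question: Extract a regular expression match.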

df <- data.frame(current_text = c("A","AB","ABCD","ABZ"),
             previous_text = c("","A","AB","ABCD"),
             change = c("I","MI","MMII","MMSD"))

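# Positions of every "I" in each change string (-1 when there is no match)
df$insertion_idx <- gregexpr("I", df$change)
# Reuse that match data to pull the characters at the same positions from
# current_text, then glue each row's characters into a single string
df$insertion_chars <- sapply(regmatches(df$current_text, df$insertion_idx), 
                             paste, collapse="")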
df
##  current_text previous_text change insertion_idx insertion_chars
##1            A                    I             1               A
##2           AB             A     MI             2               B
##3         ABCD            AB   MMII          3, 4              CD
##4          ABZ          ABCD   MMSD            -1                
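In the result above, the row with no insertion carries gregexpr's no-match marker, -1, rather than an empty index vector. If plain integer indices are preferred, a minimal sketch (not part of the original answer) is to keep only the positive positions, which also discards the match attributes:

```r
change <- c("I", "MI", "MMII", "MMSD")

# gregexpr() returns one integer vector per string, with -1 meaning "no match"
matches <- gregexpr("I", change, fixed = TRUE)

# Keep only real positions; as.integer() drops the match.length attributes
insertion_idx <- lapply(matches, function(m) as.integer(m[m > 0]))
str(insertion_idx)
```

Note that regmatches() needs the original gregexpr() result, so the cleaned list is suitable for display or further indexing, not as a replacement for the match data passed to regmatches().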

Posted 2023-05-26

u2nhd7ah2#

Try the following as an alternative to thelatemail's (excellent) recommendation, which works just as well:

quux <- structure(list(
  current_text = c("A", "AB", "ABCD", "ABZ"),
  previous_text = c("", "A", "AB", "ABCD"),
  change = c("I", "MI", "MMII", "MMSD")
), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))

quux$insertion_idx <- lapply(strsplit(quux$change, ""), function(z) which(z == "I"))
quux$insertion_chars <- mapply(function(ctxt, idx) {
  if (length(idx)) paste(substring(ctxt, idx, idx), collapse = "") else ""
}, quux$current_text, quux$insertion_idx)
quux
# # A tibble: 4 × 5
#   current_text previous_text change insertion_idx insertion_chars
#   <chr>        <chr>         <chr>  <list>        <chr>          
# 1 A            ""            I      <int [1]>     "A"            
# 2 AB           "A"           MI     <int [1]>     "B"            
# 3 ABCD         "AB"          MMII   <int [2]>     "CD"           
# 4 ABZ          "ABCD"        MMSD   <int [0]>     ""

Note that insertion_idx is a list column containing the indices you are looking for:

str(quux)
# tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
#  $ current_text   : chr [1:4] "A" "AB" "ABCD" "ABZ"
#  $ previous_text  : chr [1:4] "" "A" "AB" "ABCD"
#  $ change         : chr [1:4] "I" "MI" "MMII" "MMSD"
#  $ insertion_idx  :List of 4
#   ..$ : int 1
#   ..$ : int 2
#   ..$ : int [1:2] 3 4
#   ..$ : int(0) 
#  $ insertion_chars: Named chr [1:4] "A" "B" "CD" ""
#   ..- attr(*, "names")= chr [1:4] "A" "AB" "ABCD" "ABZ"
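One caveat that the thread does not cover: a position in the change string matches a position in current_text only while no deletion (D) comes before it, because a D consumes a character of previous_text but none of current_text. The following is a minimal sketch of one way to account for that, assuming the change strings follow adist()'s M/I/S/D convention. Both helper names are hypothetical.

```r
# Hypothetical helper: positions of inserted characters within the current text.
# Deletions are dropped first because they have no counterpart in current_text.
insertion_positions <- function(change) {
  lapply(strsplit(change, ""), function(z) which(z[z != "D"] == "I"))
}

# Hypothetical helper: the inserted characters themselves, one string per row.
extract_insertions <- function(current_text, change) {
  idx <- insertion_positions(change)
  mapply(function(txt, i) {
    # substring() with empty indices yields character(0), which pastes to ""
    paste(substring(txt, i, i), collapse = "")
  }, current_text, idx, USE.NAMES = FALSE)
}

extract_insertions(c("A", "AB", "ABCD", "ABZ"), c("I", "MI", "MMII", "MMSD"))
# A deletion before the insertion: "I" is 3rd in the change, 2nd in the text
extract_insertions("AB", "DMI")
```

For the question's sample data this gives the same result as both answers, since the only deletion there comes after every insertion.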

Posted 2023-05-26
