R: replacing multiple words in multiple strings

mlmc2os5 posted on 2023-10-13 in Other

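I want to replace words in one vector based on the original words and their replacements stored in another vector. For example:
The vector of strings to modify: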

my_words <- c("example r", "example River", "example R", "anthoer river",
        "now a creek", "and another Ck", "example river tributary")

A data frame of the words to replace and their corresponding replacements:

my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", 'R', "C", "C", "C"))

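I want to replace any word in the vector my_words that appears in my_replace$original with the corresponding value in my_replace$replacement. I tried stringr::str_replace_all(), but it replaces every occurrence of the letters/word, not just whole words (e.g. "another" becomes "anotheR"), which is not what I want.
Pseudocode for what I want to do: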

str_replace_all(my_words, my_replace$original, my_replace$replacement)
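For comparison, stringr's str_replace_all() can also take a single named vector of pattern = replacement pairs, which is close to the pseudocode above. A minimal sketch, assuming the words in original contain no regex metacharacters, wraps each one in \\b word boundaries so only whole words are replaced (pairs is just an illustrative variable name):

```r
library(stringr)

my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", "R", "C", "C", "C"),
  stringsAsFactors = FALSE)

# Names are regex patterns (each word wrapped in \\b word boundaries), values are replacements.
# Assumes the original words contain no regex metacharacters.
pairs <- setNames(my_replace$replacement, paste0("\\b", my_replace$original, "\\b"))

# With a named vector, str_replace_all() applies each pattern/replacement pair in turn
# to every string, so "another" is left alone (its final "r" is not a whole word).
out <- str_replace_all(my_words, pairs)
out
```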

Desired output:

"example R", "example R", "example R", "another R", "now a C", "and another C", "example R tributary"

I did find a solution using a for loop, but since my dataset is large, the for-loop option is too slow. Any suggestions are greatly appreciated.

Answer #1 (rqdpfwrv)

Here is a gsub() approach that does all the replacements in a single call:

my_words <- c("example r", "example River", "example R", "anthoer river",
    "now a creek", "and another Ck", "example river tributary")

output <- gsub("\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b", "\\U\\1\\U\\2", my_words, perl=TRUE)
output

[1] "example R"           "example R"           "example R"
[4] "anthoer R"           "now a C"             "and another C"
[7] "example R tributary"

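Since every occurrence of river and creek is replaced by R and C respectively, we can capture the first letter of each possible match and replace the whole match with the uppercase version of that letter.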
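The case conversion relies on perl = TRUE: in the replacement string, \\U upper-cases the rest of the replacement text, and a capture group that did not take part in the match contributes an empty string, so "\\U\\1\\2" yields whichever first letter was captured (the answer's "\\U\\1\\U\\2" is equivalent, since \\U stays in effect until \\E). A small check on a few made-up inputs:

```r
x <- c("a river", "a Ck", "the r", "another")

# Group 1 captures the r/R of "r"/"river"; group 2 captures the c/C of "ck"/"creek".
# In any single match only one group participates; the other is substituted as "".
# \\U (perl = TRUE only) upper-cases everything that follows it in the replacement.
res <- gsub("\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b", "\\U\\1\\2", x, perl = TRUE)
res
```

"another" is untouched because its trailing "r" is not preceded by a word boundary.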

Answer #2 (u4vypkhs)

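You need to build a dynamic, word-boundary-based pattern from the words in my_replace$original, and then use stringr::str_replace_all to replace each match with its corresponding value. Note that the original phrases should be sorted by length in descending order so that longer strings are matched first: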

my_words <- c("example r", "example River", "example R", "anthoer river", "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(original = c("r", "River", "R", "river", "Ck", "creek", "Creek"), replacement = c("R", "R", "R", 'R', "C", "C", "C"))
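library(stringr)
sort.by.length.desc <- function(v) v[order(-nchar(v))]
regex <- paste0("\\b(", paste(sort.by.length.desc(my_replace$original), collapse = "|"), ")\\b")
# Named lookup table: original word -> replacement
lookup <- setNames(my_replace$replacement, my_replace$original)
# Vectorized lookup: works whether the callback receives one match or a vector of all matches
str_replace_all(my_words, regex, function(word) unname(lookup[word]))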

Output:

[1] "example R"           "example R"           "example R"           "anthoer R"           "now a C"             "and another C"       "example R tributary"

The regex will be \b(River|river|creek|Creek|Ck|r|R)\b, which matches any of these words only as a whole word.
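The length-descending sort matters when one key is a prefix of a longer multi-word key: backtracking engines such as ICU (used by stringr) and PCRE (perl = TRUE) take the first alternative, left to right, that lets the whole pattern match, and the closing \b after a prefix like "river" still succeeds when a space follows. For keys made only of word characters, like the question's, the \b anchors already make the order irrelevant. A sketch with hypothetical keys (build is an illustrative helper):

```r
x <- "the river delta"
keys <- c("river", "river delta")  # hypothetical keys; "river" is a prefix of "river delta"

# Illustrative helper: same shape as the answer's pattern, \b(key1|key2|...)\b
build <- function(k) paste0("\\b(", paste(k, collapse = "|"), ")\\b")

# Unsorted: "river" is tried first, and \b succeeds before the space.
m_unsorted <- regmatches(x, regexpr(build(keys), x, perl = TRUE))
# Sorted by length, descending: "river delta" is tried first and wins.
m_sorted <- regmatches(x, regexpr(build(keys[order(-nchar(keys))]), x, perl = TRUE))

m_unsorted
m_sorted
```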

Answer #3 (mrphzbgm)

This can be done quite simply with the rflashtext library:

my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")

my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", 'R', "C", "C", "C"))

library(rflashtext)

processor <- KeywordProcessor$new(keys = my_replace$original,
                                  words = my_replace$replacement)

processor$replace_keys(my_words)

[1] "example R"           "example R"           "example R"           "anthoer R"          
[5] "now a C"             "and another C"       "example R tributary"
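The keyword-dictionary idea (look each whole word up in a table instead of building one large regex) can also be sketched in base R without rflashtext. This is not how rflashtext is implemented; it is a simpler token-by-token lookup, and it assumes words are separated by single spaces and every key is a single word, so keys with attached punctuation (e.g. "river,") or multi-word keys are not handled. replace_tokens is a hypothetical helper name:

```r
# Hypothetical helper: split on single spaces, replace tokens found in the lookup, re-join.
replace_tokens <- function(x, original, replacement) {
  lookup <- setNames(replacement, original)
  tokens <- strsplit(x, " ", fixed = TRUE)
  vapply(tokens, function(tok) {
    hit <- tok %in% names(lookup)       # exact, case-sensitive whole-token match
    tok[hit] <- lookup[tok[hit]]        # swap matched tokens for their replacements
    paste(tok, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")
original <- c("r", "River", "R", "river", "Ck", "creek", "Creek")
replacement <- c("R", "R", "R", "R", "C", "C", "C")

out <- replace_tokens(my_words, original, replacement)
out
```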

Answer #4 (qhhrdooz)

library(stringi)

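# "\\b" %s+% word %s+% "\\b" matches whole words only; vectorize_all = FALSE applies every pattern/replacement pair to every string
stri_replace_all_regex(my_words, "\\b" %s+% my_replace$original %s+% "\\b", my_replace$replacement, vectorize_all = FALSE)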

[1] "example R" "example R" "example R" "anthoer R" "now a C" "and another C" "example R tributary"
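With vectorize_all = FALSE, each pattern/replacement pair is applied in turn to the whole vector, so any loop runs over the 7 patterns rather than over the strings, which is usually far cheaper on a large dataset. A base-R sketch of that sequential idea (replace_sequential is a hypothetical helper, not part of stringi), including the caveat that a later pattern can rewrite the output of an earlier replacement:

```r
# Hypothetical helper: apply each regex/replacement pair in order to the whole vector.
replace_sequential <- function(x, patterns, replacements) {
  Reduce(function(s, i) gsub(patterns[i], replacements[i], s, perl = TRUE),
         seq_along(patterns), init = x)
}

my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")
original <- c("r", "River", "R", "river", "Ck", "creek", "Creek")
replacement <- c("R", "R", "R", "R", "C", "C", "C")

out <- replace_sequential(my_words, paste0("\\b", original, "\\b"), replacement)
out

# Caveat: pairs are applied one after another, so "cat" -> "dog" is then hit by "dog" -> "wolf".
chained <- replace_sequential("cat", c("\\bcat\\b", "\\bdog\\b"), c("dog", "wolf"))
chained
```

In the question's data this chaining is harmless, because the only replacement that is also a key is "R" -> "R".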
