R: insert a string before each element of a vector that contains a pattern

Posted by c0vxltue on 2023-04-03 in Other

I have the following vector:

test <- c("here is some text", "here is some other text", "here is my formula", "2+2", "here is my second formula", "4+4", "here is even more text", "here is my final formula", "6+6")

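What I would like to do is find every string that contains "formula" and insert an arbitrary string such as "CCC" immediately before it, so that I end up with the following: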

test <- c("here is some text", "here is some other text", "CCC", "here is my formula", "2+2", "CCC", "here is my second formula", "4+4", "here is even more text", "CCC", "here is my final formula", "6+6")

b0zn9rqh1#

These use only base R:

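**1)** Use append as shown. The matching positions are processed in reverse order so that each insertion does not shift the positions still to be handled: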

test2 <- test
ix <- rev(grep("formula", test)) - 1
for(i in ix) test2 <- append(test2, "CCC", i)
test2
##  [1] "here is some text"         "here is some other text"  
##  [3] "CCC"                       "here is my formula"       
##  [5] "2+2"                       "CCC"                      
##  [7] "here is my second formula" "4+4"                      
##  [9] "here is even more text"    "CCC"                      
## [11] "here is my final formula"  "6+6"
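
The loop in 1) can be wrapped in a small function for reuse. This is only a sketch: the name insert_before and its arguments are hypothetical and do not appear in the original answer. It relies on append accepting after = 0, which places the new value at the very front.

```r
# insert_before() is a hypothetical helper name, not part of the original answer.
insert_before <- function(x, pattern, value) {
  # Insertion points (the "after" argument of append), last match first,
  # so earlier positions stay valid while the vector grows.
  ix <- rev(grep(pattern, x)) - 1
  for (i in ix) x <- append(x, value, after = i)
  x
}

test <- c("here is some text", "here is some other text", "here is my formula",
          "2+2", "here is my second formula", "4+4", "here is even more text",
          "here is my final formula", "6+6")
res <- insert_before(test, "formula", "CCC")
res
```

If no element matches, grep returns an empty vector, the loop body never runs, and the input is returned unchanged.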

**2)** Here are three different one-liners.

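The first creates a matrix whose first row contains "CCC" and NA elements and whose second row is test, then unravels it into a vector and removes the NAs.
The second iterates over test and outputs the element itself if it does not contain formula, or otherwise the vector made of "CCC" followed by the element. This produces a list, which is then unlisted.
The third prepends "CCC\n" to every element that contains formula and then splits the result at the newlines.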

# 2a
c(na.omit(c(rbind(ifelse(grepl("formula", test), "CCC", NA), test))))

# 2b
unlist(lapply(test, function(x) if (grepl("formula", x)) c("CCC", x) else x))

# 2c
scan(text = sub("(.*formula)", "CCC\n\\1", test), what="", quiet=TRUE, sep="\n")
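
As a quick sanity check, the three one-liners can be compared against the desired result from the question. This sketch only re-runs the code above; the names expected, r2a, r2b and r2c are introduced here for illustration.

```r
test <- c("here is some text", "here is some other text", "here is my formula",
          "2+2", "here is my second formula", "4+4", "here is even more text",
          "here is my final formula", "6+6")

# Desired result, copied from the question
expected <- c("here is some text", "here is some other text", "CCC",
              "here is my formula", "2+2", "CCC", "here is my second formula",
              "4+4", "here is even more text", "CCC",
              "here is my final formula", "6+6")

# 2a: interleave a "CCC"/NA row with test, then drop the NAs
r2a <- c(na.omit(c(rbind(ifelse(grepl("formula", test), "CCC", NA), test))))

# 2b: build a list of length-1 or length-2 pieces, then flatten it
r2b <- unlist(lapply(test, function(x) if (grepl("formula", x)) c("CCC", x) else x))

# 2c: embed a newline before each match, then re-read the text line by line
r2c <- scan(text = sub("(.*formula)", "CCC\n\\1", test),
            what = "", quiet = TRUE, sep = "\n")

# Each comparison should be TRUE if the approach reproduces the desired vector
identical(r2a, expected)
identical(r2b, expected)
identical(r2c, expected)
```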

nukf8bse2#

Here is an option using the tidyverse:

library(dplyr)
library(stringr)
library(tidyr)
tibble(test) %>% 
  uncount(str_detect(test, 'formula') + 1) %>%
  mutate(test = replace(test, duplicated(test, fromLast = TRUE), "CCC"))
Output:
# A tibble: 12 × 1
   test                     
   <chr>                    
 1 here is some text        
 2 here is some other text  
 3 CCC                      
 4 here is my formula       
 5 2+2                      
 6 CCC                      
 7 here is my second formula
 8 4+4                      
 9 here is even more text   
10 CCC                      
11 here is my final formula 
12 6+6
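
The idea behind this answer (repeat each matching row twice, then overwrite the first copy of each pair) can be sketched in base R, with rep standing in for uncount. This is an illustration of the mechanism rather than the answer's own code; the names times and doubled are made up here, and the sketch assumes the input contains no repeated strings, because duplicated would otherwise also flag genuine duplicates.

```r
test <- c("here is some text", "here is some other text", "here is my formula",
          "2+2", "here is my second formula", "4+4", "here is even more text",
          "here is my final formula", "6+6")

# 2 copies for elements that contain "formula", 1 copy for everything else
times <- grepl("formula", test) + 1
doubled <- rep(test, times)

# With fromLast = TRUE the first copy of each pair is the one marked as
# duplicated, so that is the copy replaced by "CCC".
# Assumption: test itself has no repeated strings.
doubled[duplicated(doubled, fromLast = TRUE)] <- "CCC"
doubled
```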

8ftvxx2r3#

Update: improved code:

library(tidyverse)

test %>%
  tibble(value = .) %>%
  mutate(index = row_number(), new_val = ifelse(str_detect(value, "formula"), "CCC", NA_character_)) %>%
  pivot_longer(c(new_val, value), values_drop_na = TRUE) %>%
  pull(value)

Here is another approach. As I have mentioned before, in situations like this I find it much easier to think in terms of a data frame or tibble:

library(tidyverse)

test %>%
  as_tibble() %>% 
  mutate(index = row_number()) %>% 
  mutate(new_val = ifelse(str_detect(value, "formula"), "CCC", NA_character_)) %>% 
  pivot_longer(c(new_val, value)) %>% 
  drop_na() %>% 
  pull(value)

 [1] "here is some text"         "here is some other text"  
 [3] "CCC"                       "here is my formula"       
 [5] "2+2"                       "CCC"                      
 [7] "here is my second formula" "4+4"                      
 [9] "here is even more text"    "CCC"                      
[11] "here is my final formula"  "6+6"

f0brbegy4#

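Here is a base R approach that uses grepl/cumsum to create a grouping vector in which each group starts at a "formula" string. by then inserts "CCC" at the start of each group; the extra "CCC" in front of the leading group, which has no "formula" string, is dropped afterwards.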

test <- c("here is some text", "here is some other text",
          "here is my formula", "2+2", "here is my second formula",
          "4+4", "here is even more text", 
          "here is my final formula", "6+6")

j <- cumsum(grepl("formula", test))
test <- unname(unlist(by(test, j, FUN = \(x) c("CCC", x))))
if(j[1L] == 0) test <- test[-1L]
test
#>  [1] "here is some text"         "here is some other text"  
#>  [3] "CCC"                       "here is my formula"       
#>  [5] "2+2"                       "CCC"                      
#>  [7] "here is my second formula" "4+4"                      
#>  [9] "here is even more text"    "CCC"                      
#> [11] "here is my final formula"  "6+6"

Created on 2023-03-30 with reprex v2.0.2
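
To make the grouping step concrete, the sketch below prints the grouping vector j for the example data and then rebuilds the result with split/lapply in place of by. The split/lapply variant is a substitution made here for illustration, not the answer's code; it checks the first element of each group instead of removing a leading "CCC" afterwards.

```r
test <- c("here is some text", "here is some other text", "here is my formula",
          "2+2", "here is my second formula", "4+4", "here is even more text",
          "here is my final formula", "6+6")

# The counter increases by one at every "formula" string,
# so each such string opens a new group.
j <- cumsum(grepl("formula", test))
j
#> [1] 0 0 1 1 2 2 2 3 3

# Variant using split/lapply (an assumption of this sketch, not from the answer):
# prepend "CCC" only to groups whose first element contains "formula".
groups <- split(test, j)
res <- unlist(lapply(groups, function(g) {
  if (grepl("formula", g[1])) c("CCC", g) else g
}), use.names = FALSE)
res
```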
