How to use grepl to check whether a string exists row-wise in R

Posted by plicqrtu on 2022-12-15 in Other
library(dplyr)
mydata <- data.frame(id = c(1, 2, 3, 4, 5),
                     fruit = c("orange", "jackfruit", "", "N/A", ""),
                     fruit2 = c("orange", "guava", "", "", ""),
                     fruit3 = c("orange", "N/A", "orang", "", ""))

> mydata
  id     fruit    fruit2 fruit3
1  1    orange    orange orange
2  2 jackfruit     guava    N/A
3  3                     orange
4  4       N/A                 
5  5

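I have a dataset and want to check, for each id, whether a given string is present. For example, the string "orange" is present for ids = 1, 2, the string "jackfruit" is present for id = 2, and so on.
Here is my attempt, but it produces a warning instead of the result I want: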

> mydata %>% group_by(id) %>% grepl("orange")
Warning message:
In grep(., "orange") :
  argument 'pattern' has length > 1 and only the first element will be used
Answer #1 (vawmfj5a)

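I'm not sure what output you expect, but I can think of two ways to do this. Note that the outputs shown below come from a version of mydata that has "banana" in fruit3 for ids 3 and 5 and "jackfruit" in fruit2 for id 5, as the second output shows.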

General

Code

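library(tidyr)

mydata %>% 
  # one row per (id, column) pair
  pivot_longer(cols = -id) %>% 
  # drop blanks and "N/A" placeholders
  filter(value != "", value != "N/A") %>%
  mutate(aux = TRUE) %>%
  select(-name) %>% 
  # keep one row per (id, fruit) pair
  unique() %>% 
  # one logical column per fruit
  pivot_wider(names_from = value, values_from = aux, values_fill = FALSE)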

Output

# A tibble: 4 x 5
     id orange jackfruit guava banana
  <dbl> <lgl>  <lgl>     <lgl> <lgl> 
1     1 TRUE   FALSE     FALSE FALSE 
2     2 FALSE  TRUE      TRUE  FALSE 
3     3 FALSE  FALSE     FALSE TRUE  
4     5 FALSE  TRUE      FALSE TRUE
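
One thing to notice in the output above: id 4 is missing, because all of its values are blank or "N/A" and get filtered out before the pivot. If every id should appear, one possible sketch (not part of the original answer) is to join the flags back onto the full list of ids and fill the gaps with FALSE. It uses the question's mydata as posted; the names `flags` and `all_ids` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Data exactly as posted in the question
mydata <- data.frame(id = c(1, 2, 3, 4, 5),
                     fruit = c("orange", "jackfruit", "", "N/A", ""),
                     fruit2 = c("orange", "guava", "", "", ""),
                     fruit3 = c("orange", "N/A", "orang", "", ""))

# Same idea as the general approach: one logical column per distinct value
flags <- mydata %>%
  pivot_longer(cols = -id) %>%
  filter(value != "", value != "N/A") %>%
  distinct(id, value) %>%
  mutate(aux = TRUE) %>%
  pivot_wider(names_from = value, values_from = aux, values_fill = FALSE)

# Bring back ids that had no usable value and mark them FALSE everywhere
all_ids <- mydata %>%
  select(id) %>%
  left_join(flags, by = "id") %>%
  mutate(across(-id, ~ coalesce(.x, FALSE)))

all_ids
```

This approach compares whole values, so with the data as posted "orang" gets its own column rather than counting as "orange".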

Manual

Code

library(stringr)

mydata %>% 
  rowwise() %>% 
  mutate(
    # any() already returns TRUE/FALSE, so no if_else() wrapper is needed
    orange = any(str_detect(c_across(cols = starts_with("fruit")), "orange")),
    guava = any(str_detect(c_across(cols = starts_with("fruit")), "guava")),
    banana = any(str_detect(c_across(cols = starts_with("fruit")), "banana")),
    jackfruit = any(str_detect(c_across(cols = starts_with("fruit")), "jackfruit"))
  )

Output

# A tibble: 5 x 8
# Rowwise: 
     id fruit       fruit2      fruit3   orange guava banana jackfruit
  <dbl> <chr>       <chr>       <chr>    <lgl>  <lgl> <lgl>  <lgl>    
1     1 "orange"    "orange"    "orange" TRUE   FALSE FALSE  FALSE    
2     2 "jackfruit" "guava"     "N/A"    FALSE  TRUE  FALSE  TRUE     
3     3 ""          ""          "banana" FALSE  FALSE TRUE   FALSE    
4     4 "N/A"       ""          ""       FALSE  FALSE FALSE  FALSE    
5     5 ""          "jackfruit" "banana" FALSE  FALSE TRUE   TRUE
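
Since the title asks about grepl specifically, here is a minimal sketch that stays with it. The attempt in the question warns because `%>%` passes the whole data frame as the first argument of grepl, which is `pattern`, so "orange" ends up as the text being searched. The sketch assumes the question's mydata as posted and that a plain substring match (`fixed = TRUE`, no regex) is what is wanted; the names `has_orange`, `patterns`, `flags` and `result` are made up for illustration.

```r
library(dplyr)

# Data exactly as posted in the question
mydata <- data.frame(id = c(1, 2, 3, 4, 5),
                     fruit = c("orange", "jackfruit", "", "N/A", ""),
                     fruit2 = c("orange", "guava", "", "", ""),
                     fruit3 = c("orange", "N/A", "orang", "", ""))

# dplyr: TRUE when any fruit* column in the row contains "orange"
with_flag <- mydata %>%
  mutate(has_orange = if_any(starts_with("fruit"),
                             ~ grepl("orange", .x, fixed = TRUE)))

# Base R: one logical column per search string
patterns <- c("orange", "guava", "jackfruit")
flags <- sapply(patterns, function(p) {
  # logical matrix: one column per fruit* column, one row per id
  hits <- sapply(mydata[-1], grepl, pattern = p, fixed = TRUE)
  rowSums(hits) > 0
})
result <- cbind(mydata, flags)

result
```

`if_any()` needs dplyr 1.0.4 or later. The base R version avoids repeating one line per fruit, which may be easier to maintain when the list of search strings grows.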
