R: how to create new columns that hold the total count of each detected string

Posted by sycxhyv7 on 2023-01-06 in Other

I have a data frame in R that looks like this:

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
)), row.names = c(NA, -3L), class = "data.frame")

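I want to create a new column for every item that appears in the "items" column and fill it with that item's frequency. For example, an "Apple" column holding how often "Apple" occurs in the "items" column, a "Pear" column holding how often "Pear" occurs, and so on.
The final data frame should look like this: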

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
), Apple = c(3, 3, 3), Pear = c(2, 2, 2), Banana = c(1, 1, 1)), row.names = c(NA, 
-3L), class = "data.frame")

I have tried mutate() from dplyr together with str_count() from stringr, but I am not sure how to get to the final data frame I want.
Here is the code I have tried so far:

items %>%
  mutate(Apple = str_count(items, "Apple"),
         Pear = str_count(items, "Pear"),
         Banana = str_count(items, "Banana"))

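This gets me part of the way: each new column holds the per-row count, not the total across all rows. Can anyone help me work out how to get the total frequency of each item into its own column in R?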


83qze16e #1

You can wrap str_count() in sum():

items %>%
  mutate(Apple = sum(str_count(items, "Apple")),
         Pear = sum(str_count(items, "Pear")),
         Banana = sum(str_count(items, "Banana")))

                items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1
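
A note on why this works, as a minimal sketch: str_count() returns one count per row, sum() collapses those into a single total, and mutate() recycles that single value down the whole column. The "Pearl, Pear" string below is a made-up illustration (it is not in the question's data) of the fact that a bare pattern also matches inside longer words; wrapping the pattern in `\\b` word boundaries is one possible way around that.

```r
library(stringr)

items <- c("Apple", "Apple, Pear", "Apple, Pear, Banana")

# str_count() is vectorised: one count per element of `items`
per_row <- str_count(items, "Apple")

# sum() collapses the per-row counts into one grand total
total <- sum(per_row)

# Hypothetical caveat: a bare pattern also matches inside longer words
loose  <- str_count("Pearl, Pear", "Pear")
strict <- str_count("Pearl, Pear", "\\bPear\\b")

print(per_row)
print(total)
```

If item names can be substrings of one another, the `strict` form is probably the safer choice.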

ccgok5k5 #2

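Especially if you have many rows and many distinct values, here is a solution that uses separate_rows() and count(), combines the result with the original data via cbind(), and finally pivots wider and fills in the NAs: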

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(items, sep='\\,') %>% 
  count(items1 = trimws(items)) %>% 
  cbind(df) %>% 
  pivot_wider(names_from = items1, values_from = n) %>% 
  fill(-items, .direction = "downup")
  items               Apple Banana  Pear
  <chr>               <int>  <int> <int>
1 Apple                   3      1     2
2 Apple, Pear             3      1     2
3 Apple, Pear, Banana     3      1     2
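
One thing to watch, as far as I can tell: the cbind(df) step lines up here because the count table and the original data both happen to have three rows. A hedged variant that should not depend on that is sketched below. It pivots the counts into a single row first and then lets base cbind() recycle that row. The fourth row ("Pear") is an assumption added only to make the row counts differ.

```r
library(dplyr)
library(tidyr)

# Question's data plus one hypothetical extra row
df <- data.frame(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana", "Pear"))

# One row, one column per distinct item, holding the total count
counts <- df %>%
  separate_rows(items, sep = ",\\s*") %>%
  count(items) %>%
  pivot_wider(names_from = items, values_from = n)

# cbind() recycles the single-row counts across all rows of df
result <- cbind(df, counts)
print(result)
```

Because the counts are reduced to one row before binding, no fill() step is needed.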

crcmnpdw #3

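Loop over the words of interest with map, use transmute() to return a single column holding the total count of that word in the items column, and bind the output to the original data: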

library(purrr)
library(dplyr)
library(stringr)  # provides str_count()

# df1 is the data frame from the question
map_dfc(c("Apple", "Pear", "Banana"), ~ df1 %>%
    transmute(!! .x := sum(str_count(items, .x)))) %>%
    bind_cols(df1, .)
Output:
                items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1

Alternatively, split the "items" column, tabulate the pieces with mtabulate(), take the colSums(), and cbind() the resulting columns onto the data:

library(qdapTools)
cbind(df1, as.list(colSums(mtabulate(strsplit(df1$items, ",\\s*")))))
                items Apple Banana Pear
1               Apple     3      1    2
2         Apple, Pear     3      1    2
3 Apple, Pear, Banana     3      1    2
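
The same split-and-tabulate idea can also be sketched in base R, without qdapTools and without hardcoding the item names. This is only an illustration under the assumption that items are separated by a comma plus optional whitespace; the variable names (`all_items`, `totals`, `result`) are made up for the example.

```r
df <- data.frame(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"))

# Split every row on commas (plus optional whitespace) and flatten
all_items <- unlist(strsplit(df$items, ",\\s*"))

# Total occurrences per distinct item, in order of first appearance
item_names <- unique(all_items)
totals <- vapply(item_names, function(it) sum(all_items == it), integer(1))

# Attach each total as a constant column; the single value is recycled
result <- df
for (item in names(totals)) {
  result[[item]] <- totals[[item]]
}
print(result)
```

Unlike the str_count() approaches, this compares whole items after splitting, so an item name that is a substring of another should not be double counted.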

dtcbnfnu #4

You can try the following:

library(tidyverse)

df <- structure(list(items = c(
    "Apple", "Apple, Pear", "Apple, Pear, Banana"
  )),
  row.names = c(NA,-3L),
  class = "data.frame")

total_count <- function(x, word) {
  paste0(x, collapse = ", ") %>% 
    stringr::str_count(word)
}
  
df %>%
  mutate(Apple = total_count(items, "Apple"),
         Pear = total_count(items, "Pear"),
         Banana = total_count(items, "Banana"))

#>                 items Apple Pear Banana
#> 1               Apple     3    2      1
#> 2         Apple, Pear     3    2      1
#> 3 Apple, Pear, Banana     3    2      1
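
To make the collapse step concrete, here is a small sketch of what happens inside total_count(): the column is pasted into one long string, and str_count() then counts within that single string. Since str_count() is vectorised over its pattern argument, all words can be counted in one call. The `words` vector is an assumption for illustration.

```r
library(stringr)

items <- c("Apple", "Apple, Pear", "Apple, Pear, Banana")

# Collapse the whole column into one string
collapsed <- paste0(items, collapse = ", ")

# Count several patterns against that one string in a single call
words <- c("Apple", "Pear", "Banana")
totals <- str_count(collapsed, words)
names(totals) <- words

print(collapsed)
print(totals)
```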

Created on 2023-01-04 with reprex v2.0.2
