R: how to create new columns that summarize the total of a string detect

Posted by sycxhyv7 on 2023-01-06 in Other

I have a data frame in R that looks like this:

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
)), row.names = c(NA, -3L), class = "data.frame")

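I want to create a new column for each item that appears in the "items" column and fill it with how often that item occurs across the whole column. For example, an "Apple" column should contain the frequency of "Apple" in the "items" column, a "Pear" column should contain the frequency of "Pear", and so on.
The final data frame should look like this: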

structure(list(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"
), Apple = c(3, 3, 3), Pear = c(2, 2, 2), Banana = c(1, 1, 1)), row.names = c(NA, 
-3L), class = "data.frame")

I have tried the mutate() and str_count() functions from the dplyr and stringr packages, but I am not sure how to get the final data frame I want.
Here is the code I have tried so far:

library(dplyr)
library(stringr)

# df is the data frame shown above
df %>%
  mutate(Apple = str_count(items, "Apple"),
         Pear = str_count(items, "Pear"),
         Banana = str_count(items, "Banana"))

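This gets me part of the way there, but I don't know how to create a new column for each item that holds that item's total frequency. Can anyone help me figure out how to do this in R?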

Source: https://stackoverflow.com/questions/75008965/r-how-to-create-a-new-column-that-summarizes-the-total-of-a-string-detect

4 Answers

83qze16e #1

You can wrap str_count in sum:

# Requires dplyr and stringr; df is the data frame from the question
df %>%
  mutate(Apple = sum(str_count(items, "Apple")),
         Pear = sum(str_count(items, "Pear")),
         Banana = sum(str_count(items, "Banana")))

                items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1
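
To see why the sum() wrapper matters, here is a small sketch on a plain character vector. The vector is copied from the question; the variable names per_row and total are only illustrative. str_count() returns one count per row, and sum() collapses those into a single total, which mutate() then recycles down the column.

```r
library(stringr)

items <- c("Apple", "Apple, Pear", "Apple, Pear, Banana")

# One count per element: how many times "Pear" occurs in each row
per_row <- str_count(items, "Pear")

# A single total across all rows; inside mutate() this scalar is recycled
total <- sum(per_row)

print(per_row)
print(total)
```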

ccgok5k5 #2

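Especially when you have many rows and many values, here is a solution that uses separate_rows and count, combines the result with the original data via cbind, and finally pivots wider and fills the resulting NAs: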

library(dplyr)
library(tidyr)
df %>% 
  separate_rows(items, sep='\\,') %>% 
  count(items1 = trimws(items)) %>% 
  cbind(df) %>% 
  pivot_wider(names_from = items1, values_from = n) %>% 
  fill(-items, .direction = "downup")

  items               Apple Banana  Pear
  <chr>               <int>  <int> <int>
1 Apple                   3      1     2
2 Apple, Pear             3      1     2
3 Apple, Pear, Banana     3      1     2
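
The cbind(df) step above seems to line up only because the counted table happens to have as many rows (three distinct items) as df. A sketch that avoids relying on that coincidence pivots the counts into a single row first and then attaches them as constant columns. It assumes the same df and that items are separated by a comma plus optional whitespace; totals and result are illustrative names.

```r
library(dplyr)
library(tidyr)

df <- data.frame(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"),
                 stringsAsFactors = FALSE)

# One row of totals: split the items, count each one, spread into columns
totals <- df %>%
  separate_rows(items, sep = ",\\s*") %>%
  count(items) %>%
  pivot_wider(names_from = items, values_from = n)

# as.list() turns the one-row table into named scalars that cbind() recycles
result <- cbind(df, as.list(totals))
print(result)
```

Because the totals are reduced to one row before being attached, the number of distinct items no longer has to match the number of rows in df.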

crcmnpdw #3

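Loop over the words of interest with map, use transmute to return a single column holding the total count of that word in the items column, and bind the output to the original data: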

library(purrr)
library(dplyr)
library(stringr)
# df1 is the data frame from the question
map_dfc(c("Apple", "Pear", "Banana"), ~ df1 %>%
    transmute(!! .x := sum(str_count(items, .x)))) %>%
    bind_cols(df1, .)

Output

                items Apple Pear Banana
1               Apple     3    2      1
2         Apple, Pear     3    2      1
3 Apple, Pear, Banana     3    2      1

Another option is to split the "items" column, tabulate the pieces with mtabulate, take the colSums, and cbind the resulting totals onto the data:

library(qdapTools)
cbind(df1, as.list(colSums(mtabulate(strsplit(df1$items, ",\\s*")))))
                items Apple Banana Pear
1               Apple     3      1    2
2         Apple, Pear     3      1    2
3 Apple, Pear, Banana     3      1    2
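
If qdapTools is not installed, the same split-and-tabulate idea can be sketched in base R. This is an alternative, not the approach above, and it assumes items are separated by a comma plus optional whitespace; all_items, unique_items, and counts are illustrative names. It also discovers the item names from the data instead of listing them by hand.

```r
df1 <- data.frame(items = c("Apple", "Apple, Pear", "Apple, Pear, Banana"),
                  stringsAsFactors = FALSE)

# Split every row into individual items and flatten to one vector
all_items <- unlist(strsplit(df1$items, ",\\s*"))

# Count exact occurrences of each distinct item, named by the item itself
unique_items <- unique(all_items)
counts <- vapply(unique_items, function(item) sum(all_items == item), integer(1))

# Attach one constant column per item; cbind() recycles each scalar
result <- cbind(df1, as.list(counts))
print(result)
```

The columns here follow the order in which items first appear, whereas the mtabulate version above sorts them alphabetically.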

dtcbnfnu #4

You can try the following:

library(tidyverse)

df <- structure(list(items = c(
    "Apple", "Apple, Pear", "Apple, Pear, Banana"
  )),
  row.names = c(NA,-3L),
  class = "data.frame")

total_count <- function(x, word) {
  paste0(x, collapse = ", ") %>% 
    stringr::str_count(word)
}
  
df %>%
  mutate(Apple = total_count(items, "Apple"),
         Pear = total_count(items, "Pear"),
         Banana = total_count(items, "Banana"))

#>                 items Apple Pear Banana
#> 1               Apple     3    2      1
#> 2         Apple, Pear     3    2      1
#> 3 Apple, Pear, Banana     3    2      1

Created on 2023-01-04 with reprex v2.0.2
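
One caveat applies to every str_count()-based approach in this thread: the pattern is matched as a substring, so an item whose name is contained in another item's name gets counted too. The sketch below uses made-up data (not from the question) to show the difference between substring counting and comparing whole items after splitting. The names substring_count, tokens, and exact_count are illustrative.

```r
library(stringr)

# Hypothetical data: "Grape" is a substring of "Grapefruit"
items <- c("Grape", "Grape, Grapefruit")

# Pattern matching counts every occurrence of the substring
substring_count <- sum(str_count(items, "Grape"))

# Splitting into tokens and comparing whole items counts exact matches only
tokens <- unlist(str_split(items, ",\\s*"))
exact_count <- sum(tokens == "Grape")

print(substring_count)
print(exact_count)
```

For the question's data (Apple, Pear, Banana) both counts agree, so this only matters when item names overlap.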
