Count how many times strings from one data frame appear in another data frame in R dplyr

Posted by pprl5pva on 2023-06-03 in Other

I have two data frames that look like this:

df1 <- data.frame(reference=c("cat","dog"))
print(df1)
#>   reference
#> 1       cat
#> 2       dog
df2 <- data.frame(data=c("cat","car","catt","cart","dog","dog","pitbull"))
print(df2)
#>      data
#> 1     cat
#> 2     car
#> 3    catt
#> 4    cart
#> 5     dog
#> 6     dog
#> 7 pitbull

Created on 2021-12-29 by the reprex package (v2.0.1)

I want to find out how many times the words cat and dog from df1 appear in df2. I would like my data to look like this:

animals   n
cat       1
dog       2

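Any help or guidance would be appreciated. My reference list is huge; I tried writing each entry out by hand, but that takes time.
Thank you for your time. Happy holidays.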

Source: https://stackoverflow.com/questions/70524517/count-how-many-times-strings-from-one-data-frame-appear-to-another-data-frame-in

6 Answers

ycl3bljg1#

Update, thanks to Gregor Thomas:

library(dplyr)

left_join(df1,df2, by=c("reference"="data")) %>% 
  count(reference)

Output:

  reference n
1       cat 1
2       dog 2

We can also use semi_join and count:

library(dplyr)

semi_join(df2,df1, by=c("data"="reference")) %>% 
  count(data)

  data n
1  cat 1
2  dog 2
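
One caveat with the left_join version above: a reference word that never occurs in df2 still produces one row after the join, so count() would report n = 1 for it rather than 0. The sketch below is one possible way around that, not part of the original answer. The extra "fish" entry and the helper column `hit` are hypothetical names added purely for illustration.

```r
library(dplyr)

# "fish" is a hypothetical reference word with no match in df2
df1 <- data.frame(reference = c("cat", "dog", "fish"))
df2 <- data.frame(data = c("cat", "car", "catt", "cart", "dog", "dog", "pitbull"))

# Tag every df2 row with a marker column; after the left join,
# reference words without a match carry NA in that column
counts <- df1 %>%
  left_join(mutate(df2, hit = 1L), by = c("reference" = "data")) %>%
  group_by(reference) %>%
  summarise(n = sum(!is.na(hit)), .groups = "drop")

print(counts)
```

With this input, "fish" should come out with n = 0, whereas left_join followed directly by count(reference) would give it n = 1. The semi_join version simply omits unmatched words.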

vd2z7a6w2#

A possible solution based on tidyverse:

library(tidyverse)

df1 <- data.frame(reference=c("cat","dog"))
df2 <- data.frame(data=c("cat","car","catt","cart","dog","dog","pitbull"))

df1 %>% 
  group_by(animal = reference) %>% 
  summarise(n = sum(reference == df2$data), .groups = "drop")

#> # A tibble: 2 × 2
#>   animal     n
#>   <chr>  <int>
#> 1 cat        1
#> 2 dog        2

kkbh8khc3#

Using a join may be faster:

library(data.table)
setDT(df2)[, .(animals = data)][df1, .(n = .N), 
     on = .(animals = reference), by = .EACHI]
   animals n
1:     cat 1
2:     dog 2

Or, in base R, use table after subsetting the data:

table(subset(df2, data %in% df1$reference, select = data))
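
The table call above returns a table object rather than the two-column layout asked for in the question. As a rough sketch of one way to get there in base R (an illustration, not the original answerer's code), the factor levels can be restricted to the reference words and the result converted with as.data.frame. The variable names `tab` and `result` are placeholders.

```r
df1 <- data.frame(reference = c("cat", "dog"))
df2 <- data.frame(data = c("cat", "car", "catt", "cart", "dog", "dog", "pitbull"))

# Values of df2$data that are not among the levels become NA and are
# left out of the table; levels with no occurrences are kept with count 0
tab <- table(animals = factor(df2$data, levels = df1$reference))

# Convert to a data frame with columns `animals` and `n`
result <- as.data.frame(tab, responseName = "n", stringsAsFactors = FALSE)
print(result)
```

Because the levels come from df1$reference, reference words that never occur in df2 stay in the result with a count of 0.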

neekobn84#

Here is a third option:

library(tidyverse)

df1 <- tibble(reference=c("cat","dog"))
df2 <- tibble(data=c("cat","car","catt","cart","dog","dog","pitbull"))

df2 |>
  count(data) |>
  filter(data %in% df1$reference) |>
  rename(animal = data)
#> # A tibble: 2 x 2
#>   animal     n
#>   <chr>  <int>
#> 1 cat        1
#> 2 dog        2

vu8f3i0k5#

We can collapse the column of the second data frame into a single string and then use str_count:

library(tidyverse)

df1 %>%
  transmute(animals = reference, n = str_c(df2$data, collapse = " ") %>%
    str_count(str_c("\\b", reference, "\\b")) )
#>   animals n
#> 1     cat 1
#> 2     dog 2

Created on 2021-12-29 by the reprex package (v2.0.1)
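
Note that the word-boundary pattern matches whole words inside the collapsed string, not whole entries, so the result can differ from an exact comparison when df2 contains multi-word values. The following sketch illustrates the difference under that assumption. The "hot dog" entry is hypothetical and does not appear in the question's data.

```r
library(stringr)

reference <- c("cat", "dog")
data <- c("cat", "hot dog", "dog", "dog")  # "hot dog" is a hypothetical multi-word entry

# Word-boundary counting on the collapsed string, as in the answer above
collapsed <- str_c(data, collapse = " ")
word_hits <- str_count(collapsed, str_c("\\b", reference, "\\b"))

# Exact, entry-by-entry comparison
exact_hits <- vapply(reference, function(x) sum(data == x),
                     integer(1), USE.NAMES = FALSE)

print(data.frame(reference, word_hits, exact_hits))
```

Here "dog" is counted three times by the regex approach but twice by exact matching. Reference strings that contain regex metacharacters would also need escaping before being pasted into a pattern.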

zlhcx6iw6#

df1$n <- colSums(outer(df2$data, df1$reference, '=='))

df1
#>   reference n
#> 1       cat 1
#> 2       dog 2
