R: merge two datasets and sum the shared column [duplicate]

mitkmikd  posted on 2023-05-20 in Other

This question already has answers here:

How to merge two data frames on common columns in R with sum of others? (3 answers)
Closed 4 days ago.
I have a situation like this:
I have two datasets. This one:
| from|to|frequency|
| --------------|--------------|--------------|
| a| a| 2|
| a| b| 3|
And this one:
| from|to|frequency|
| --------------|--------------|--------------|
| a| a| 3|
| a| b| 4|
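Now I want to merge the two datasets, keeping the "from" and "to" variables as they are, since they are identical in both, while summing the frequencies.
This is what we should get:
| from|to|frequency|
| --------------|--------------|--------------|
| a| a| 5|
| a| b| 7|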


dfddblmv1#

In base R, we can use aggregate + rbind

> aggregate(frequency ~ ., rbind(df1, df2), sum)
  from to frequency
1    a  a         5
2    a  b         7

Alternatively, we can use xtabs + as.data.frame + rbind

> as.data.frame(xtabs(frequency ~ ., rbind(df1, df2)))
  from to Freq
1    a  a    5
2    a  b    7
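
As a small sketch of the two calls above: the `.` in the formula stands for "all other columns", so it can also be written out explicitly. The `xtabs` route names the summed column `Freq` and turns the keys into factors; the `responseName` and `stringsAsFactors` arguments of `as.data.frame()` for tables are one way to get the original names and types back. The object names `stacked`, `by_formula` and `tab` are only illustrative.

```r
df1 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = 2:3)
df2 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = 3:4)

# Stack the rows of both data frames
stacked <- rbind(df1, df2)

# Same as frequency ~ . but with the grouping columns spelled out
by_formula <- aggregate(frequency ~ from + to, data = stacked, FUN = sum)

# xtabs variant: keep the column name "frequency" and character keys
tab <- as.data.frame(
  xtabs(frequency ~ from + to, data = stacked),
  responseName = "frequency",
  stringsAsFactors = FALSE
)

print(by_formula)
print(tab)
```

Spelling out the grouping columns is mainly useful when the data frames carry extra columns that should not take part in the grouping.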

Data

> dput(df1)
structure(list(from = c("a", "a"), to = c("a", "b"), frequency = 2:3), class = "data.frame", row.names = c(NA,
-2L))

> dput(df2)
structure(list(from = c("a", "a"), to = c("a", "b"), frequency = 3:4), class = "data.frame", row.names = c(NA,
-2L))

5us2dqdw2#

A neat (dplyr-style) option is the powerjoin package:

library(powerjoin)
power_left_join(df1, df2, by = c("from", "to"), conflict = `+`)

Result

from to frequency
1:    a  a         5
2:    a  b         7

qxgroojn3#

library(dplyr)

df1 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(2, 3))

df2 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(3, 4))

# Stack both data frames, then sum frequency within each (from, to) pair
merged_df <- bind_rows(df1, df2) %>%
  group_by(from, to) %>%
  summarise(frequency = sum(frequency), .groups = "drop")

print(merged_df)

from  to    frequency
  <chr> <chr>     <dbl>
1 a     a             5
2 a     b             7
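
The stack-then-summarise idea is not limited to two inputs. Below is a minimal base R sketch of the same pattern for a list of data frames, using `do.call(rbind, ...)` and `aggregate()` in place of the dplyr verbs; the third table and the names `tables`, `stacked` and `totals` are hypothetical and only there for illustration.

```r
tables <- list(
  data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(2, 3)),
  data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(3, 4)),
  # Hypothetical third table, not part of the question
  data.frame(from = "a", to = "b", frequency = 10)
)

# Stack every table in the list into one long data frame
stacked <- do.call(rbind, tables)

# Sum frequency within each (from, to) pair
totals <- aggregate(frequency ~ from + to, data = stacked, FUN = sum)
print(totals)
```

Because the rows are stacked before grouping, a pair that appears in only some of the tables still ends up in the result.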

syqv5f0l4#

You can use data.table

library(data.table)
df1 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(2, 3)) |> setDT()

df2 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(3, 4)) |> setDT()

df1[df2, on = .(from, to), 
    frequency := x.frequency + i.frequency]

Output (df1 after the update)

     from     to frequency
   <char> <char>     <num>
1:      a      a         5
2:      a      b         7

1sbrub3j5#

With dplyr, we can do it in two steps

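library(dplyr)

df1 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(2, 3))
df2 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(3, 4))

# Step 1: join on the key columns; step 2: add the two suffixed frequency columns
inner_join(df1, df2, by = c("to","from")) %>% 
  mutate(frequency = frequency.x + frequency.y) %>%
  select(from, to, frequency)  # drop the frequency.x / frequency.y helper columns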
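
An inner join keeps only the pairs present in both tables. If the two tables might not share exactly the same pairs, a full join with missing values treated as 0 is one option. The sketch below uses base R `merge()` rather than dplyr, and the extra `("b", "a")` row in `df2` is a hypothetical addition to show the difference.

```r
df1 <- data.frame(from = c("a", "a"), to = c("a", "b"), frequency = c(2, 3))
# Hypothetical: df2 has one pair ("b", "a") that df1 lacks
df2 <- data.frame(from = c("a", "a", "b"), to = c("a", "b", "a"),
                  frequency = c(3, 4, 1))

# Full outer join on the key columns; a pair missing on one side gets NA there
joined <- merge(df1, df2, by = c("from", "to"), all = TRUE,
                suffixes = c(".x", ".y"))

# Sum the two frequency columns, counting a missing side as 0
joined$frequency <- rowSums(joined[, c("frequency.x", "frequency.y")],
                            na.rm = TRUE)
result <- joined[, c("from", "to", "frequency")]
print(result)
```

With the data from the question, where both tables contain the same pairs, this gives the same totals as the inner join.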
