R: Merging multiple data frames with duplicated columns and matching dates, and filling the gaps

92vpleto, posted 12 months ago in Other

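I have three data frames (df1, df2 and df3). I want to merge them so that they fill each other's gaps. For example, df1 contains USA data from 1990 to 1993 and df2 contains USA data from 1994 to 1999. When I merge them with Reduce, I get duplicated columns (USA.x, USA.y) instead of a single USA column with all dates and values in sequence.
This is what I did: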

df1 <- data.frame(
  stringsAsFactors = FALSE, 
  Date = c("01/01/1990", "01/01/1991", 
           "01/01/1992", "01/01/1993", "01/01/1994", "01/01/1995"), 
  USA = c(1, 4, 2, 1, NA, NA), 
  FRANCE = c(4, 4, 2, 5, NA, NA), 
  ITALY = c(1, 4, 5, 2, NA, NA))

df2 <- data.frame(
  stringsAsFactors = FALSE, 
  Date = c("01/01/1994", "01/01/1995", 
           "01/01/1996", "01/01/1997", "01/01/1998", "01/01/1999"), 
  USA = c(3, 3, 1, 4, 3, 1), 
  FRANCE = c(2, 5, 2, 5, 5, 1), 
  MEXICO = c(4, 1, 4, 3, NA, NA))

df3 <- data.frame(
  stringsAsFactors = FALSE, 
  Date = c("01/01/1998", "01/01/1999", 
           "01/01/2000", "01/01/2001", "01/01/2002", "01/01/2003"), 
  MEXICO = c(3, 3, 5, 4, 2, 3), 
  BELGIUM = c(4, 2, 1, 4, 5, 1))

df_list <- list(df1, df2, df3)


Alternative 1:
dfall1 <- Reduce(function(x, y) merge(x, y, all=TRUE), df_list, accumulate=FALSE)
View(dfall1)

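Result dfall1:

Columns with the same name are combined, but the dates are duplicated. There is only one column per country, but the same date appears in two rows: one holding NA and one holding the value from the other data frame.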
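
The duplicated rows are consistent with how merge() chooses its keys: when no by argument is given, it joins on every column name the two data frames share, so the country columns act as keys alongside Date, and a row holding NA never matches the row holding the real value. A minimal sketch, using two hypothetical cut-down frames a and b rather than the full df1 and df2:

```r
# Hypothetical miniature versions of df1 and df2 (not the full data)
a <- data.frame(Date = c("01/01/1993", "01/01/1994"), USA = c(1, NA))
b <- data.frame(Date = c("01/01/1994", "01/01/1995"), USA = c(3, 3))

# With no `by`, merge() keys on all shared column names
keys <- intersect(names(a), names(b))   # Date and USA

# (1994, NA) from a and (1994, 3) from b differ on USA, so both rows survive
m <- merge(a, b, all = TRUE)
print(m)
```

In this sketch 1994 shows up twice in m, which mirrors the behaviour described for dfall1.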

Alternative 2:
dfall2 <- Reduce(function(x, y) merge(x, y, all=TRUE, by = "Date"), df_list, accumulate=FALSE)
View(dfall2)

Result dfall2:

The date rows are not duplicated, but the shared columns are renamed with .x and .y suffixes.

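Question: How can I avoid both the duplicated rows and the duplicated columns? I want all the data consolidated by column name and corresponding date, so that the result has one row per date and one column per country.

Any help would be appreciated. Thanks in advance.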

hgqdbh6s (answer #1)

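Reshape the data, for example with the reshape2 package: first melt each data frame to long format, then rbind the pieces and drop the missing values with na.omit, and finally cast back to wide format.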

> lapply(df_list, reshape2:::melt.data.frame, id.vars='Date') |> 
+   do.call(what='rbind') |> na.omit() |> reshape2::dcast(Date ~ variable) 
         Date USA FRANCE ITALY MEXICO BELGIUM
1  01/01/1990   1      4     1     NA      NA
2  01/01/1991   4      4     4     NA      NA
3  01/01/1992   2      2     5     NA      NA
4  01/01/1993   1      5     2     NA      NA
5  01/01/1994   3      2    NA      4      NA
6  01/01/1995   3      5    NA      1      NA
7  01/01/1996   1      2    NA      4      NA
8  01/01/1997   4      5    NA      3      NA
9  01/01/1998   3      5    NA      3       4
10 01/01/1999   1      1    NA      3       2
11 01/01/2000  NA     NA    NA      5       1
12 01/01/2001  NA     NA    NA      4       4
13 01/01/2002  NA     NA    NA      2       5
14 01/01/2003  NA     NA    NA      3       1
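
The three steps above can also be sketched in base R alone, for readers who prefer not to depend on reshape2. This is an illustration rather than the answer's own code: to_long is a hypothetical helper, a and b are cut-down stand-ins for the real data frames, and the sketch assumes each Date/country pair has at most one non-missing value.

```r
# Hypothetical helper: stack every non-id column into Date / variable / value
to_long <- function(d, id = "Date") {
  vars <- setdiff(names(d), id)
  data.frame(
    Date     = rep(d[[id]], times = length(vars)),
    variable = rep(vars, each = nrow(d)),
    value    = unlist(d[vars], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical miniature inputs with an overlapping date and column
a <- data.frame(Date = c("01/01/1993", "01/01/1994"),
                USA = c(1, NA), ITALY = c(2, NA))
b <- data.frame(Date = c("01/01/1994", "01/01/1995"),
                USA = c(3, 3), MEXICO = c(4, 1))

# Step 1 and 2: long format, bind the pieces, drop the NA rows
long <- na.omit(do.call(rbind, lapply(list(a, b), to_long)))

# Step 3: back to wide; cells with no data stay NA
wide <- tapply(long$value, list(long$Date, long$variable), function(v) v[1])
result <- data.frame(Date = rownames(wide), wide, row.names = NULL)
print(result)
```

Unlike the dcast output, tapply orders the country columns alphabetically rather than by first appearance.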


Data:
> dput(df_list)
list(structure(list(Date = c("01/01/1990", "01/01/1991", "01/01/1992", 
"01/01/1993", "01/01/1994", "01/01/1995"), USA = c(1, 4, 2, 1, 
NA, NA), FRANCE = c(4, 4, 2, 5, NA, NA), ITALY = c(1, 4, 5, 2, 
NA, NA)), class = "data.frame", row.names = c(NA, -6L)), structure(list(
    Date = c("01/01/1994", "01/01/1995", "01/01/1996", "01/01/1997", 
    "01/01/1998", "01/01/1999"), USA = c(3, 3, 1, 4, 3, 1), FRANCE = c(2, 
    5, 2, 5, 5, 1), MEXICO = c(4, 1, 4, 3, NA, NA)), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(Date = c("01/01/1998", "01/01/1999", "01/01/2000", 
"01/01/2001", "01/01/2002", "01/01/2003"), MEXICO = c(3, 3, 5, 
4, 2, 3), BELGIUM = c(4, 2, 1, 4, 5, 1)), class = "data.frame", row.names = c(NA, 
-6L)))
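
For comparison, the Reduce approach from the question can be kept if each merge also collapses the .x/.y pairs it creates. The following is a rough sketch under stated assumptions, not part of the answer above: merge_fill is a hypothetical helper, it joins on Date only, and where both sides have a value it keeps the left one.

```r
# Hypothetical helper: merge on the key, then coalesce each name.x / name.y pair
merge_fill <- function(x, y, key = "Date") {
  m <- merge(x, y, by = key, all = TRUE)
  shared <- setdiff(intersect(names(x), names(y)), key)
  for (nm in shared) {
    xs <- m[[paste0(nm, ".x")]]
    ys <- m[[paste0(nm, ".y")]]
    m[[nm]] <- ifelse(is.na(xs), ys, xs)   # prefer x, fall back to y
    m[[paste0(nm, ".x")]] <- NULL
    m[[paste0(nm, ".y")]] <- NULL
  }
  m
}

# Hypothetical miniature inputs with an overlapping date and column
a <- data.frame(Date = c("01/01/1993", "01/01/1994"),
                USA = c(1, NA), ITALY = c(2, NA))
b <- data.frame(Date = c("01/01/1994", "01/01/1995"),
                USA = c(3, 3), MEXICO = c(4, 1))

filled <- merge_fill(a, b)
print(filled)
```

With the full data, the same helper could be passed to Reduce, as in Reduce(merge_fill, df_list). Coalesced columns are appended at the end, so the column order differs from the inputs.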
