Bind or merge rows by duplicated dates in R

6rvt4ljy  posted on 2023-03-05  in Other

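I have a data frame with 6 columns like the one below. As you can see, the dates are duplicated. How can I merge the rows and keep the information in the columns that follow the date?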

date        1   2        3   4      5
2019-01-01  NA  1966439  NA  NA     NA
2019-01-01  NA  NA       NA  133.6  NA
2019-01-01  NA  NA       NA  NA     6.2
2019-02-01  NA  1962946  NA  NA     NA
2019-02-01  NA  NA       NA  134.5  NA
2019-02-01  NA  NA       NA  NA     6.1
2019-03-01  NA  1974072  NA  NA     NA
2019-03-01  NA  NA       NA  135.4  NA
2019-03-01  NA  NA       NA  NA     6.3
2019-04-01  NA  1984086  NA  NA     NA

I would like something like this, with no duplicated dates.

date        1   2        3   4      5
2019-01-01  NA  1966439  NA  133.6  6.2
2019-02-01  NA  1962946  NA  134.5  6.1
2019-03-01  NA  1974072  NA  135.4  6.3
2019-04-01  NA  1984086  NA  NA     NA

Thank you very much.


bvjveswy1#

I have pasted a solution below. Hopefully the comments explain it well enough.

# Packages used
library(dplyr)

# Some reproducible data
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
  x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
  y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
  z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
)

dta <- dta |>
  group_by(date) |> # Group rows that share a date
  # Collapse each date to one row. With a single non-NA value per group,
  # sum() just returns that value (min(), mean(), etc. work just as well).
  summarise(across(c(a, x, y, z), ~ sum(.x, na.rm = TRUE))) |>
  # Remove columns with a sum of 0 (columns that were all NA); date is kept explicitly
  select(date, where(~ is.numeric(.x) && sum(.x) > 0))
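
One caveat with the sum()-based summary above: a group that is entirely NA becomes 0 rather than NA, and a genuine 0 cannot be told apart from a missing value. Below is a minimal sketch of a variation that keeps NA where nothing was observed. The helper name `first_non_na` and the small data frame are made up for illustration and are not part of the answer above.

```r
library(dplyr)

# Hypothetical helper: first non-missing value of x, or NA if there is none.
# Indexing an empty vector with [1] yields an NA of the same type.
first_non_na <- function(x) x[!is.na(x)][1]

# Illustrative data: column a is all NA, and x contains a genuine 0
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2),
  a = NA,
  x = c(123, NA, NA, 0, NA, NA),
  y = c(NA, 123, NA, NA, 3456, NA)
)

result <- dta |>
  group_by(date) |>
  # across(everything()) skips the grouping column automatically
  summarise(across(everything(), first_non_na))

print(result)
```

Here column `a` stays NA for every date, and the 0 in `x` survives as a real value instead of being treated as "missing".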

ws51t4hk2#

If each column has only one non-missing value per date, the following code works:

library(tidyverse)

df <- tibble::tribble(
  ~date,        ~col1, ~col2,   ~col3, ~col4, ~col5,
  "2019-01-01", NA,    1966439, NA,    NA,    NA,
  "2019-01-01", NA,    NA,      NA,    133.6, NA,
  "2019-01-01", NA,    NA,      NA,    NA,    6.2,
  "2019-02-01", NA,    1962946, NA,    NA,    NA,
  "2019-02-01", NA,    NA,      NA,    134.5, NA,
  "2019-02-01", NA,    NA,      NA,    NA,    6.1,
  "2019-03-01", NA,    1974072, NA,    NA,    NA,
  "2019-03-01", NA,    NA,      NA,    135.4, NA,
  "2019-03-01", NA,    NA,      NA,    NA,    6.3,
  "2019-04-01", NA,    1984086, NA,    NA,    NA
)

remove_na <- function(x) {
  if (all(is.na(x))) return(NA)
  discard(x, is.na)
}

df |> 
  group_by(date) |> 
  summarize(across(starts_with("col"), remove_na))
#> # A tibble: 4 × 6
#>   date       col1     col2 col3   col4  col5
#>   <chr>      <lgl>   <dbl> <lgl> <dbl> <dbl>
#> 1 2019-01-01 NA    1966439 NA     134.   6.2
#> 2 2019-02-01 NA    1962946 NA     134.   6.1
#> 3 2019-03-01 NA    1974072 NA     135.   6.3
#> 4 2019-04-01 NA    1984086 NA      NA   NA
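
Since this approach relies on there being at most one non-missing value per date in each column, it may be worth checking that first. The following is only a sketch: the `conflicts` name and the toy data (with a deliberate duplicate for 2019-02-01) are invented for illustration.

```r
library(dplyr)

# Toy data: col2 has two non-missing values for 2019-02-01
df <- tibble::tribble(
  ~date,        ~col1, ~col2,
  "2019-01-01", NA,    1966439,
  "2019-01-01", NA,    NA,
  "2019-02-01", NA,    1962946,
  "2019-02-01", NA,    1962000
)

conflicts <- df |>
  group_by(date) |>
  # Count the non-missing values per date in every col* column
  summarise(across(starts_with("col"), ~ sum(!is.na(.x)))) |>
  # Keep only the dates where some column has more than one value
  filter(if_any(starts_with("col"), ~ .x > 1))

print(conflicts)
```

If `conflicts` has zero rows, the assumption holds for the data at hand; otherwise the listed dates need a decision on which value to keep.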

Created on 2023-03-03 with reprex v2.0.2
Next time you post a question, please include some code that generates the data set (as I did here)!
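
One common way to produce such code from an existing object is base R's `dput()`, which prints an R expression that recreates the object. A small sketch follows; the two-row data frame is just a stand-in for the real data.

```r
# Stand-in for the data you want to share
df <- data.frame(
  date = as.Date(c("2019-01-01", "2019-01-01")),
  col2 = c(1966439, NA)
)

# Prints an expression that can be pasted into a question as-is
dput(df)

# Round trip: evaluating the printed expression rebuilds the data frame
rebuilt <- eval(parse(text = capture.output(dput(df))))
```

For larger data, `dput(head(df, 10))` keeps the pasted snippet short.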
