Expanding comma-separated values in two columns while filling NA instead of duplicating values in R

Posted by vlju58qv on 2023-10-13 in Other

I have the following data frame:

structure(list(A = c("1,2,3", "1,2", "1,4"), B = c("X,Y", "X", 
"X"), ID = c(1, 2, 3), D = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))

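I am trying to expand A and B, but with NAs filled in so that no values are duplicated across the columns. Columns A and B should be expanded "independently", as if the values of A and B were not directly related to each other and were only related to ID and D. The desired output is: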

structure(list(A = c("1", "2", "3", "NA", "NA", "1", "2", "NA", 
"1", "4", "NA"), B = c("NA", "NA", "NA", "X", "Y", "NA", "NA", 
"X", "NA", "NA", "X"), ID = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3
), D = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3)), class = "data.frame", row.names = c(NA, 
-11L))
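
As a quick sanity check on the target shape: if A and B are expanded independently, each input row should yield as many output rows as it has tokens in A plus tokens in B. A minimal base R sketch of that arithmetic (the variable name rows_per_input is only illustrative):

```r
# Values of A and B copied from the example data frame above
A <- c("1,2,3", "1,2", "1,4")
B <- c("X,Y", "X", "X")

# Output rows per input row = number of tokens in A + number of tokens in B
rows_per_input <- lengths(strsplit(A, ",")) + lengths(strsplit(B, ","))
rows_per_input       # 5 3 3
sum(rows_per_input)  # 11, matching the 11 rows of the desired output
```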

I could not find a similar case among the earlier questions about expanding multiple columns.

Answer 1 (slmsl1lt)

This is a rather unusual kind of expansion. You can process each column separately and then combine the results. For example:

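library(dplyr)
library(tidyr)  # separate_longer_delim() requires tidyr >= 1.3.0

# dd is the data frame from the question
bind_rows(
  # expand A alone; B is dropped here, so bind_rows() fills it with NA
  separate_longer_delim(select(dd, -B), A, ","), 
  # expand B alone; A is dropped here, so bind_rows() fills it with NA
  separate_longer_delim(select(dd, -A), B, ",")
) |>
  select(A, B, ID, D) |>  # restore the original column order
  arrange(ID, D)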
#       A    B ID D
# 1     1 <NA>  1 1
# 2     2 <NA>  1 1
# 3     3 <NA>  1 1
# 4  <NA>    X  1 1
# 5  <NA>    Y  1 1
# 6     1 <NA>  2 2
# 7     2 <NA>  2 2
# 8  <NA>    X  2 2
# 9     1 <NA>  3 3
# 10    4 <NA>  3 3
# 11 <NA>    X  3 3
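
The same idea (expand each column on its own, then stack the pieces) can also be written without dplyr/tidyr and extended to any number of columns. The following is only a minimal base R sketch: expand_one and expand_independently are hypothetical helper names, not functions from any package, and the .row column is a temporary bookkeeping column introduced here for ordering.

```r
dat <- data.frame(
  A = c("1,2,3", "1,2", "1,4"),
  B = c("X,Y", "X", "X"),
  ID = c(1, 2, 3),
  D = c(1, 2, 3)
)

# Hypothetical helper: expand one delimited column and set the other
# delimited columns to NA.
expand_one <- function(df, col, others, sep = ",") {
  parts <- strsplit(df[[col]], sep, fixed = TRUE)
  idx <- rep(seq_len(nrow(df)), lengths(parts))  # repeat each row once per token
  out <- df[idx, , drop = FALSE]
  out[[col]] <- unlist(parts)
  for (o in others) out[[o]] <- NA_character_
  out$.row <- idx  # remember which input row each output row came from
  out
}

# Hypothetical helper: expand every column in `cols` independently and stack.
expand_independently <- function(df, cols, sep = ",") {
  pieces <- lapply(cols, function(col) {
    expand_one(df, col, setdiff(cols, col), sep)
  })
  out <- do.call(rbind, pieces)
  # order() leaves ties in their original order, so within one input row
  # the pieces stay in the order given by `cols`
  out <- out[order(out$.row), , drop = FALSE]
  out$.row <- NULL
  rownames(out) <- NULL
  out
}

res <- expand_independently(dat, c("A", "B"))
```

Note that this sketch yields real NA values, whereas the desired output in the question stores the string "NA"; an empty string in a split column would contribute no rows at all.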

Answer 2 (mqkwyuun)

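In base R, inside a Map, you can strsplit A and B and create data.frames with consecutive ids that you can merge. Because the ids of the two data frames never overlap, the full merge (all=TRUE) pairs nothing and fills the other column with NA. What remains is a cbind of ID and D and a final rbind.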

with(dat, 
     Map(\(w, x, y, z) merge(data.frame(i=seq_along(w), w), 
                             data.frame(i=seq_along(x) + length(w), x), 
                             all=TRUE)[, -1] |> cbind(y, z) |> setNames(names(dat)), 
              strsplit(A, ','), strsplit(B, ','), ID, D)) |> do.call(what='rbind')
#       A    B ID D
# 1     1 <NA>  1 1
# 2     2 <NA>  1 1
# 3     3 <NA>  1 1
# 4  <NA>    X  1 1
# 5  <NA>    Y  1 1
# 6     1 <NA>  2 2
# 7     2 <NA>  2 2
# 8  <NA>    X  2 2
# 9     1 <NA>  3 3
# 10    4 <NA>  3 3
# 11 <NA>    X  3 3

Data:
dat <- structure(list(A = c("1,2,3", "1,2", "1,4"), B = c("X,Y", "X", 
"X"), ID = c(1, 2, 3), D = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))
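
To see why the merge step produces the NAs, it can help to run it by hand for a single row. This is just an illustrative sketch for the first row of the example data; the names left, right and step are introduced here and do not appear in the answer's code.

```r
# Tokens of the first row of the example data
w <- strsplit("1,2,3", ",")[[1]]  # values from column A
x <- strsplit("X,Y", ",")[[1]]    # values from column B

left  <- data.frame(i = seq_along(w), w)              # ids 1, 2, 3
right <- data.frame(i = seq_along(x) + length(w), x)  # ids 4, 5

# The id ranges do not overlap, so the full outer join matches nothing:
# every row keeps its own value and gets NA in the other column.
step <- merge(left, right, all = TRUE)
step
```

Dropping the id column with [, -1] and attaching ID and D with cbind then gives the five output rows for that input row.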
