R: How do I add rows so that every group has the same number of rows?

Posted by bq8i3lrv on 2023-02-06 in Other

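I have a data frame with an unequal number of rows per group; see df in the example below. I want to add rows that contain the group name and NA in all other columns, so that every group has the same number of rows, as in df.desired. The new rows should be added after the last row of the corresponding group.
Example: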

df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
> df
  group col1 col2
1     A    1   12
2     A    1   13
3     A    1   14
4     A    1   15
5     B    2   21
6     B    2   22
7     B    2   23
8     C    3   31
9     C    3   32
df.desired = data.frame(group = c("A","A","A","A","B","B","B","B","C","C","C","C"),  
                         col1 = c(1, 1, 1, 1, 2, 2, 2, NA, 3, 3, NA, NA),
                         col2 = c(12, 13, 14, 15, 21, 22, 23, NA, 31, 32, NA, NA))
> df.desired
   group col1 col2
1      A    1   12
2      A    1   13
3      A    1   14
4      A    1   15
5      B    2   21
6      B    2   22
7      B    2   23
8      B   NA   NA
9      C    3   31
10     C    3   32
11     C   NA   NA
12     C   NA   NA

I know how to do this with a loop, but that would be very slow, and I would prefer to use dplyr if possible. Does anyone have an idea?

vdgimpew #1

How about this:

library(dplyr)
df = data.frame(group = c("A","A","A","A","B","B","B","C","C"),  
               col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
               col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))
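# Size of the largest group; every group is padded up to this length
maxgp <- max(table(df$group))

# Within each group, append as many NAs to each column as the group is short
df %>% 
  group_by(group) %>% 
  summarise(across(everything(), ~c(.x, rep(NA, maxgp-n()))))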
#> `summarise()` has grouped output by 'group'. You can override using the
#> `.groups` argument.
#> # A tibble: 12 × 3
#> # Groups:   group [3]
#>    group  col1  col2
#>    <chr> <dbl> <dbl>
#>  1 A         1    12
#>  2 A         1    13
#>  3 A         1    14
#>  4 A         1    15
#>  5 B         2    21
#>  6 B         2    22
#>  7 B         2    23
#>  8 B        NA    NA
#>  9 C         3    31
#> 10 C         3    32
#> 11 C        NA    NA
#> 12 C        NA    NA

Created on 2023-02-01 with reprex v2.0.1
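
A note for readers on newer dplyr versions: as of dplyr 1.1.0, returning more than one row per group from summarise() is deprecated in favour of reframe(). Below is a minimal sketch of the same padding idea written with reframe(), assuming dplyr >= 1.1.0 is installed. The name `padded` is only illustrative and does not appear in the original answer.

```r
library(dplyr)

df <- data.frame(group = c("A","A","A","A","B","B","B","C","C"),
                 col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))

# Size of the largest group
maxgp <- max(table(df$group))

# reframe() allows any number of rows per group and returns an ungrouped result
# (assumption: dplyr >= 1.1.0; `padded` is a hypothetical name)
padded <- df %>%
  group_by(group) %>%
  reframe(across(everything(), ~ c(.x, rep(NA, maxgp - n()))))

# Every group should now have the same number of rows
table(padded$group)
```

Unlike summarise(), reframe() always drops the grouping, so no `.groups` message is printed.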

vcudknz3 #2

You can create a row number within each group and then use tidyr::complete:

library(dplyr)

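df %>%
  group_by(group) %>%
  mutate(id = row_number()) %>%   # number the rows within each group
  ungroup() %>% 
  tidyr::complete(group, id) %>%  # add every missing group/id pair, filled with NA
  select(-id)                     # drop the helper column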

# # A tibble: 12 × 3
#    group  col1  col2
#    <chr> <dbl> <dbl>
#  1 A         1    12
#  2 A         1    13
#  3 A         1    14
#  4 A         1    15
#  5 B         2    21
#  6 B         2    22
#  7 B         2    23
#  8 B        NA    NA
#  9 C         3    31
# 10 C         3    32
# 11 C        NA    NA
# 12 C        NA    NA
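
To see why this works, it can help to look at the group/id pairs that complete() has to add. The sketch below is an illustration rather than part of the original answer: `with_id` and `missing_pairs` are hypothetical names, and it assumes tidyr >= 1.0.0 for expand_grid().

```r
library(dplyr)
library(tidyr)

df <- data.frame(group = c("A","A","A","A","B","B","B","C","C"),
                 col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))

# Step 1: number the rows within each group (id runs from 1 to the group size)
with_id <- df %>%
  group_by(group) %>%
  mutate(id = row_number()) %>%
  ungroup()

# Step 2: build all group/id combinations and keep those absent from the data;
# these are the rows that complete(group, id) fills in with NA
missing_pairs <- expand_grid(group = unique(with_id$group),
                             id = unique(with_id$id)) %>%
  anti_join(with_id, by = c("group", "id"))

missing_pairs
```

For the example data, the missing pairs are id 4 for group B and ids 3 and 4 for group C, which correspond to the three NA rows in the output above.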

Update (from @Maël's answer)

Since dplyr 1.1.0, mutate(), summarise(), filter(), and the slice() family support per-operation grouping through the .by / by argument:

df %>%
  mutate(id = row_number(), .by = group) %>%
  tidyr::complete(group, id) %>%
  select(-id)
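
For readers who want to run the `.by` version on its own, here is a self-contained sketch, assuming dplyr >= 1.1.0 and tidyr are installed. The `result` name and the final checks are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(group = c("A","A","A","A","B","B","B","C","C"),
                 col1 = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
                 col2 = c(12, 13, 14, 15, 21, 22, 23, 31, 32))

# Per-operation grouping: no group_by()/ungroup() pair is needed
# (assumption: dplyr >= 1.1.0; `result` is a hypothetical name)
result <- df %>%
  mutate(id = row_number(), .by = group) %>%
  complete(group, id) %>%
  select(-id)

# Check that all groups now have the same size
count(result, group)
```

Because `.by` applies the grouping only to that one mutate() call, the data is never left grouped, which is why the ungroup() step from the earlier version is not needed.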
