R: Aggregate a list column by group

fquxozlt  posted on 2023-06-19  in Other

Consider the following example data

library(tidyverse)
df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))
## A tibble: 3 × 2
#  group val      
#  <chr> <list>   
#1 a     <int [3]>
#2 b     <int [3]>
#3 b     <int [6]>

I would like to combine the entries of the list column val by group, to give the expected output

df_out <- tibble(group = c("a", "b"), val = list(1:3, 4:12))

I am looking for a tidyverse solution but have been unsuccessful so far. For example,

df %>% group_by(group) %>% summarise(val = map(val, c), .groups = "drop")

does not concatenate the val entries of the two "b" rows, and instead produces a warning

# A tibble: 3 × 2
  group val      
  <chr> <list>   
1 a     <int [3]>
2 b     <int [3]>
3 b     <int [6]>
Warning message:
Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust
  accordingly.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

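I understand the warning, but I don't see *why* "more than 1 row per summarise() group" is being returned. Can somebody explain this and provide a solution?
I'm hoping for a simple two-step group_by + summarise solution (i.e. one that avoids nesting etc.).
To clarify: the numbers in val are not ordered and do not form a sequence. A different pair of val entries for group "b" might be c(1, 10, 2) and c(4, 7, 7). The expected combined output would then be c(1, 10, 2, 4, 7, 7). So: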

df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))

#1 chy5wohz

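We can unlist the val entries within each group and then wrap the result back up with list, so that each group yields exactly one list element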

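library(dplyr)

# Flatten all list elements of a group into one vector, then wrap that
# vector in list() so summarise() returns a single row per group
df <- df |> 
    group_by(group) |> 
    summarise(val = list(unlist(val))) |> 
    ungroup()  # no-op here: summarise() already drops the only grouping level

# Inspect the combined list column
pull(df, val)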

[[1]]
[1] 1 2 3

[[2]]
[1]  4  5  6  7  8  9 10 11 12
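
As for why the original summarise(val = map(val, c)) returned more than one row for group "b": map() calls c() on each list element separately, so it hands back a list with as many elements as the group has rows. The following is only a minimal sketch of that difference, using the first example data from the question; the names val_b, per_element and combined are made up for illustration.

```r
library(dplyr)
library(purrr)

df <- tibble(group = c("a", "b", "b"), val = list(1:3, 4:6, 7:12))

# Inside group "b", val is a list with two elements (one per row)
val_b <- df$val[df$group == "b"]

# map() applies c() to each element on its own, so the result
# still has one element per row and summarise() sees two rows
per_element <- map(val_b, c)
length(per_element)

# unlist() flattens the whole group into one vector, and list()
# wraps it into a length-1 list, i.e. exactly one row per group
combined <- list(unlist(val_b))
length(combined)
```

The first length() call reports the number of rows in the group, while the second is always 1, which is what summarise() expects.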

#2 dffbzjpn

How about this:

df %>% group_by(group) %>%  
    summarise(val = list(unlist(val)))
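
As a quick check against the clarified example from the question (where the values are unordered), the same approach can be compared with the expected df_out. This is only a sketch; result is a name introduced here for illustration.

```r
library(dplyr)

# Clarified example data and expected output from the question
df <- tibble(group = c("a", "b", "b"), val = list(1:3, c(1, 10, 2), c(4, 7, 7)))
df_out <- tibble(group = c("a", "b"), val = list(1:3, c(1, 10, 2, 4, 7, 7)))

# One row per group: flatten the group's entries, then wrap them in a list
result <- df %>%
    group_by(group) %>%
    summarise(val = list(unlist(val)))

# Compare column by column with the expected output
identical(result$group, df_out$group)
identical(result$val, df_out$val)
```

Note that unlist() keeps the elements in row order within each group, so the unordered values are concatenated as they appear rather than sorted.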
