R: Unnesting a data frame whose columns contain lists of values

wqnecbli, posted 2023-07-31 in Other

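I'm trying to unnest a data frame in which some columns contain lists of values. Ideally, the values would be unnested so that each value creates a new row, and where a column holds fewer values than the longest list in that row, the remaining rows are filled with NA.
Example data: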

dat <- data.frame(matrix(ncol = 4, nrow = 2))
colnames(dat)[1:4] <- c("Date","value1","value2","value3")
dat$Date <- c(as.Date('1968-06-13'), as.Date('1968-09-17'))
dat$value1 <- c(list(c(79,78)),list(c(55,56,57)))
dat$value2 <- c(list(c(7.3,7.2)),list(c(6.6,6.7)))
dat$value3 <- c(0.27,0.55)
View(dat)

I tried:

library(tidyverse)

dat %>%
  unnest(cols = c(value1,value2,value3))

# Error in `unnest()`: ! In row 2, can't recycle input of size 3 to size 2.


ChatGPT suggested:

library(tidyr)

# First suggestion
dat %>%
  unnest_longer(cols = starts_with("value"), indices_to = "row") %>%
  pivot_wider(names_from = "row", values_from = starts_with("value"))

# Second suggestion
dat %>%
  unnest_wider(cols = starts_with("value"), names_sep = "_") %>%
  mutate(across(starts_with("value"), ~ ifelse(is.na(.), NA, as.numeric(.))))

# both produce same error - Error in unnest_wider(., cols = starts_with("value"), names_sep = "_") : unused argument (cols = starts_with("value"))


Desired output:

        Date value1 value2 value3
1 1968-06-13     79    7.3   0.27
2 1968-06-13     78    7.2     NA
3 1968-09-17     55    6.6   0.55
4 1968-09-17     56    6.7     NA
5 1968-09-17     57     NA     NA

xxhby3vn (answer #1)

Perhaps not the most concise solution, but it works:

library(dplyr)
library(tidyr)

dat |>
  pivot_longer(starts_with('value'), values_transform = as.list) |>
  unnest_longer(value) |>
  group_by(Date, name) |>
  mutate(i = row_number()) |>
  pivot_wider() |>
  select(-i)
# A tibble: 5 x 4
# Groups:   Date [2]
  Date       value1 value2 value3
  <date>      <dbl>  <dbl>  <dbl>
1 1968-06-13     79    7.3   0.27
2 1968-06-13     78    7.2  NA   
3 1968-09-17     55    6.6   0.55
4 1968-09-17     56    6.7  NA   
5 1968-09-17     57   NA    NA

8wtpewkr (answer #2)

Perhaps you can try this (the `.by` argument of `mutate()` requires dplyr 1.1.0 or later):

dat %>%
    mutate(value3 = as.list(value3)) %>%
    unnest(value1) %>%
    mutate(across(value2:value3, ~ `length<-`(.x[[1]], n())), .by = "Date")

which gives

# A tibble: 5 × 4
  Date       value1 value2 value3
  <date>      <dbl>  <dbl>  <dbl>
1 1968-06-13     79    7.3   0.27
2 1968-06-13     78    7.2  NA
3 1968-09-17     55    6.6   0.55
4 1968-09-17     56    6.7  NA
5 1968-09-17     57   NA    NA
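
The padding in the answer above relies on base R's `length<-` replacement function, which extends a vector to a given length and fills the new positions with NA. A minimal sketch of that one step outside the pipeline follows; the variable names `x`, `cols` and `padded_cols` are illustrative and not part of the original answer, and the values mirror row 2 of the question's data.

```r
# Extending a vector with `length<-` pads the new positions with NA.
x <- c(6.6, 6.7)
padded <- `length<-`(x, 3)
print(padded)

# The same idea applied to several vectors of unequal length:
# pad every one of them to the length of the longest.
cols <- list(value1 = c(55, 56, 57), value2 = c(6.6, 6.7), value3 = 0.55)
n <- max(lengths(cols))
padded_cols <- lapply(cols, `length<-`, n)
print(as.data.frame(padded_cols))
```

In the answer's pipeline, `n()` plays the role of `n` here: after `unnest(value1)`, each Date group has as many rows as `value1` had elements, so the approach assumes `value1` is the longest list in every row.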

vktxenjb (answer #3)

Here is an option that is more verbose than @ThomasIsCoding's solution, but perhaps less dense:

dat %>%
  mutate(value3=as.list(value3)) %>%
  pivot_longer(starts_with("value")) %>%
  unnest_longer(value) %>%
  group_by(Date, name) %>%
  mutate(rn=row_number()) %>%
  ungroup() %>%
  complete(Date, name, rn) %>%
  pivot_wider() %>%
  select(-rn) %>%
  drop_na(value1)

This gives the same result.
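
For comparison, the same row-wise padding can be sketched in base R without any tidyverse functions. This is not taken from any of the answers: the helper names `value_cols`, `rows` and `result` are made up for illustration, and the data frame is rebuilt in a way that is assumed to be equivalent to the question's `dat`. Unlike the pipelines above, it pads each row to its own longest list, so it does not depend on `value1` being the longest.

```r
# Rebuild the example data (assumed equivalent to the question's `dat`).
dat <- data.frame(Date = as.Date(c("1968-06-13", "1968-09-17")))
dat$value1 <- list(c(79, 78), c(55, 56, 57))
dat$value2 <- list(c(7.3, 7.2), c(6.6, 6.7))
dat$value3 <- c(0.27, 0.55)

value_cols <- c("value1", "value2", "value3")

# For each original row: pull out that row's values from every column,
# pad them to the longest length with NA, and build a small data frame.
rows <- lapply(seq_len(nrow(dat)), function(i) {
  vals <- lapply(dat[value_cols], function(col) col[[i]])
  n <- max(lengths(vals))
  padded <- lapply(vals, `length<-`, n)
  data.frame(Date = rep(dat$Date[i], n), padded)
})

# Stack the per-row pieces and reset the row names.
result <- do.call(rbind, rows)
rownames(result) <- NULL
print(result)
```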
