How to use tidyr::pivot_longer to pivot a large number of columns that belong to previously known groups

b09cbbtk  posted on 2023-04-18  in Other

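I need to pivot a large number of columns whose names do not necessarily follow any pattern.
To illustrate, here is an example data frame, an example grouping of the column names, and the expected result for the first two combinations of the ID variables (ID and ID2), which I need to keep in the rows.
Thanks a lot.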

# Example data frame
df <- data.frame(interview_id = c(1,1,1,2,2,3,3,3,4,4),
                 edad_l_id = c(1,2,3,1,2,1,2,3,1,2),                 
                 matrix(rnorm(200), ncol = 20))

# Example column names
colnames(df) <- c("ID","ID2","var1","ejem","exam","x1","sit","var2","ehem","pilot","y2","pot","var3","nom","ejec","z3","mar","var4","lom","yip","u4","tom")

# Example name list
namelist <- list(
  group0 = c("ID","ID2"),        
  group1 = c("var1","ejem","exam","x1","sit"),
  group2 = c("var2","ehem","pilot","y2","pot"),
  group3 = c("var3","nom","ejec","z3","mar"),
  group4 = c("var4","lom","yip","u4","tom"))

Expected result:

ID  ID2 group1  group1_val  group2  group2_val  group3  group3_val  group4  group4_val
1   1   var1    -1.37       var2    -0.23       var3     1.49       var4    -1.36
1   1   ejem    -0.72       ehem    0.82        nom      0.37       lom     0.15
1   1   exam    0.49        pilot   0.18        ejec     0.69       yip     -1.01
1   1   x1      -0.90       y2      -0.62       z3       0.48       u4      -0.93
1   1   sit     2.26        pot     -1.10       mar      0.11       tom     0.77
1   2   var1    0.78        var2    -0.20       var3     0.36       var4    -0.59
1   2   ejem    -0.76       ehem    -1.29       nom      0.94       lom     -0.59
1   2   exam    -1.82       pilot   -1.62       ejec     0.44       yip     -0.04
1   2   x1      0.92        y2      0.39        z3      -0.69       u4      0.65
1   2   sit     -1.05       pot     0.79        mar      1.01       tom    -0.48
1   3   ...

and so on for ID = 2, ID2 = 1, etc.

rqqzpn5f  #1

There may be a way to do all of this in one very fancy pivot spec, but this seems to do the job:

df <- data.frame(interview_id = c(1,1,1,2,2,3,3,3,4,4),
                 edad_l_id = c(1,2,3,1,2,1,2,3,1,2),                 
                 matrix(rnorm(200), ncol = 20))

# Example column names
colnames(df) <- c("ID","ID2","var1","ejem","exam","x1","sit","var2","ehem","pilot","y2",
  "pot","var3","nom","ejec","z3","mar","var4","lom","yip","u4","tom")

# Example name list
namelist <- list(
  group0 = c("ID","ID2"),        
  group1 = c("var1","ejem","exam","x1","sit"),
  group2 = c("var2","ehem","pilot","y2","pot"),
  group3 = c("var3","nom","ejec","z3","mar"),
  group4 = c("var4","lom","yip","u4","tom"))

library(tidyr)
library(dplyr)
library(purrr)

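# Pivot each group of columns separately (.x = column names, .y = group name),
# then join the pieces back together.
imap(namelist[2:5], ~ df |> 
       select(all_of(namelist$group0), all_of(.x)) |> 
       pivot_longer(all_of(.x), names_to = .y, values_to = paste0(.y, "_val")) |>
       group_by(ID, ID2) |> 
       mutate(ID3 = row_number()) |>   # temporary row index within each ID/ID2 pair
       ungroup()) |> 
  reduce(~ left_join(.x, .y, by = c("ID", "ID2", "ID3"))) |> 
  select(-ID3)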

# A tibble: 50 × 10
      ID   ID2 group1 group1_val group2 group2_val group3 group3_val group4 group4_val
   <dbl> <dbl> <chr>       <dbl> <chr>       <dbl> <chr>       <dbl> <chr>       <dbl>
 1     1     1 var1       -0.977 var2        1.19  var3      -0.722  var4      -0.321 
 2     1     1 ejem        0.838 ehem       -0.363 nom        0.205  lom       -0.474 
 3     1     1 exam       -1.10  pilot      -0.622 ejec      -1.34   yip       -1.49  
 4     1     1 x1         -0.880 y2         -1.06  z3        -0.283  u4         1.28  
 5     1     1 sit         0.201 pot         1.02  mar       -0.651  tom       -0.505 
 6     1     2 var1       -0.997 var2       -1.58  var3       0.297  var4       0.786 
 7     1     2 ejem        0.982 ehem        0.420 nom       -1.31   lom        0.0716
 8     1     2 exam       -0.603 pilot       0.367 ejec       0.644  yip        0.0501
 9     1     2 x1          0.532 y2         -0.760 z3         0.0211 u4         0.231 
10     1     2 sit        -0.757 pot        -1.43  mar        0.226  tom       -2.07

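The idea is to pivot each group of variables separately and then join them back together. Since nothing uniquely identifies the rows within an ID/ID2 pair, I temporarily added a third identifier variable so that the join does not blow up the data with duplicated rows.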
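
To see why the temporary identifier matters, here is a minimal sketch on a made-up toy data frame (`toy`, its columns `a1`, `a2`, `b1`, `b2`, the group names `groupA`/`groupB`, and the helper `add_pos` are all hypothetical and not part of the original question). It compares a join on `ID` and `ID2` alone with a join that also uses a per-pair row index:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one ID/ID2 pair, two groups of two columns each
toy <- data.frame(ID = 1, ID2 = 1, a1 = 10, a2 = 20, b1 = 30, b2 = 40)

long_a <- toy |>
  select(ID, ID2, a1, a2) |>
  pivot_longer(c(a1, a2), names_to = "groupA", values_to = "groupA_val")

long_b <- toy |>
  select(ID, ID2, b1, b2) |>
  pivot_longer(c(b1, b2), names_to = "groupB", values_to = "groupB_val")

# Joining on ID and ID2 alone pairs every row of long_a with every row of
# long_b; recent dplyr versions warn about this many-to-many match, so the
# warning is suppressed here for the demonstration.
naive <- suppressWarnings(left_join(long_a, long_b, by = c("ID", "ID2")))
nrow(naive)    # 4 rows (2 x 2)

# Hypothetical helper: number the rows within each ID/ID2 pair
add_pos <- function(d) {
  d |>
    group_by(ID, ID2) |>
    mutate(ID3 = row_number()) |>
    ungroup()
}

# With the extra key, row i of one group matches only row i of the other
aligned <- left_join(add_pos(long_a), add_pos(long_b),
                     by = c("ID", "ID2", "ID3")) |>
  select(-ID3)
nrow(aligned)  # 2 rows
```

Note that this position-based matching lines groups up row by row only when every group has the same number of columns, as in the question's example.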
