R: Converting data to long format using grouped columns

ohtdti5x, posted on 2023-02-17 in Other

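In this week's TidyTuesday challenge, for some reason I can't group the column names in R, which I have done before with the pivot_longer function from tidyr. Here is my code; I don't understand why it throws an error instead of giving me what I want.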

library(tidyverse)

tuesdata <- tidytuesdayR::tt_load(2023, week = 7)
age_gaps <- tuesdata$age_gaps

df_long <- age_gaps %>%
  pivot_longer(cols= actor_1_name:actor_2_name, names_to = "actornumber", values_to = "actorname") %>%
  pivot_longer(cols= character_1_gender:character_2_gender, names_to = "gendernumber", values_to = "gender") %>%
  pivot_longer(cols= actor_1_age:actor_2_age, names_to = "agenumber", values_to = "age") %>%
  select(movie_name, release_year, director, age_difference, actorname, gender, age)

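As the code shows, the initial data has 1155 rows, and after a quick reshape I expect to get 1155 x 2 = 2310 rows, because I want to stack the actor name columns together with their related information, such as age and birthdate. The code does not give me the expected result, and I would like to know why and how to fix it. Thanks for your attention.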
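One way to see where the row count goes wrong is to run the same chain on a single row. The sketch below is only an illustration and not part of the original post: `toy` and `chained` are hypothetical names, and the values are copied from the first row of the sample data (Harold and Maude). Each pivot_longer call over a pair of columns doubles the rows independently of the others, so three chained calls give 2 x 2 x 2 = 8 rows per movie rather than 2, and names, genders and ages are paired in every combination.

```r
library(tidyr)

# Hypothetical one-row table using the same column naming scheme as age_gaps
toy <- tibble::tibble(
  movie_name = "Harold and Maude",
  actor_1_name = "Ruth Gordon", actor_2_name = "Bud Cort",
  character_1_gender = "woman", character_2_gender = "man",
  actor_1_age = 75, actor_2_age = 23
)

# Same three chained pivots as in the question
chained <- toy %>%
  pivot_longer(actor_1_name:actor_2_name,
               names_to = "actornumber", values_to = "actorname") %>%
  pivot_longer(character_1_gender:character_2_gender,
               names_to = "gendernumber", values_to = "gender") %>%
  pivot_longer(actor_1_age:actor_2_age,
               names_to = "agenumber", values_to = "age")

nrow(chained)  # 8 rows for one movie, not 2

# Mismatched pairs appear, e.g. actor 1's name next to actor 2's age
any(chained$actorname == "Ruth Gordon" & chained$age == 23)  # TRUE
```

On the full data this would mean 1155 x 8 = 9240 rows, which is why the three measures need to be pivoted together in a single call.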

Sample data (first 6 rows)
age_gaps <- structure(list(movie_name = c("Harold and Maude", "Venus", "The Quiet American", 
"The Big Lebowski", "Beginners", "Poison Ivy"), release_year = c(1971, 
2006, 2002, 1998, 2010, 1992), director = c("Hal Ashby", "Roger Michell", 
"Phillip Noyce", "Joel Coen", "Mike Mills", "Katt Shea"), age_difference = c(52, 
50, 49, 45, 43, 42), couple_number = c(1, 1, 1, 1, 1, 1), actor_1_name = c("Ruth Gordon", 
"Peter O'Toole", "Michael Caine", "David Huddleston", "Christopher Plummer", 
"Tom Skerritt"), actor_2_name = c("Bud Cort", "Jodie Whittaker", 
"Do Thi Hai Yen", "Tara Reid", "Goran Visnjic", "Drew Barrymore"
), character_1_gender = c("woman", "man", "man", "man", "man", 
"man"), character_2_gender = c("man", "woman", "woman", "woman", 
"man", "woman"), actor_1_birthdate = structure(c(-26725, -13666, 
-13442, -14351, -14629, -13278), class = "Date"), actor_2_birthdate = structure(c(-7948, 
4536, 4656, 2137, 982, 1878), class = "Date"), actor_1_age = c(75, 
74, 69, 68, 81, 59), actor_2_age = c(23, 24, 20, 23, 38, 17)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

rkttyhzu #1

You can set ".value" in names_to and supply either names_sep or names_pattern to specify how the column names should be split.

library(tidyr)

age_gaps %>%
  pivot_longer(actor_1_name:actor_2_age,
               names_prefix = "(actor|character)_", 
               names_to = c("actor", ".value"),
               names_sep = '_')

# A tibble: 12 × 10
   movie_name         release_year director      age_difference couple_number actor name                gender birthdate    age
   <chr>                     <dbl> <chr>                  <dbl>         <dbl> <chr> <chr>               <chr>  <date>     <dbl>
 1 Harold and Maude           1971 Hal Ashby                 52             1 1     Ruth Gordon         woman  1896-10-30    75
 2 Harold and Maude           1971 Hal Ashby                 52             1 2     Bud Cort            man    1948-03-29    23
 3 Venus                      2006 Roger Michell             50             1 1     Peter O'Toole       man    1932-08-02    74
 4 Venus                      2006 Roger Michell             50             1 2     Jodie Whittaker     woman  1982-06-03    24
 5 The Quiet American         2002 Phillip Noyce             49             1 1     Michael Caine       man    1933-03-14    69
 6 The Quiet American         2002 Phillip Noyce             49             1 2     Do Thi Hai Yen      woman  1982-10-01    20
 7 The Big Lebowski           1998 Joel Coen                 45             1 1     David Huddleston    man    1930-09-17    68
 8 The Big Lebowski           1998 Joel Coen                 45             1 2     Tara Reid           woman  1975-11-08    23
 9 Beginners                  2010 Mike Mills                43             1 1     Christopher Plummer man    1929-12-13    81
10 Beginners                  2010 Mike Mills                43             1 2     Goran Visnjic       man    1972-09-09    38
11 Poison Ivy                 1992 Katt Shea                 42             1 1     Tom Skerritt        man    1933-08-25    59
12 Poison Ivy                 1992 Katt Shea                 42             1 2     Drew Barrymore      woman  1975-02-22    17
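The names_pattern route mentioned above can be sketched in the same way. This is a minimal example under stated assumptions, not the answerer's code: `age_gaps_small` and `long_small` are hypothetical names, the two rows are copied from the sample data, and the regular expression is one possible choice. The first capture group becomes the `actor` column, and the second one feeds ".value", so no names_prefix is needed.

```r
library(tidyr)

# Two rows copied from the sample data so that the block runs on its own
age_gaps_small <- tibble::tibble(
  movie_name = c("Harold and Maude", "Venus"),
  actor_1_name = c("Ruth Gordon", "Peter O'Toole"),
  actor_2_name = c("Bud Cort", "Jodie Whittaker"),
  character_1_gender = c("woman", "man"),
  character_2_gender = c("man", "woman"),
  actor_1_age = c(75, 74),
  actor_2_age = c(23, 24)
)

long_small <- pivot_longer(
  age_gaps_small,
  cols = actor_1_name:actor_2_age,
  names_to = c("actor", ".value"),
  # group 1: the actor number; group 2: the measure (name, gender, age)
  names_pattern = "^[a-z]+_([0-9]+)_(.+)$"
)

nrow(long_small)   # 4, i.e. two rows per movie
names(long_small)  # "movie_name" "actor" "name" "gender" "age"
```

Compared with names_prefix plus names_sep, a pattern is somewhat more verbose but tends to hold up better when the measure names themselves contain underscores.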
