R: Collapse rows into lists across multiple columns, based on duplicates identified in a set of other columns

Posted by lx0bsm1f on 2023-03-27 in Other

I have a large dataset that closely resembles the dummy dataset below:

df = data.frame(coursecode = c("WBPH001","WBPH001","WBPH001","WBPH058","WBAS007"),
                 coursename = c("Mechanics","Mechanics","Mechanics", "Calculus 2","Introduction"),
                 courseurl = c("url1","url1","url1","url2","url3"),
                 programme_faculty = c("FSE","FSE","FSE", "FSE", "FSE"),
                 programme_name = c( "Mat","Bio","Ast","Ast","Ast"),
                 programme_ects = c("180", "180", "210", "180", "180")
                 )

This gives (all values are strings):

#> print(df):
  coursecode   coursename      courseurl    programme_faculty   programme_name   programme_ects
1    WBPH001    Mechanics      url1         FSE                 Mat              180
2    WBPH001    Mechanics      url1         FSE                 Bio              180
3    WBPH001    Mechanics      url1         FSE                 Ast              210
4    WBPH058    Calculus 2     url2         FSE                 Ast              180
5    WBAS007    Introduction   url3         FSE                 Ast              180

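I have exported all courses of the entire faculty, but some courses are listed in multiple programmes. In this example, "Mechanics" is associated with the "Mat", "Bio" and "Ast" programmes.
In short, I want to remove all these duplicate courses while keeping the programme information (i.e. name, ECTS and faculty).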

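So if a duplicate is identified in the "coursecode", "coursename" and "courseurl" columns, the programme information (the "programme_faculty", "programme_name" and "programme_ects" columns) should automatically be collapsed into a separate list for each column.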

The dataset should look something like this:

#> print(modified_df):
     coursecode coursename     courseurl    programme_faculty        programme_name      programme_ects
1    WBPH001    Mechanics      url1        c(FSE, FSE, FSE)          c(Mat, Bio, Ast)    c(180, 180, 210)
2    WBPH058    Calculus 2     url2        FSE                       Ast                 180
3    WBAS007    Introduction   url3        FSE                       Ast                 180

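The programme information is mainly for downstream analysis, but it must always be possible to retrieve the programmes a course belongs to. So I need a data frame like this, but I can't figure out which functions to use to achieve it.
Importantly, the strings should not simply be pasted together with a separator such as "|".
I have tried functions such as aggregate() and collapse(), as well as suggestions from other Stack Overflow questions, but those solutions did not work for my particular dataset.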
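For comparison, the same collapse can be sketched in base R without dplyr, assuming the df shown above. This is an illustration only, not the asker's method or either answer's. It builds one grouping key per row from the three course columns. The "\x1f" separator is an arbitrary choice and is assumed not to occur in the data. The code then keeps the first row of each course and uses split() to turn each programme column into a list-column.

```r
# Dummy data from the question; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(coursecode = c("WBPH001","WBPH001","WBPH001","WBPH058","WBAS007"),
                 coursename = c("Mechanics","Mechanics","Mechanics","Calculus 2","Introduction"),
                 courseurl = c("url1","url1","url1","url2","url3"),
                 programme_faculty = c("FSE","FSE","FSE","FSE","FSE"),
                 programme_name = c("Mat","Bio","Ast","Ast","Ast"),
                 programme_ects = c("180","180","210","180","180"),
                 stringsAsFactors = FALSE)

key_cols  <- c("coursecode", "coursename", "courseurl")
prog_cols <- c("programme_faculty", "programme_name", "programme_ects")

# One key string per row; "\x1f" is an arbitrary separator assumed absent from the data
key <- do.call(paste, c(df[key_cols], sep = "\x1f"))
# Factor levels in order of first appearance, so split() returns groups in row order
key <- factor(key, levels = unique(key))

# Keep one row per course...
collapsed <- df[!duplicated(key), key_cols]
rownames(collapsed) <- NULL
# ...and store each programme column as a list-column holding one vector per course
for (col in prog_cols) {
  collapsed[[col]] <- unname(split(df[[col]], key))
}

str(collapsed)
```

Because split() groups all rows in a single pass, this approach should stay reasonably fast on a large dataset. The separator-based key is the part to check against your real data.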

Answer #1 (fdx2calv)

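You can group_by() the key columns, summarise() over those groups, and apply across() to the columns you want to combine, collapsing them with paste() like this: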

library(dplyr)
df %>%
  group_by(coursecode, coursename, courseurl) %>%
  summarise(across(programme_faculty:programme_ects, ~ paste(.x, collapse = ", ")))
#> # A tibble: 3 × 6
#> # Groups:   coursecode, coursename [3]
#>   coursecode coursename   courseurl programme_faculty programme_name programme…¹
#>   <chr>      <chr>        <chr>     <chr>             <chr>          <chr>      
#> 1 WBAS007    Introduction url3      FSE               Ast            180        
#> 2 WBPH001    Mechanics    url1      FSE, FSE, FSE     Mat, Bio, Ast  180, 180, …
#> 3 WBPH058    Calculus 2   url2      FSE               Ast            180        
#> # … with abbreviated variable name ¹​programme_ects
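The "# Groups:" line in the output appears because summarise() drops only the last grouping variable. The variant below assumes dplyr 1.0.0 or later, which added the .groups argument to summarise(). It returns an ungrouped tibble instead; apart from that argument it is the answer's code unchanged.

```r
library(dplyr)

df <- data.frame(coursecode = c("WBPH001","WBPH001","WBPH001","WBPH058","WBAS007"),
                 coursename = c("Mechanics","Mechanics","Mechanics","Calculus 2","Introduction"),
                 courseurl = c("url1","url1","url1","url2","url3"),
                 programme_faculty = c("FSE","FSE","FSE","FSE","FSE"),
                 programme_name = c("Mat","Bio","Ast","Ast","Ast"),
                 programme_ects = c("180","180","210","180","180"),
                 stringsAsFactors = FALSE)

res <- df %>%
  group_by(coursecode, coursename, courseurl) %>%
  # paste each group's values into one ", "-separated string per column
  summarise(across(programme_faculty:programme_ects, ~ paste(.x, collapse = ", ")),
            .groups = "drop")  # remove all grouping so the result is a plain tibble

res
```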

You can also store them as lists like this:

library(dplyr)
df %>%
  group_by(coursecode, coursename, courseurl) %>%
  summarise(across(programme_faculty:programme_ects, ~ list(.x)))
#> # A tibble: 3 × 6
#> # Groups:   coursecode, coursename [3]
#>   coursecode coursename   courseurl programme_faculty programme_name programme…¹
#>   <chr>      <chr>        <chr>     <list>            <list>         <list>     
#> 1 WBAS007    Introduction url3      <chr [1]>         <chr [1]>      <chr [1]>  
#> 2 WBPH001    Mechanics    url1      <chr [3]>         <chr [3]>      <chr [3]>  
#> 3 WBPH058    Calculus 2   url2      <chr [1]>         <chr [1]>      <chr [1]>  
#> # … with abbreviated variable name ¹​programme_ects

Created on 2023-03-25 with reprex v2.0.2
As @zephyl pointed out, you can replace ~ list(.x) with just list.
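Below is a sketch of that shorthand. It also shows one way to get the programmes back out of the list-columns, which the question requires. Using tidyr::unnest() is an addition not used in the answer, and it assumes tidyr is installed. The .groups = "drop" argument likewise assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

df <- data.frame(coursecode = c("WBPH001","WBPH001","WBPH001","WBPH058","WBAS007"),
                 coursename = c("Mechanics","Mechanics","Mechanics","Calculus 2","Introduction"),
                 courseurl = c("url1","url1","url1","url2","url3"),
                 programme_faculty = c("FSE","FSE","FSE","FSE","FSE"),
                 programme_name = c("Mat","Bio","Ast","Ast","Ast"),
                 programme_ects = c("180","180","210","180","180"),
                 stringsAsFactors = FALSE)

collapsed <- df %>%
  group_by(coursecode, coursename, courseurl) %>%
  # passing base list() directly wraps each group's values into a list-column
  summarise(across(programme_faculty:programme_ects, list), .groups = "drop")

# Programmes for one course: pull the vector stored in that row's list element
mech_programmes <- collapsed$programme_name[[which(collapsed$coursecode == "WBPH001")]]
mech_programmes

# Expand the list-columns back into one row per course/programme pair
long <- unnest(collapsed, cols = c(programme_faculty, programme_name, programme_ects))
long
```

Because the list-columns keep the original vectors intact, unnest() recovers the original five rows. Only the row order may differ, since the groups come back sorted by course.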

Answer #2 (c9qzyr3d)

We can use toString():

library(dplyr)
df %>%
  group_by(across(starts_with("course"))) %>% 
  summarise(across(starts_with("programme"), ~toString(.))) %>% 
  arrange(courseurl)
coursecode coursename   courseurl programme_faculty programme_name programme_ects
  <chr>      <chr>        <chr>     <chr>             <chr>          <chr>         
1 WBPH001    Mechanics    url1      FSE, FSE, FSE     Mat, Bio, Ast  180, 180, 210 
2 WBPH058    Calculus 2   url2      FSE               Ast            180           
3 WBAS007    Introduction url3      FSE               Ast            180
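toString() joins a vector with ", ", the same as paste(x, collapse = ", "). So this produces the same joined strings that the question wanted to avoid. If you keep the strings anyway, the sketch below splits them back into vectors with base strsplit(). That only works when no individual value contains ", " itself. The value "Physics, Applied" is a hypothetical programme name used to show the failure case.

```r
# toString() joins a character vector with ", "
ects <- c("180", "180", "210")
joined <- toString(ects)
joined

# Recover the original vector; fixed = TRUE treats ", " literally, not as a regex
recovered <- strsplit(joined, ", ", fixed = TRUE)[[1]]
recovered

# Hypothetical name containing the separator: splitting yields three pieces, not two
names_with_comma <- c("Physics, Applied", "Bio")
strsplit(toString(names_with_comma), ", ", fixed = TRUE)[[1]]
```

This fragility is the reason to prefer the list-column approach from the first answer when the programmes must stay exactly retrievable.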
