体育数据的汇总和分组,需要Dplyr杂技

xytpbqjk  于 12个月前  发布在  其他
关注(0)|答案(1)|浏览(83)

我有一些来自一家体育俱乐部的数据,希望总结一下每个赛季的球员数据。我遇到的问题是源数据库为一个球员在整个赛季中所打的每个年级生成数据。我需要总结每一季的数字,结合成绩。
举个例子:

ID          Full.Season Group   Age Grade   Season type     Games   Total.Personal.Fouls    Total.Technical.Fouls       Total.two.points.made   Total.three.points.made
28139ABB    2008 Winter Girls   12  A       Normal Season   15      30                      0                           0                       0                   
28139ABB    2009 Summer Girls   14  B       Normal Season   13      28                      0                           0                       0                   
28139ABB    2009 Winter Girls   14  B       Normal Season   6       10                      0                           0                       0                   
28139ABB    2009 Winter Girls   14  AR      Normal Season   11      9                       0                           0                       0                   
28139ABB    2010 Summer Girls   14  AR      Normal Season   13      16                      0                           0                       0                   
28139ABB    2010 Winter Girls   16  BR      Normal Season   8       3                       0                           0                       0                   
28139ABB    2011 Winter Girls   16  AR      Normal Season   15      31                      0                           0                       0                   
28139ABB    2011 Summer Girls   16  BR      Normal Season   9       17                      0                           0                       0                   
28139ABB    2011 Summer Girls   16  B       Normal Season   5       13                      0                           0                       0                   
28139ABB    2011 Summer Girls   16  A       Normal Season   1       1                       0                           0                       0

但我想总结一下,所以我想我需要产生这样的东西:

ID          Full.Season Group   Age Grade       Season type     Games   Total.Personal.Fouls    Total.Technical.Fouls       Total.two.points.made   Total.three.points.made 
28139ABB    2008 Winter Girls   12  A           Normal Season   15      30                      0                           0                       0                       
28139ABB    2009 Summer Girls   14  B           Normal Season   13      28                      0                           0                       0                       
28139ABB    2009 Winter Girls   14  B, AR       Normal Season   17      19                      0                           0                       0                       
28139ABB    2010 Summer Girls   14  AR          Normal Season   13      16                      0                           0                       0                       
28139ABB    2010 Winter Girls   16  BR, AR      Normal Season   23      34                      0                           0                       0                       
28139ABB    2011 Summer Girls   16  BR, B, A    Normal Season   15      31                      0                           0                       0

我不知道该怎么做。我想我需要group_by()“ID”,然后“Full.Season”,然后折叠“Grade”列(连接Grade列中的字符),并使用统计数据对行求和(这里是Total.Personal.Fouls,但其他ID在其他列中有双精度)。但我不知道从何说起。
救命啊!
dput输出:

structure(list(ID = c("28139ABB", "28139ABB", "28139ABB", "28139ABB", 
"28139ABB", "28139ABB", "28139ABB", "28139ABB", "28139ABB", "28139ABB"
), Full.Season = c("2008 Winter", "2009 Summer", "2009 Winter", 
"2009 Winter", "2010 Summer", "2010 Winter", "2011 Winter", "2011 Summer", 
"2011 Summer", "2011 Summer"), Group = c("Girls", "Girls", "Girls", 
"Girls", "Girls", "Girls", "Girls", "Girls", "Girls", "Girls"
), Age = c(12, 14, 14, 14, 14, 16, 16, 16, 16, 16), Grade = c("A", 
"B", "B", "AR", "AR", "BR", "AR", "BR", "B", "A"), `Season type` = c("Normal Season", 
"Normal Season", "Normal Season", "Normal Season", "Normal Season", 
"Normal Season", "Normal Season", "Normal Season", "Normal Season", 
"Normal Season"), Games = c(15, 13, 6, 11, 13, 8, 15, 9, 5, 1
), Total.Personal.Fouls = c(30, 28, 10, 9, 16, 3, 31, 17, 13, 
1), Total.Technical.Fouls = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    Total.two.points.made = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    Total.three.points.made = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))
ifmq2ha2

ifmq2ha21#

library(tidyverse)

df1 |> 
  group_by(ID, Full.Season, Group, Age, `Season type`) |> 
  summarise(Grade = paste(Grade, collapse = ", "),
            across(Games:Total.three.points.made, sum))
#> # A tibble: 7 × 11
#> # Groups:   ID, Full.Season, Group, Age [7]
#>   ID      Full.Season Group   Age `Season type` Grade Games Total.Personal.Fouls
#>   <chr>   <chr>       <chr> <dbl> <chr>         <chr> <dbl>                <dbl>
#> 1 28139A… 2008 Winter Girls    12 Normal Season A        15                   30
#> 2 28139A… 2009 Summer Girls    14 Normal Season B        13                   28
#> 3 28139A… 2009 Winter Girls    14 Normal Season B, AR    17                   19
#> 4 28139A… 2010 Summer Girls    14 Normal Season AR       13                   16
#> 5 28139A… 2010 Winter Girls    16 Normal Season BR        8                    3
#> 6 28139A… 2011 Summer Girls    16 Normal Season BR, …    15                   31
#> 7 28139A… 2011 Winter Girls    16 Normal Season AR       15                   31
#> # ℹ 3 more variables: Total.Technical.Fouls <dbl>, Total.two.points.made <dbl>,
#> #   Total.three.points.made <dbl>

相关问题