R语言 宽到长格式:如何在特定列之间求平均值(2组列)

ny6fqffe  于 2023-05-20  发布在  其他
关注(0)|答案(3)|浏览(121)

我有一个宽格式的数据集,我想转换为长格式。宽格式中的列表示一个长格式变量的两个级别。我想对宽格式的每一组列进行逐行平均,并在一个汇总的长格式表中表示每一组列的值(平均值)。

虚拟数据

假设我有一项研究的数据,该研究测试了兽医如何为动物称重。有10名兽医(即“受试者”),每排一名。至于被称重的动物,有狗或猫(其中6只)。每名兽医在诊所环境或室内环境中测量所有动物。我想为每名兽医计算(1)他/她测量的所有犬的平均体重,以及(2)他/她测量的所有猫的平均体重。

library(tidyverse)

set.seed(123)
df <- data.frame(replicate(12,sample(10:25,10,rep=TRUE)))
id <- 1:10
clinic_vs_house <- sample(c(0,1), 10, replace = TRUE)
df <- cbind(id, clinic_vs_house, df)
names(df) <- c("id",
               "location",
               "cat_a",
               "cat_b",
               "cat_c",
               "cat_d",
               "cat_e",
               "cat_f",
               "dog_a",
               "dog_b",
               "dog_c",
               "dog_d",
               "dog_e",
               "dog_f")

> df
##    id location cat_a cat_b cat_c cat_d cat_e cat_f dog_a dog_b dog_c dog_d dog_e dog_f
## 1   1        0    24    13    18    18    24    21    14    15    23    15    18    25
## 2   2        1    24    23    21    23    25    14    17    14    15    11    13    23
## 3   3        0    12    15    18    12    19    16    21    24    20    10    15    21
## 4   4        0    23    18    18    13    22    14    24    18    17    11    18    24
## 5   5        0    12    19    22    23    16    20    22    24    15    13    25    18
## 6   6        1    19    20    12    10    18    25    11    19    15    22    23    23
## 7   7        1    11    25    17    20    18    15    10    21    16    14    18    23
## 8   8        0    15    14    19    16    19    18    18    24    24    15    16    25
## 9   9        0    20    12    16    14    16    11    20    25    25    24    12    12
## 10 10        0    14    20    19    21    20    22    18    13    10    12    17    23

我目前的解决方案比较笨拙

我创建了两个单独的数据框,每个数据框对应一组不同的动物,在其中我计算相关列的平均值。然后我合并了表格。

cat_means <- 
  df %>%
  mutate(mean_wt = rowMeans(select(., matches("cat")))) %>%
  select(id, location, mean_wt) %>%
  mutate(animal = "cat")

dog_means <- 
  df %>%
  mutate(mean_wt = rowMeans(select(., matches("dog")))) %>%
  select(id, location, mean_wt) %>%
  mutate(animal = "dog")

means_table <-
  bind_rows(cat_means, dog_means) %>%
  select(id, location, animal, mean_wt) %>%
  arrange(id)

> means_table
##    id location animal  mean_wt
## 1   1        0    cat 16.85714
## 2   1        0    dog 18.33333
## 3   2        1    cat 18.71429
## 4   2        1    dog 15.50000
## 5   3        0    cat 13.14286
## 6   3        0    dog 18.50000
## 7   4        0    cat 15.42857
## 8   4        0    dog 18.66667
## 9   5        0    cat 16.00000
## 10  5        0    dog 19.50000
## 11  6        1    cat 15.00000
## 12  6        1    dog 18.83333
## 13  7        1    cat 15.28571
## 14  7        1    dog 17.00000
## 15  8        0    cat 14.42857
## 16  8        0    dog 20.33333
## 17  9        0    cat 12.71429
## 18  9        0    dog 19.66667
## 19 10        0    cat 16.57143
## 20 10        0    dog 15.50000

所以,是的,我能够得到我想要的结果,但我对这个解决方案不满意,因为它是重复的代码,如果我有超过2组的列要转换,那将是一个痛苦。有没有更简洁的方法来获得相同的结果?我知道我总是可以写一个函数运行x次,但我想知道是否有一个更优雅的解决方案,我不知道,特别是使用tidyverse

jyztefdp

jyztefdp1#

df %>%
  pivot_longer(cols=cat_a:dog_f, names_pattern = "(.*)_(.*)", names_to=c("animal","letter")) %>%
  group_by(id, location, animal) %>%
  summarise(mean_wt=mean(value))
# A tibble: 20 x 4
# Groups:   id, location [10]
      id location animal mean_wt
   <int>    <dbl> <chr>    <dbl>
 1     1        0 cat       19.7
 2     1        0 dog       18.3
 3     2        1 cat       21.7
 4     2        1 dog       15.5
 5     3        0 cat       15.3
 6     3        0 dog       18.5
 7     4        0 cat       18  
 8     4        0 dog       18.7
 9     5        0 cat       18.7
10     5        0 dog       19.5
11     6        1 cat       17.3
12     6        1 dog       18.8
13     7        1 cat       17.7
14     7        1 dog       17  
15     8        0 cat       16.8
16     8        0 dog       20.3
17     9        0 cat       14.8
18     9        0 dog       19.7
19    10        0 cat       19.3
20    10        0 dog       15.5
y0u0uwnf

y0u0uwnf2#

library(tidyverse)    
df %>% 
  # reshape into longer format
  pivot_longer(-c(id, location), names_to = "col", values_to = "wt") %>%
  # separate header into "animal" and "name" columns
  separate(col, c("animal", "name")) %>%
  # take the average weight for each id/location/animal combo
  group_by(id, location, animal) %>%
  summarise(mean_wt = mean(wt))

# A tibble: 20 x 4
# Groups:   id, location [10]
      id location animal mean_wt
   <int>    <dbl> <chr>    <dbl>
 1     1        0 cat       19.7
 2     1        0 dog       18.3
 3     2        1 cat       21.7
 4     2        1 dog       15.5
 5     3        0 cat       15.3
 6     3        0 dog       18.5
 7     4        0 cat       18  
 8     4        0 dog       18.7
 9     5        0 cat       18.7
10     5        0 dog       19.5
11     6        1 cat       17.3
12     6        1 dog       18.8
13     7        1 cat       17.7
14     7        1 dog       17  
15     8        0 cat       16.8
16     8        0 dog       20.3
17     9        0 cat       14.8
18     9        0 dog       19.7
19    10        0 cat       19.3
20    10        0 dog       15.5
lnlaulya

lnlaulya3#

为完整起见,data.table解决方案

library( data.table )
ans <- melt( setDT(df), id.vars = c("id", "location"), variable.name = "animal" )[, animal := gsub( "_.*$", "", animal ) ][]
ans[, .(temp = mean(value)), by = .(id, location, animal)]

#    id location animal     temp
# 1:  1        0    cat 19.66667
# 2:  2        1    cat 21.66667
# 3:  3        0    cat 15.33333
# 4:  4        0    cat 18.00000
# 5:  5        0    cat 18.66667
# 6:  6        1    cat 17.33333
# 7:  7        1    cat 17.66667
# 8:  8        0    cat 16.83333
# 9:  9        0    cat 14.83333
# 10: 10        0    cat 19.33333
# 11:  1        0    dog 18.33333
# 12:  2        1    dog 15.50000
# 13:  3        0    dog 18.50000
# 14:  4        0    dog 18.66667
# 15:  5        0    dog 19.50000
# 16:  6        1    dog 18.83333
# 17:  7        1    dog 17.00000
# 18:  8        0    dog 20.33333
# 19:  9        0    dog 19.66667
# 20: 10        0    dog 15.50000

相关问题