Resampling membership in one group across the levels of a second group in R

Posted by w6lpcovy on 2023-11-14 in Other

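I have data like df, with one numeric variable (x) and two grouping variables (grpOne and grpTwo). For each level of grpOne, I want to shuffle membership in grpTwo and compute some test statistic (for example, the difference in the mean or variance of x).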

library(tidyverse)
df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4],10),
                 grpTwo = rep(rep(LETTERS[25:26],each=4),10))
# I want to shuffle grpTwo membership while keeping grpOne intact
# e.g. for first level of grpOne
tmp <- df %>% filter(grpOne=="a")
tmp$grpTwo <- tmp$grpTwo[sample(1:nrow(tmp),nrow(tmp))]
# calculate some statistic, like difference in means
tmp %>% group_by(grpTwo) %>% 
  summarise(meanX = mean(x)) %>%
  summarise(diffMeanX = diff(meanX))
# repeat for grpOne == "b", then "c", etc.

What is a concise way to do this without looping over every level of grpOne?

Edit: here is a better example using the Palmer penguins data.

penguins_df <-palmerpenguins::penguins %>%
  filter(!is.na(sex)) %>%
  select(species,body_mass_g,sex)
# I want to get the difference in mean body mass between male and females
# for each species
# e.g. for Adelie
tmp <- penguins_df %>% filter(species=="Adelie")
# shuffle sex
tmp$sex <- tmp$sex[sample(1:nrow(tmp),nrow(tmp))]
# calculate some statistic, like difference in means
tmp %>% group_by(sex) %>% 
  summarise(meanMass = mean(body_mass_g)) %>%
  summarise(diffMeanMass = diff(meanMass))
# repeat for all species

mefy6pfw #1

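I used group_by twice and shuffled grpTwo between the two calls. Grouping by grpOne first means sample() permutes the grpTwo labels only within each level of grpOne: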

df %>% group_by(grpOne) %>%
          mutate( grpTwo = sample(grpTwo)) 
#------------
# A tibble: 80 × 3
# Groups:   grpOne [4]
         x grpOne grpTwo
     <dbl> <chr>  <chr> 
 1  0.451  a      Y     
 2 -2.09   b      Y     
 3  0.679  c      Y     
 4 -1.56   d      Y     
 5 -0.570  a      Z     
 6 -0.0895 b      Z     
 7 -0.159  c      Z     
 8  0.858  d      Z     
 9  0.460  a      Z     
10 -0.665  b      Y     
# ℹ 70 more rows
# ℹ Use `print(n = ...)` to see more rows

# different than original (look at item 9)
df
#--------------------
              x grpOne grpTwo
1   0.450875260      a      Y
2  -2.089325642      b      Y
3   0.678633129      c      Y
4  -1.558839921      d      Y
5  -0.570102287      a      Z
6  -0.089466139      b      Z
7  -0.159141726      c      Z
8   0.858083117      d      Z
9   0.460205261      a      Y
10 -0.665093858      b      Y
11 -0.227525514      c      Y
#----snipped----
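
As a sanity check on the shuffle above, the sketch below confirms that a within-group permutation leaves x, grpOne, and the number of rows in each grpOne/grpTwo cell unchanged. The seed and the names shuffled, counts_before, and counts_after are assumptions made for illustration; they are not part of the original answer.

```r
library(dplyr)

set.seed(1)  # assumed seed, used only to make the sketch reproducible
df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4], 10),
                 grpTwo = rep(rep(LETTERS[25:26], each = 4), 10))

# Permute grpTwo labels separately within each level of grpOne
shuffled <- df %>%
  group_by(grpOne) %>%
  mutate(grpTwo = sample(grpTwo)) %>%
  ungroup()

# Cell counts before and after the shuffle
counts_before <- as.data.frame(count(df, grpOne, grpTwo))
counts_after  <- as.data.frame(count(shuffled, grpOne, grpTwo))

# TRUE when only the assignment of labels to rows has changed
isTRUE(all.equal(counts_before, counts_after))
identical(shuffled$x, df$x)
identical(shuffled$grpOne, df$grpOne)
```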

Then just compute the mean (or whatever else you want) by grpTwo. Note that the result differs from the one computed on the original data:

df %>% group_by(grpOne) %>%
          mutate( grpTwo = sample(grpTwo)) %>%
             group_by(grpTwo) %>%
               summarise( mn_x=mean(x))
# A tibble: 2 × 2
  grpTwo    mn_x
  <chr>    <dbl>
1 Y      -0.0791
2 Z       0.0197

df %>% group_by(grpTwo) %>% summarise(mnx =mean(x))
# A tibble: 2 × 2
  grpTwo      mnx
  <chr>     <dbl>
1 Y      -0.0647 
2 Z       0.00533
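
The summaries above pool all levels of grpOne. If the statistic is wanted separately for each level of grpOne, as the question describes, one possible extension is to group by both variables and then repeat the shuffle to build a permutation distribution. This is a minimal sketch, not the answerer's code: the helper diff_by_grpOne, the seed, the choice of 200 permutations, and the two-sided p-value are all assumptions.

```r
library(dplyr)

set.seed(42)  # assumed seed, used only to make the sketch reproducible
df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4], 10),
                 grpTwo = rep(rep(LETTERS[25:26], each = 4), 10))

# Hypothetical helper: difference in mean x (Z minus Y) within each grpOne level
diff_by_grpOne <- function(d) {
  d %>%
    group_by(grpOne, grpTwo) %>%
    summarise(meanX = mean(x), .groups = "drop_last") %>%
    summarise(diffMeanX = diff(meanX), .groups = "drop")
}

# Statistic on the unshuffled data: one row per grpOne level
observed <- diff_by_grpOne(df)

# Repeat the within-grpOne shuffle and recompute the statistic each time
n_perm <- 200  # assumed number of permutations
null_dist <- bind_rows(
  lapply(seq_len(n_perm), function(i) {
    df %>%
      group_by(grpOne) %>%
      mutate(grpTwo = sample(grpTwo)) %>%
      ungroup() %>%
      diff_by_grpOne() %>%
      mutate(perm = i)
  })
)

# Two-sided permutation p-value per grpOne level
p_values <- null_dist %>%
  inner_join(observed, by = "grpOne", suffix = c("_perm", "_obs")) %>%
  group_by(grpOne) %>%
  summarise(p = mean(abs(diffMeanX_perm) >= abs(diffMeanX_obs)))

p_values
```

The same pattern should carry over to the penguins example by replacing grpOne with species, grpTwo with sex, and x with body_mass_g.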
