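I have data like df, which has one numeric variable (x) and two grouping variables (grpOne and grpTwo). For each level of grpOne, I want to shuffle membership in grpTwo and then compute some test statistic (for example, the difference in the mean or variance of x).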
library(tidyverse)
df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4], 10),
                 grpTwo = rep(rep(LETTERS[25:26], each = 4), 10))
# I want to shuffle grpTwo membership while keeping grpOne intact
# e.g. for the first level of grpOne
tmp <- df %>% filter(grpOne == "a")
# sample() on a vector returns a random permutation of it
tmp$grpTwo <- sample(tmp$grpTwo)
# calculate some statistic, like the difference in means
tmp %>%
  group_by(grpTwo) %>%
  summarise(meanX = mean(x)) %>%
  summarise(diffMeanX = diff(meanX))
# repeat for grpOne == "b", then "c", etc.
What is a concise way to do this without looping over every level of grpOne?
Edit: here is a better example using the Palmer penguins data
penguins_df <- palmerpenguins::penguins %>%
  filter(!is.na(sex)) %>%
  select(species, body_mass_g, sex)
# I want to get the difference in mean body mass between males and females
# for each species
# e.g. for Adelie
tmp <- penguins_df %>% filter(species == "Adelie")
# shuffle sex
tmp$sex <- tmp$sex[sample(1:nrow(tmp), nrow(tmp))]
# calculate some statistic, like the difference in means
tmp %>%
  group_by(sex) %>%
  summarise(meanMass = mean(body_mass_g)) %>%
  summarise(diffMeanMass = diff(meanMass))
# repeat for all species
1 answer
I used group_by twice, randomly reassigning grpTwo between the two calls.
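A minimal sketch of what that first step might look like, assuming dplyr's group_by() and mutate() together with base R's sample(). This is one reading of the description above, not necessarily the answerer's exact code; the seed and the name shuffled are illustrative assumptions.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only so the sketch is reproducible

df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4], 10),
                 grpTwo = rep(rep(LETTERS[25:26], each = 4), 10))

# First group_by: permute the grpTwo labels separately within each grpOne level.
# sample() on a vector returns a random permutation of that vector.
shuffled <- df %>%
  group_by(grpOne) %>%
  mutate(grpTwo = sample(grpTwo)) %>%
  ungroup()
```

Because the permutation happens inside each grpOne group, the number of rows per grpOne/grpTwo combination stays the same; only which x values carry which grpTwo label changes.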
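Then simply compute the mean (or whatever else you want) by grpTwo. Note that the result differs from the one obtained with the original, unshuffled labels.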
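As a hedged illustration of the second step, the sketch below regroups the shuffled data by both grpOne and grpTwo so that the difference in means comes out once per grpOne level, as the question asks. Grouping by both variables, the seed, and the names shuffled and diffs are assumptions made for this example; the .groups argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(1)  # illustrative seed, only so the sketch is reproducible

df <- data.frame(x = rnorm(80),
                 grpOne = rep(letters[1:4], 10),
                 grpTwo = rep(rep(LETTERS[25:26], each = 4), 10))

# Shuffle grpTwo within each level of grpOne
shuffled <- df %>%
  group_by(grpOne) %>%
  mutate(grpTwo = sample(grpTwo)) %>%
  ungroup()

# Second group_by: mean of x per grpOne/grpTwo cell, then the
# difference between the two grpTwo means within each grpOne level
diffs <- shuffled %>%
  group_by(grpOne, grpTwo) %>%
  summarise(meanX = mean(x), .groups = "drop_last") %>%
  summarise(diffMeanX = diff(meanX), .groups = "drop")

diffs
```

The same pattern should carry over to the penguins example by replacing grpOne with species, grpTwo with sex, and x with body_mass_g.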