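I'm trying to generate multiple frequency tables stratified by multiple independent variables. I can get it to work for one variable and one stratification variable, but my for loop is broken.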
library(tidyverse)
# Create example dataframe of survey data
df <- data.frame(
var1 = sample(1:7, 1000, replace = TRUE),
var2 = sample(1:7, 1000, replace = TRUE),
var3 = sample(1:7, 1000, replace = TRUE),
var4 = sample(1:7, 1000, replace = TRUE),
var5 = sample(1:7, 1000, replace = TRUE),
var6 = sample(1:7, 1000, replace = TRUE),
strat1 = sample(c("A", "B", "C"), 1000, replace = TRUE),
strat2 = sample(c("X", "Y"), 1000, replace = TRUE),
strat3 = sample(c("True", "False"), 1000, replace = TRUE)
)
This example works for one variable and one stratification variable. I want to turn this code into a for loop:
temp_df <- df %>% count(var1)
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
strat_df <- temp_df %>%
left_join((df %>% group_by(var1, strat1) %>% count(var1) %>% pivot_wider(names_from = strat1, values_from = n)), by = "var1")
for(k in c("A","B","C")){
strat_df[paste0(k, "_pct")] <- (strat_df[[k]] / temp_df$n) * 100
}
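I want the same output, but with the count and _pct columns for the other two stratification variables added.
I tried the for loop below, but it gives only one row per variable and only two columns per strat variable, whereas the output I'm looking for would provide a raw count column and a column-percent column for each category of the stratification variables. There are 3 strat variables, two with two categories and one with three. My expected output would have 13 columns, including the "v#", "n", and "percent" columns.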
# Create a list of the variables of interest
variables <- c("var1", "var2", "var3", "var4", "var5", "var6")
# Create a list of the stratification variables
strats <- c("strat1", "strat2", "strat3")
# Create a loop that runs through each variable
for(i in variables){
# Create a frequency table for the current variable
temp_df <- df %>% count(!! i)
# Add a column for the percent of responses within each response category
temp_df$percent <- temp_df$n / sum(temp_df$n) * 100
# Add a column for the raw count for each category of the stratification variables
for(j in strats){
temp_df <- temp_df %>% group_by(!!i) %>% mutate( !!j := n() )
}
# Add a column for the percent of the stratification variable category within the response category
for(j in strats){
temp_df[paste0(j, "_pct")] <- (temp_df[[j]] / temp_df$n) * 100
}
assign(paste0(i,"_df"), temp_df)
}
This is what I'd like my output to look like:
2 Answers
Answer 1
Update:
I came up with a solution that outputs what I need:
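The solution code itself is not included here. As a rough sketch of one way to get that kind of output (not necessarily the poster's approach), the single-variable code from the question can be wrapped in a function and applied to each variable. `freq_by_strata` and `tables` are hypothetical names introduced for illustration, and the percentages follow the question's single-variable code (stratum count divided by the response category's `n`).

```r
library(dplyr)
library(tidyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  var1 = sample(1:7, 1000, replace = TRUE),
  var2 = sample(1:7, 1000, replace = TRUE),
  strat1 = sample(c("A", "B", "C"), 1000, replace = TRUE),
  strat2 = sample(c("X", "Y"), 1000, replace = TRUE),
  strat3 = sample(c("True", "False"), 1000, replace = TRUE)
)

# Hypothetical helper: frequency table for `var` (a column name as a string),
# plus a count and a _pct column for every category of every strat variable.
freq_by_strata <- function(data, var, strats) {
  out <- data %>%
    count(across(all_of(var))) %>%
    mutate(percent = n / sum(n) * 100)

  for (s in strats) {
    # Counts per response category and stratum, one column per stratum level
    wide <- data %>%
      count(across(all_of(c(var, s)))) %>%
      pivot_wider(names_from = all_of(s), values_from = n, values_fill = 0L)
    out <- left_join(out, wide, by = var)

    # Percent of each stratum level within the response category
    for (k in setdiff(names(wide), var)) {
      out[[paste0(k, "_pct")]] <- out[[k]] / out$n * 100
    }
  }
  out
}

variables <- c("var1", "var2")
strats <- c("strat1", "strat2", "strat3")

# A named list of tables, instead of assign()-ing one object per variable
tables <- lapply(variables, function(v) freq_by_strata(df, v, strats))
names(tables) <- variables
```

With this example data there are 3 + 2 + 2 = 7 strata categories, so each table has 17 columns: the variable, `n`, `percent`, and a count plus a `_pct` column per category. Returning a named list keeps the results together, so `tables$var1` plays the role of `var1_df`.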
Answer 2
Convert the string to a symbol with `sym` and evaluate it with `!!`, or use `across`, because the variables being looped over are strings.
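A minimal sketch of the difference, assuming a small example data frame like the one in the question (the names `wrong`, `by_sym`, and `by_across` are only for illustration):

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(
  var1 = sample(1:7, 1000, replace = TRUE),
  var2 = sample(1:7, 1000, replace = TRUE)
)

i <- "var1"

# `!!i` injects the string "var1" itself, so count() sees a constant
# value and returns a single row (the symptom described in the question).
wrong <- df %>% count(!!i)

# Converting the string to a symbol first makes count() use the column.
by_sym <- df %>% count(!!rlang::sym(i))

# across(all_of(i)) selects the column by its name instead.
by_across <- df %>% count(across(all_of(i)))
```

Both `by_sym` and `by_across` have one row per response category of `var1`, while `wrong` collapses everything into one row.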