Suppose we have the following data:
library(cobalt)
library(dplyr)
set.seed(123)
# Extending lalonde data by `group` variable:
lalonde <- cbind(lalonde,
group = sample(c("A", "B", "AB"), size = 614, replace = TRUE))
# Creating variable `set` for AB group:
lalonde$set[lalonde$group == "AB"] <- sample(c(5, 10, 15, 3), sum(lalonde$group == "AB"), replace = TRUE)
# Filling 'set' column for `group` 'A' and 'B':
prob_80 <- sample(c(0, 1), nrow(lalonde), replace = TRUE, prob = c(0.8, 0.2))
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1] <- sample(c(5, 10, 15, 3), sum((lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1), replace = TRUE)
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)] <- sample(1:100, sum((lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)), replace = TRUE)
Now we have a `group` variable containing A, B, and AB, and a variable named `set`.
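I want to fit a logistic propensity score (PS) model that predicts membership in the AB group, stratified by the `set` values that occur where `group == "AB"`.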
First, I extract the distinct values of `set` within `group == "AB"`.
unique_set_values <- unique(lalonde[lalonde$group == "AB", "set"]) %>%
print()
They are:
[1] 5 15 10 3
I use them to keep every observation whose `set` is one of those values:
filtered_data <- lalonde %>%
filter(set %in% unique_set_values)
Then I split the data and recode `group` so that AB becomes 1 and everything else becomes 0:
# For AB and A:
AB_A <- filtered_data %>%
filter(group %in% c("AB", "A")) %>%
mutate(group = ifelse(group == "AB", 1, 0))
# For AB and B:
AB_B <- filtered_data %>%
filter(group %in% c("AB", "B")) %>%
mutate(group = ifelse(group == "AB", 1, 0))
Now I can set up the stratified PS for AB vs. A and for AB vs. B:
# Creating a formula:
formula <- group ~ age + educ + race + married + nodegree + re74 + re75 + re78
But how do I compute the PS stratified by `set` in this case?
I tried:
AB_A_PS <- AB_A %>%
group_by(set) %>%
mutate(pscore = glm(formula, data = ., family = binomial(link = "logit"))$fitted.values)
But I get an error:
Error in `mutate()`:
ℹ In argument: `pscore = predict(glm(formula, data = ., family = binomial(link = "logit")))`.
ℹ In group 1: `set = 3`.
Caused by error:
! `pscore` must be size 66 or 1, not 238.
So clearly this doesn't work.
Thank you.
1 Answer
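Inside `mutate()`, the `.` placeholder refers to the whole data frame that was piped in, not to the current group. So `glm` is fitted on all 238 rows for every group and returns 238 fitted values, while `mutate()` expects one value per row of the group (66 for `set = 3`), hence the error. Instead, pass `glm` only the rows that belong to the current group.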
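One way to do this is sketched below with `group_modify()`, which hands each group's rows to a function as `.x`. This is a minimal illustration rather than the original answer's code: the `toy` data frame, the reduced formula `ps_formula`, and the result name `toy_ps` are hypothetical stand-ins for `AB_A`, the full formula, and `AB_A_PS`.

```r
library(dplyr)

set.seed(1)

# Toy stand-in for AB_A (hypothetical data, not the lalonde extract):
# a 0/1 treatment indicator, two covariates, and the stratifying variable `set`.
toy <- data.frame(
  group = rbinom(200, 1, 0.5),
  age   = rnorm(200, 30, 8),
  educ  = rnorm(200, 10, 2),
  set   = sample(c(3, 5, 10, 15), 200, replace = TRUE)
)

# Reduced formula for the illustration (assumption: the real one has more covariates).
ps_formula <- group ~ age + educ

toy_ps <- toy %>%
  group_by(set) %>%
  # group_modify() calls the function once per group; `.x` holds only that group's rows.
  group_modify(~ {
    fit <- glm(ps_formula, data = .x, family = binomial(link = "logit"))
    # fitted() returns one probability per row of .x, so the lengths match.
    mutate(.x, pscore = fitted(fit))
  }) %>%
  ungroup()
```

With the data from the question, the same pattern should apply by replacing `toy` with `AB_A` (or `AB_B`) and `ps_formula` with the full formula. Small strata may trigger convergence or separation warnings from `glm`, so it is worth checking the number of rows per `set` value first.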