R语言 计算分层内的倾向评分

aamkag61  于 2023-11-14  发布在  其他
关注(0)|答案(1)|浏览(143)

假设以下数据:

library(cobalt)
library(dplyr)

set.seed(123)

# Extending lalonde data by `group` variable:
lalonde <- cbind(lalonde,
                 group = sample(c("A", "B", "AB"), size = 614, replace = TRUE))

# Creating variable `set` for AB group:
lalonde$set[lalonde$group == "AB"] <- sample(c(5, 10, 15, 3), sum(lalonde$group == "AB"), replace = TRUE)

# Filling 'set' column for `group` 'A' and 'B':
prob_80 <- sample(c(0, 1), nrow(lalonde), replace = TRUE, prob = c(0.8, 0.2))
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1] <- sample(c(5, 10, 15, 3), sum((lalonde$group == "A" | lalonde$group == "B") & prob_80 == 1), replace = TRUE)
lalonde$set[(lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)] <- sample(1:100, sum((lalonde$group == "A" | lalonde$group == "B") & is.na(lalonde$set)), replace = TRUE)

字符串
现在我们有一个group变量,包含ABAB和一个名为set的变量。
现在我想拟合一个逻辑PS模型,该模型由group == "AB"set值分层,预测在AB组中。
首先,我将提取group == ABset的不同值。

unique_set_values <- unique(lalonde[lalonde$group == "AB", "set"])  %>% 
  print()


它们是:

+   print()
[1]  5 15 10  3


我使用它们来获取属于set值之一的所有观测值:

filtered_data <- lalonde %>% 
  filter(set %in% unique_set_values)


然后我将数据拆分,并将AB替换为1,否则为0

# For AB and A:
AB_A <- filtered_data %>% 
  filter(group %in% c("AB", "A")) %>%
  mutate(group = ifelse(group == "AB", 1, 0)) 

# For AB and B:
AB_B <- filtered_data %>% 
  filter(group %in% c("AB", "B")) %>%
  mutate(group = ifelse(group == "AB", 1, 0))


现在我可以计算ABA以及ABB的分层PS:

# Creating a formula:
formula <- group ~ age + educ + race + married + nodegree + re74 + re75 + re78


但是在这种情况下,如何计算set分层的PS?
我试过了:

AB_A_PS <- AB_A %>%
  group_by(set) %>%
  mutate(pscore = glm(formula, data = ., family = binomial(link = "logit"))$fitted.values)


但我得到的是一个错误:

Error in `mutate()`:
ℹ In argument: `pscore = predict(glm(formula, data = ., family = binomial(link = "logit")))`.
ℹ In group 1: `set = 3`.
Caused by error:
! `pscore` must be size 66 or 1, not 238.

所以,很明显,它不起作用。
谢谢你

bfhwhh0e

bfhwhh0e1#

您正在将整个分组数据框传递给glm,用于数据框中的每个组,因此出现错误。相反,您可以传递仅包含当前组中的行的数据框子集:

AB_A %>%
  group_by(set) %>%
  mutate(pscore = glm(formula, data = .[cur_group_rows(),], 
                    family = binomial(link = "logit"))$fitted.values)
#> # A tibble: 238 x 12
#> # Groups:   set [4]
#>    treat   age  educ race   married nodegree  re74  re75   re78 group   set pscore
#>    <int> <int> <int> <fct>    <int>    <int> <dbl> <dbl>  <dbl> <dbl> <dbl>  <dbl>
#>  1     1    37    11 black        1        1     0     0  9930.     1     5  0.706
#>  2     1    22     9 hispan       0        1     0     0  3596.     1    15  0.942
#>  3     1    30    12 black        0        0     0     0 24909.     1    15  0.993
#>  4     1    33     8 black        0        1     0     0   290.     1    15  0.977
#>  5     1    22    16 black        0        0     0     0  2164.     1    15  0.835
#>  6     1    17     7 black        0        1     0     0  3024.     1    15  0.908
#>  7     1    27    13 black        0        0     0     0 14582.     1    10  1    
#>  8     1    23    10 black        0        1     0     0  7693.     1    15  0.939
#>  9     1    26    12 black        0        0     0     0 10747.     0     3  0.809
#> 10     1    38     9 white        0        1     0     0  6409.     1    10  1    
#> # i 228 more rows
#> # i Use `print(n = ...)` to see more rows

字符串

相关问题