Plotting dynamic treatment effects with the marginaleffects package in R

s5a0g9ez  posted on 2023-10-13 in Other

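I am trying to use the marginaleffects package in R to plot the effect of a treatment on an outcome over time (i.e., dynamic treatment effects). My data consists of two groups (treated and control), seven time points, and a continuous outcome.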

library(tibble)

d <- tibble(id = rep(1:10, each = 7),
            periods = factor(rep(0:6, times = 10)),
            treated = factor(ifelse(id<=5, 1, 0)),
            y = ifelse(treated==1, rnorm(100, 2, 0.5), rnorm(100, 0, 1)))

head(d, n=10)
# A tibble: 10 × 4
      id periods treated     y
   <int> <fct>   <fct>   <dbl>
 1     1 0       1        2.31
 2     1 1       1        2.32
 3     1 2       1        2.16
 4     1 3       1        1.74
 5     1 4       1        1.93
 6     1 5       1        2.17
 7     1 6       1        2.25
 8     2 0       1        2.02
 9     2 1       1        1.90
10     2 2       1        2.50

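To fit the model, I use a standard two-way fixed effects DiD approach, with fixed effects for time (periods in my case) and group (treated in my case), and with the treatment effect represented by the interaction of the two (treated * periods).
In the model output (printed below), I am interested in plotting the effects of the treated*:periods* terms. As I understand it, these represent the effect of treatment over time after accounting for the group and time fixed effects.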

# fit model
m <- lm(y ~ treated*periods, data = d)

# print model output (tidy() comes from the broom package)
library(broom)
tidy(m, conf.int = TRUE)

# A tibble: 14 × 7
   term              estimate std.error statistic   p.value conf.low conf.high
   <chr>                <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
 1 (Intercept)         0.269      0.308    0.872  0.387       -0.349     0.887
 2 treated1            2.11       0.436    4.84   0.0000107    1.24      2.98 
 3 periods1            0.475      0.436    1.09   0.281       -0.399     1.35 
 4 periods2           -0.0516     0.436   -0.118  0.906       -0.925     0.822
 5 periods3           -0.343      0.436   -0.787  0.435       -1.22      0.531
 6 periods4            0.0433     0.436    0.0994 0.921       -0.831     0.917
 7 periods5            0.578      0.436    1.33   0.190       -0.296     1.45 
 8 periods6           -0.220      0.436   -0.504  0.616       -1.09      0.654
 9 treated1:periods1  -1.07       0.617   -1.73   0.0897      -2.30      0.171
10 treated1:periods2   0.171      0.617    0.277  0.783       -1.07      1.41 
11 treated1:periods3   0.0223     0.617    0.0361 0.971       -1.21      1.26 
12 treated1:periods4  -0.635      0.617   -1.03   0.308       -1.87      0.601
13 treated1:periods5  -0.966      0.617   -1.57   0.123       -2.20      0.270
14 treated1:periods6  -0.170      0.617   -0.275  0.784       -1.41      1.07

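However, when I try to plot the treatment effects with marginaleffects, the plotted values do not match the treatment effects in the model output (the treated*:periods* terms). For ease of reference, I include below the output of calling avg_comparisons() rather than plot_comparisons():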

library(marginaleffects)

avg_comparisons(m, variables = "treated", by = "periods")

    Term          Contrast periods Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
 treated mean(1) - mean(0)       0     2.11      0.436 4.84  < 0.001 19.5  1.26   2.97
 treated mean(1) - mean(0)       1     1.05      0.436 2.40  0.01657  5.9  0.19   1.90
 treated mean(1) - mean(0)       2     2.28      0.436 5.23  < 0.001 22.5  1.43   3.14
 treated mean(1) - mean(0)       3     2.13      0.436 4.89  < 0.001 19.9  1.28   2.99
 treated mean(1) - mean(0)       4     1.48      0.436 3.38  < 0.001 10.4  0.62   2.33
 treated mean(1) - mean(0)       5     1.14      0.436 2.62  0.00868  6.8  0.29   2.00
 treated mean(1) - mean(0)       6     1.94      0.436 4.45  < 0.001 16.8  1.09   2.80

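It seems to me that avg_comparisons() does not account for the fixed effects when estimating the between-group comparison over time. How can I get avg_comparisons() / plot_comparisons() output that matches the treatment effects in the model output (treated*:periods*)?
I tried many variants of the call avg_comparisons(m, variables = c("periods", "treated"), by = "treated"), including changing which terms are passed to variables and which variables are passed to by. None of them produced the output I am looking for.
I also tried avg_comparisons(m, variables = "treated:periods") to get the interaction effects, but this produced an error: Error: There is no valid predictor variable. Please change the `variables` argument or supply a new data frame to the `newdata` argument.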

yebdmbv4


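I think you are misinterpreting your model coefficients, and that avg_comparisons() is correctly reporting the quantity of interest.
Consider this code, where I set a seed for reproducibility: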

library(tibble)
library(marginaleffects)
set.seed(1024)

d <- tibble(
  id = rep(1:10, each = 7),
  periods = rep(0:6, times = 10),
  treated = ifelse(id <= 5, 1, 0),
  y = ifelse(treated == 1, rnorm(100, 2, 0.5), rnorm(100, 0, 1)))

m <- lm(y ~ factor(treated) * factor(periods), data = d)

Estimate the average difference in predicted outcome between the treated and control groups, in each period:

e <- avg_comparisons(m, variables = "treated", by = "periods")

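First, note that the "effect" of changing the treated variable from 0 to 1 in period 2 is *not* equal to the interaction term. It is equal to the sum of two coefficients, the main effect of treated and the treated-by-period-2 interaction: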

coef(m)["factor(treated)1"] + coef(m)["factor(treated)1:factor(periods)2"]
# factor(treated)1 
#         2.153658
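
To spell out why the sum appears, here is the model written out. The symbols (T for treated, t for periods, and the Greek coefficient names) are my own notation for illustration and do not come from the model output:

```latex
\begin{aligned}
y_{it} &= \alpha + \beta\,T_i + \sum_{k=1}^{6} \gamma_k\,\mathbf{1}[t=k]
          + \sum_{k=1}^{6} \delta_k\,T_i\,\mathbf{1}[t=k] + \varepsilon_{it} \\
E[y \mid T=1,\, t=k] - E[y \mid T=0,\, t=k] &= \beta + \delta_k \qquad (k = 1,\dots,6) \\
E[y \mid T=1,\, t=0] - E[y \mid T=0,\, t=0] &= \beta
\end{aligned}
```

Under this notation, the interaction coefficient for period k is the treated-minus-control difference in period k minus the same difference in the reference period 0. It is a difference between two effects, not the period-k effect itself.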

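This can be verified with the predict() function from base R. Look at what happens to the predicted outcome when treated goes from 0 to 1 in two different periods: 0 and 2.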

d0 <- data.frame(treated = 0, periods = 0)
d1 <- data.frame(treated = 1, periods = 0)
p0 <- predict(m, newdata = d0)
p1 <- predict(m, newdata = d1)
p1 - p0
#        1 
# 2.792739

d0 <- data.frame(treated = 0, periods = 2)
d1 <- data.frame(treated = 1, periods = 2)
p0 <- predict(m, newdata = d0)
p1 <- predict(m, newdata = d1)
p1 - p0
#        1 
# 2.153658

Finally, see that these are exactly the results reported by avg_comparisons():

subset(e, periods %in% c(0, 2))
# 
#     Term          Contrast Estimate Std. Error    z Pr(>|z|)    S CI low
#  treated mean(1) - mean(0)     2.79      0.578 4.83   <0.001 19.5   1.66
#  treated mean(1) - mean(0)     2.15      0.578 3.72   <0.001 12.3   1.02
#  CI high
#     3.93
#     3.29
# 
# Columns: term, contrast, periods, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted
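
If what you want are numbers that line up with the treated:periods coefficients, one option is to subtract the period 0 effect from each later period's effect. The sketch below does this with base R's predict() on simulated data like that above (drawn slightly differently, so the numbers will not match the output shown). The object names (grid, effect, did, interactions) are made up for illustration, and the final check only runs if marginaleffects is installed:

```r
set.seed(1024)

# Simulated panel: 10 ids, 7 periods, first 5 ids treated
d <- data.frame(id = rep(1:10, each = 7), periods = rep(0:6, times = 10))
d$treated <- ifelse(d$id <= 5, 1, 0)
d$y <- ifelse(d$treated == 1, rnorm(nrow(d), 2, 0.5), rnorm(nrow(d), 0, 1))

m <- lm(y ~ factor(treated) * factor(periods), data = d)

# Treated-minus-control difference in predicted outcome, one value per period
grid <- data.frame(periods = 0:6)
effect <- predict(m, newdata = transform(grid, treated = 1)) -
  predict(m, newdata = transform(grid, treated = 0))

# Each later period's effect minus the period 0 effect
did <- effect[-1] - effect[1]

# The six treated:periods interaction coefficients, in period order
interactions <- coef(m)[grepl(":", names(coef(m)), fixed = TRUE)]

# The two columns should agree up to floating-point error
print(cbind(did = unname(did), interaction = unname(interactions)))

# Optional: compare the per-period effects with avg_comparisons()
if (requireNamespace("marginaleffects", quietly = TRUE)) {
  e <- as.data.frame(
    marginaleffects::avg_comparisons(m, variables = "treated", by = "periods")
  )
  e <- e[order(e$periods), ]
  print(all.equal(e$estimate, unname(effect)))
}
```

So the per-period values from avg_comparisons() and the interaction coefficients carry the same information, just measured against different baselines. For a plot, plot_comparisons(m, variables = "treated", by = "periods") should draw the per-period effects, assuming your installed version of marginaleffects accepts the by argument in that function.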
