R: Extracting coefficients, standard errors, and p-values for selected variables from multiple regressions

k4ymrczo posted 11 months ago in Other

I have the following hypothetical data.

library(data.table)

city <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)                                       
village <- c(1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,7)                              
village_status <- c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)
y <- c(1,1,0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1)
x1 <- c(2,5,4,4,3,4,5,3,2,1,3,5,1,5,4,5)
x2  <- c(21,33,46,8,19,30,20,2,34,38,19,89,35,67,60,37)
x3 <-  c(23,28,30,15,7,18,29,27,14,22,24,10,12,6,17,10)
datei <- data.table(city, village, village_status, y, x1, x2, x3)

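What I want to do is:
1. Randomize village status 1,000 times, shuffling the status among villages within each city.
2. Run 1,000 regressions, each one using a different randomized village status.
3. Extract the coefficient, standard error, and p-value of the randomized village status and store them as a data.table.
For the first and second points, I have done the following:
1. I shuffled the village status within each city and put the randomized statuses into new columns (randomvil1, randomvil2, ..., randomvil1000):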

n <- 1000
datei[, paste0("randomvil", 1:n) := replicate(n, sample(village_status), simplify = FALSE), by = city]


2. I ran the 1,000 regressions with lapply:

library(estimatr)

# Extract the randomized village columns
varvil0 <- c(paste0("randomvil", 1:n)) 
varvil <- datei[,..varvil0]

fit <- lapply(varvil, function(randomvil) lm_robust(y~ randomvil + randomvil:x1 + x2 + x3 + factor(city), data=datei, se_type= "stata", clusters= city))
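The within-city shuffle from step 1 can be sanity-checked before trusting the regressions. The sketch below is only an illustration in base R: it uses `ave()` as a stand-in for the grouped data.table call, performs a single shuffle, and confirms that each city keeps the same number of status-1 villages. The seed and the names `shuffled`, `before`, and `after` are assumptions introduced here, not part of the original code.

```r
# Two columns copied from the question's data
city           <- c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3)
village_status <- c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# ave() applies sample() separately within each city, mirroring the
# grouped data.table call for a single shuffle
shuffled <- ave(village_status, city, FUN = sample)

# Number of status-1 villages per city, before and after shuffling
before <- tapply(village_status, city, sum)
after  <- tapply(shuffled, city, sum)

# A within-city permutation must leave these counts unchanged
stopifnot(identical(before, after))
```

If the counts differed, the shuffle would be mixing villages across cities, which is not what the design asks for.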


Now I want to extract the coefficients, standard errors, and p-values of randomvil and randomvil:x1 and store them as a data.table, like this:

|               | Estimate     | Std. Error | Pr(>\|t\|)   |
| ------------- | ------------ | --------   | --------     |
| randomvil1    | -0.945474623 | 0.474268   |   0.1843847  |
| randomvil1:x1 | -0.004905517 | 0.012388   |   0.7303777  |
| randomvil2    | -0.005449813 | 0.022198   |   0.828959   |
| randomvil2:x1 | -0.341242368 | 0.167598   |   0.1786814  |


Can anyone help me? Thank you.

rn0zuynd


To extract the relevant coefficients:

result <- do.call(what = "rbind",
                  args = lapply(
                    X = fit, 
                    FUN = \(x) coef(summary(x)) |>  
                      { \(x) x[c("randomvil", "randomvil:x1"), 
                               c("Estimate", "Std. Error",  "Pr(>|t|)")] }()
                    )) |>
  # currently a matrix 
  `rownames<-`(c(rbind(paste0("randomvil", 1L:1e3), 
                       paste0("randomvil", 1L:1e3, ":x1")))) |>
  as.data.frame() |>
  data.table::setDT(keep.rownames = TRUE)

This gives:

> head(result)
              rn      Estimate Std. Error   Pr(>|t|)
1:    randomvil1 -0.1535707146 0.47420395 0.77678174
2: randomvil1:x1  0.0002317025 0.16725807 0.99902045
3:    randomvil2  0.8757044386 0.25611745 0.07592514
4: randomvil2:x1 -0.1845974571 0.07522312 0.13357686
5:    randomvil3  0.0857456330 0.32624900 0.81728482
6: randomvil3:x1 -0.0672697523 0.10371047 0.58310677


You may prefer keep.rownames = "id", which names the row-name column id instead of the default rn.
Note that

c(rbind(paste0("randomvil", 1L:1e3), 
        paste0("randomvil", 1L:1e3, ":x1")))


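is a simple way to rename the coefficients, but it can be error-prone: it assumes every model contributes exactly two rows in the order of fit, and the hardcoded 1e3 has to match n.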
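One way to reduce that risk is to build the labels from `names(fit)` instead of a hardcoded sequence. The sketch below is a hedged illustration, not the answer's method. To stay runnable without extra packages it uses a plain data.frame and base `lm()` as a stand-in for `estimatr::lm_robust()`, on the assumption (consistent with the answer's code) that `coef(summary(x))` returns a matrix with the same row and column names in both cases. The names `dat`, `shuffles`, `keep_terms`, `keep_cols`, and `rows` are introduced here for illustration, and `n` is kept small.

```r
# The question's data as a plain data.frame
dat <- data.frame(
  city = c(1,1,1,1,2,2,2,2,2,3,3,3,3,3,3,3),
  village_status = c(1,0,1,0,1,1,1,0,0,1,1,1,1,0,0,0),
  y  = c(1,1,0,1,0,1,0,0,1,1,1,1,0,0,1,1),
  x1 = c(2,5,4,4,3,4,5,3,2,1,3,5,1,5,4,5),
  x2 = c(21,33,46,8,19,30,20,2,34,38,19,89,35,67,60,37),
  x3 = c(23,28,30,15,7,18,29,27,14,22,24,10,12,6,17,10)
)

set.seed(1)  # arbitrary seed for reproducibility
n <- 5       # small number of replicates, for illustration only

# Named list of within-city shuffles, one element per replicate
shuffles <- replicate(n, ave(dat$village_status, dat$city, FUN = sample),
                      simplify = FALSE)
names(shuffles) <- paste0("randomvil", seq_len(n))

# ASSUMPTION: lm() stands in for estimatr::lm_robust() here
fit <- lapply(shuffles, function(randomvil)
  lm(y ~ randomvil + randomvil:x1 + x2 + x3 + factor(city), data = dat))

keep_terms <- c("randomvil", "randomvil:x1")
keep_cols  <- c("Estimate", "Std. Error", "Pr(>|t|)")

# Label each pair of rows with the name of the model it came from
rows <- lapply(names(fit), function(nm) {
  cf <- coef(summary(fit[[nm]]))[keep_terms, keep_cols, drop = FALSE]
  data.frame(rn = sub("randomvil", nm, keep_terms, fixed = TRUE),
             cf, check.names = FALSE, row.names = NULL)
})
result <- do.call(rbind, rows)
```

Because the question's `fit` comes from `lapply()` over the columns of `varvil`, it should already carry the names randomvil1, randomvil2, and so on, so the same loop ought to apply to it unchanged. Calling `data.table::setDT(result)` afterwards would convert the result to a data.table.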
