R语言 用估计的边际平均值解释栅栏模型

iyzzxitl  于 2023-09-27  发布在  其他
关注(0)|答案(1)|浏览(117)

我的目标是通过估计的边际均值来解释障碍模型的系数。我更喜欢解释概率(从logit量表反向转换),而不是对数优势(模型系数)或比值比(exp(对数优势))。我想使用emmeans()来实现这个目标,因为它与许多模型兼容,并且我有在线性模型和二项式GLM中使用它的经验。

栅栏模型的栅栏部分产生与二项式GLM相同的系数,这在其他地方已经评论过(herehere)。
然而,我不完全理解为什么,根据设置,从跨栏估计的平均值不匹配的二项式GLM。特别是

  • lin.pred = FALSE。这从二项式GLM产生不同的概率。在emmeansdocumentation之后,我认为它们是在概率尺度上平均的。
  • lin.pred = TRUE, type = "response"。此设置产生与二项式GLM相同的概率。根据emmeans文档,我认为它们是在logit标度上平均的。

我有两个问题:

  1. 1.如果我们的目标是解释概率,而不是logits,那么哪种设置更可取?
  2. 1.Inf自由度的含义是什么?
    下面是一个可重复的例子,使用生物化学学生在至少3年的博士学位期间发表的论文数量,重点是两个因素水平(单身与已婚)之间的成对比较。
    我的首选方法是与二项式GLM的估计概率相匹配的方法。我的理解是,已婚博士生比单身博士生更有可能发表至少一篇论文。已婚学生有74%的机会至少完成一篇论文,而单身学生有67%的机会:这7%的变化是一个很小的差异,并不显著。
    提前感谢!
  1. library(emmeans)
  2. library(pscl)
  3. #> Warning: package 'pscl' was built under R version 4.2.3
  4. #> Classes and Methods for R developed in the
  5. #> Political Science Computational Laboratory
  6. #> Department of Political Science
  7. #> Stanford University
  8. #> Simon Jackman
  9. #> hurdle and zeroinfl functions by Achim Zeileis
  10. data("bioChemists", package = "pscl")
  11. # str(bioChemists)
  12. hurdle <- hurdle(art ~ ., data = bioChemists, dist = "poisson", zero.dist = "binomial", link = "logit")
  13. glm <- glm(art > 0 ~ ., data = bioChemists, family = binomial(link = "logit"))
  14. # Same coefficients
  15. coef(summary(hurdle))$zero
  16. #> Estimate Std. Error z value Pr(>|z|)
  17. #> (Intercept) 0.23679601 0.29551883 0.8012891 4.229643e-01
  18. #> femWomen -0.25115113 0.15910522 -1.5785222 1.144457e-01
  19. #> marMarried 0.32623358 0.18081823 1.8042074 7.119880e-02
  20. #> kid5 -0.28524872 0.11113043 -2.5667921 1.026441e-02
  21. #> phd 0.02221940 0.07955710 0.2792887 7.800233e-01
  22. #> ment 0.08012135 0.01301763 6.1548321 7.515710e-10
  23. coef(glm)
  24. #> (Intercept) femWomen marMarried kid5 phd ment
  25. #> 0.23679601 -0.25115113 0.32623358 -0.28524872 0.02221940 0.08012135
  26. #######
  27. ## lin.pred = FALSE
  28. emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = FALSE)
  29. #> $emmeans
  30. #> mar emmean SE df lower.CL upper.CL
  31. #> Single 0.806 0.167 903 0.478 1.13
  32. #> Married 0.858 0.122 903 0.619 1.10
  33. #>
  34. #> Results are averaged over the levels of: fem
  35. #> Confidence level used: 0.95
  36. #>
  37. #> $contrasts
  38. #> contrast estimate SE df t.ratio p.value
  39. #> Single - Married -0.0522 0.213 903 -0.245 0.8066
  40. #>
  41. #> Results are averaged over the levels of: fem
  42. #######
  43. ## lin.pred = TRUE, type = "response"
  44. emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = TRUE, type = "response")
  45. #> $emmeans
  46. #> mar prob SE df lower.CL upper.CL
  47. #> Single 0.677 0.0306 903 0.615 0.734
  48. #> Married 0.744 0.0200 903 0.703 0.781
  49. #>
  50. #> Results are averaged over the levels of: fem
  51. #> Confidence level used: 0.95
  52. #> Intervals are back-transformed from the logit scale
  53. #>
  54. #> $contrasts
  55. #> contrast odds.ratio SE df null t.ratio p.value
  56. #> Single / Married 0.722 0.13 903 1 -1.804 0.0715
  57. #>
  58. #> Results are averaged over the levels of: fem
  59. #> Tests are performed on the log odds ratio scale
  60. emmeans::emmeans(glm, specs = pairwise ~ mar, regrid = "response")
  61. #> $emmeans
  62. #> mar response SE df asymp.LCL asymp.UCL
  63. #> Single 0.677 0.0305 Inf 0.617 0.736
  64. #> Married 0.743 0.0201 Inf 0.704 0.783
  65. #>
  66. #> Results are averaged over the levels of: fem
  67. #> Confidence level used: 0.95
  68. #>
  69. #> $contrasts
  70. #> contrast estimate SE df z.ratio p.value
  71. #> Single - Married -0.0667 0.0377 Inf -1.772 0.0764
  72. #>
  73. #> Results are averaged over the levels of: fem
  74. #######
  75. # Estimates and contrasts at the logit scale appear to match
  76. emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = TRUE)
  77. #> $emmeans
  78. #> mar emmean SE df lower.CL upper.CL
  79. #> Single 0.741 0.140 903 0.466 1.02
  80. #> Married 1.068 0.105 903 0.862 1.27
  81. #>
  82. #> Results are averaged over the levels of: fem
  83. #> Results are given on the logit (not the response) scale.
  84. #> Confidence level used: 0.95
  85. #>
  86. #> $contrasts
  87. #> contrast estimate SE df t.ratio p.value
  88. #> Single - Married -0.326 0.181 903 -1.804 0.0715
  89. #>
  90. #> Results are averaged over the levels of: fem
  91. #> Results are given on the log odds ratio (not the response) scale.
  92. emmeans::emmeans(glm, specs = pairwise ~ mar)
  93. #> $emmeans
  94. #> mar emmean SE df asymp.LCL asymp.UCL
  95. #> Single 0.741 0.140 Inf 0.467 1.02
  96. #> Married 1.068 0.105 Inf 0.862 1.27
  97. #>
  98. #> Results are averaged over the levels of: fem
  99. #> Results are given on the logit (not the response) scale.
  100. #> Confidence level used: 0.95
  101. #>
  102. #> $contrasts
  103. #> contrast estimate SE df z.ratio p.value
  104. #> Single - Married -0.326 0.181 Inf -1.804 0.0712
  105. #>
  106. #> Results are averaged over the levels of: fem
  107. #> Results are given on the log odds ratio (not the response) scale.
  108. sessionInfo()
  109. #> R version 4.2.0 (2022-04-22 ucrt)
  110. #> Platform: x86_64-w64-mingw32/x64 (64-bit)
  111. #> Running under: Windows 10 x64 (build 22000)
  112. #>
  113. #> Matrix products: default
  114. #>
  115. #> locale:
  116. #> [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
  117. #> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
  118. #> [5] LC_TIME=Spanish_Spain.utf8
  119. #>
  120. #> attached base packages:
  121. #> [1] stats graphics grDevices utils datasets methods base
  122. #>
  123. #> other attached packages:
  124. #> [1] pscl_1.5.5.1 emmeans_1.8.5-9000004
  125. #>
  126. #> loaded via a namespace (and not attached):
  127. #> [1] compiler_4.2.0 highr_0.9 tools_4.2.0
  128. #> [4] digest_0.6.29 evaluate_0.15 lifecycle_1.0.3
  129. #> [7] lattice_0.20-45 rlang_1.1.1 reprex_2.0.2
  130. #> [10] Matrix_1.4-1 cli_3.6.1 rstudioapi_0.13
  131. #> [13] yaml_2.3.5 mvtnorm_1.1-3 xfun_0.40
  132. #> [16] fastmap_1.1.0 coda_0.19-4 withr_2.5.0
  133. #> [19] stringr_1.5.0 knitr_1.39 fs_1.5.2
  134. #> [22] vctrs_0.6.3 grid_4.2.0 glue_1.6.2
  135. #> [25] survival_3.3-1 rmarkdown_2.14 multcomp_1.4-19
  136. #> [28] TH.data_1.1-1 magrittr_2.0.3 codetools_0.2-18
  137. #> [31] htmltools_0.5.6 splines_4.2.0 MASS_7.3-57
  138. #> [34] xtable_1.8-4 numDeriv_2016.8-1.1 sandwich_3.0-1
  139. #> [37] estimability_1.4.1 stringi_1.7.6 zoo_1.8-10

创建于2023-09-06使用reprex v2.0.2

wn9m85ua

wn9m85ua1#

Russell Lenth(emmeans包的开发者)在GitHub上提供了一个答案。我把它贴在这里,用emmeans和glmmTMB拟合的障碍模型之间的比较,显示了一致的结果。

  1. library(emmeans)
  2. #> Warning: package 'emmeans' was built under R version 4.2.3
  3. library(glmmTMB)
  4. #> Warning: package 'glmmTMB' was built under R version 4.2.3
  5. #> Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
  6. #> TMB was built with Matrix version 1.6.1
  7. #> Current Matrix version is 1.4.1
  8. #> Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
  9. library(pscl)
  10. #> Warning: package 'pscl' was built under R version 4.2.3
  11. #> Classes and Methods for R developed in the
  12. #> Political Science Computational Laboratory
  13. #> Department of Political Science
  14. #> Stanford University
  15. #> Simon Jackman
  16. #> hurdle and zeroinfl functions by Achim Zeileis
  17. # Create datasets: number of papers during a Phd
  18. data("bioChemists", package = "pscl")
  19. # Declare models
  20. pscl.hurdle <- pscl::hurdle(art ~ fem + mar, data = bioChemists, dist = "poisson", zero.dist = "binomial", link = "logit")
  21. glmmtmb.hurdle <- glmmTMB::glmmTMB(art ~ fem + mar, data = bioChemists, family = truncated_poisson(link = "log"), zi = ~ fem + mar)
  22. # Truncated count
  23. emmeans(pscl.hurdle, ~ mar, mode = "count", lin.pred = TRUE, type = "response")
  24. #> mar count SE df lower.CL upper.CL
  25. #> Single 2.07 0.1122 909 1.86 2.31
  26. #> Married 2.09 0.0807 909 1.94 2.26
  27. #>
  28. #> Results are averaged over the levels of: fem
  29. #> Confidence level used: 0.95
  30. #> Intervals are back-transformed from the log scale
  31. emmeans(glmmtmb.hurdle, ~ mar, comp = "cond", type = "response")
  32. #> mar rate SE df asymp.LCL asymp.UCL
  33. #> Single 2.07 0.1122 Inf 1.86 2.30
  34. #> Married 2.09 0.0807 Inf 1.94 2.26
  35. #>
  36. #> Results are averaged over the levels of: fem
  37. #> Confidence level used: 0.95
  38. #> Intervals are back-transformed from the log scale
  39. # Binomial
  40. emmeans(pscl.hurdle, ~ mar, mode = "prob0")
  41. #> mar emmean SE df lower.CL upper.CL
  42. #> Single 0.313 0.0265 909 0.261 0.365
  43. #> Married 0.297 0.0191 909 0.259 0.334
  44. #>
  45. #> Results are averaged over the levels of: fem
  46. #> Confidence level used: 0.95
  47. emmeans(glmmtmb.hurdle, ~ mar, comp = "zi", type = "response")
  48. #> mar response SE df asymp.LCL asymp.UCL
  49. #> Single 0.313 0.0267 Inf 0.263 0.367
  50. #> Married 0.296 0.0190 Inf 0.261 0.335
  51. #>
  52. #> Results are averaged over the levels of: fem
  53. #> Confidence level used: 0.95
  54. #> Intervals are back-transformed from the logit scale
  55. # Response (binomial * truncated count)
  56. emmeans(pscl.hurdle, ~ mar, mode = "response")
  57. #> mar emmean SE df lower.CL upper.CL
  58. #> Single 1.64 0.0900 909 1.47 1.82
  59. #> Married 1.69 0.0633 909 1.57 1.82
  60. #>
  61. #> Results are averaged over the levels of: fem
  62. #> Confidence level used: 0.95
  63. emmeans(glmmtmb.hurdle, ~ mar, comp = "response")
  64. #> mar emmean SE df asymp.LCL asymp.UCL
  65. #> Single 1.64 0.0900 Inf 1.47 1.82
  66. #> Married 1.69 0.0633 Inf 1.57 1.82
  67. #>
  68. #> Results are averaged over the levels of: fem
  69. #> Confidence level used: 0.95
  70. sessionInfo()
  71. #> R version 4.2.0 (2022-04-22 ucrt)
  72. #> Platform: x86_64-w64-mingw32/x64 (64-bit)
  73. #> Running under: Windows 10 x64 (build 22000)
  74. #>
  75. #> Matrix products: default
  76. #>
  77. #> locale:
  78. #> [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
  79. #> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
  80. #> [5] LC_TIME=Spanish_Spain.utf8
  81. #>
  82. #> attached base packages:
  83. #> [1] stats graphics grDevices utils datasets methods base
  84. #>
  85. #> other attached packages:
  86. #> [1] pscl_1.5.5.1 glmmTMB_1.1.7 emmeans_1.8.8
  87. #>
  88. #> loaded via a namespace (and not attached):
  89. #> [1] Rcpp_1.0.10 nloptr_2.0.1 compiler_4.2.0
  90. #> [4] highr_0.9 TMB_1.9.6 tools_4.2.0
  91. #> [7] boot_1.3-28 lme4_1.1-29 digest_0.6.29
  92. #> [10] nlme_3.1-157 evaluate_0.15 lifecycle_1.0.3
  93. #> [13] lattice_0.20-45 rlang_1.1.1 reprex_2.0.2
  94. #> [16] Matrix_1.4-1 cli_3.6.1 rstudioapi_0.13
  95. #> [19] yaml_2.3.5 mvtnorm_1.1-3 xfun_0.40
  96. #> [22] fastmap_1.1.0 coda_0.19-4 withr_2.5.0
  97. #> [25] stringr_1.5.0 knitr_1.39 fs_1.5.2
  98. #> [28] vctrs_0.6.3 grid_4.2.0 glue_1.6.2
  99. #> [31] survival_3.3-1 rmarkdown_2.14 multcomp_1.4-19
  100. #> [34] minqa_1.2.4 TH.data_1.1-1 magrittr_2.0.3
  101. #> [37] codetools_0.2-18 htmltools_0.5.6 splines_4.2.0
  102. #> [40] MASS_7.3-57 xtable_1.8-4 numDeriv_2016.8-1.1
  103. #> [43] sandwich_3.0-1 estimability_1.4.1 stringi_1.7.6
  104. #> [46] zoo_1.8-10

创建于2023-09-14带有reprex v2.0.2

展开查看全部

相关问题