我的目标是通过估计的边际均值来解释障碍模型的系数。我更喜欢解释概率(从logit量表反向转换),而不是对数优势(模型系数)或比值比(exp(对数优势))。我想使用emmeans()
来实现这个目标,因为它与许多模型兼容,并且我有在线性模型和二项式GLM中使用它的经验。
栅栏模型的栅栏部分产生与二项式GLM相同的系数,这在其他地方已经评论过(here或here)。
然而,我不完全理解为什么,根据设置,从跨栏估计的平均值不匹配的二项式GLM。特别是
lin.pred = FALSE
。这从二项式GLM产生不同的概率。在emmeans
documentation之后,我认为它们是在概率尺度上平均的。lin.pred = TRUE, type = "response"
。此设置产生与二项式GLM相同的概率。根据emmeans文档,我认为它们是在logit标度上平均的。
我有两个问题:
- 1.如果我们的目标是解释概率,而不是logits,那么哪种设置更可取?
- 1.
Inf
自由度的含义是什么?
下面是一个可重复的例子,使用生物化学学生在至少3年的博士学位期间发表的论文数量,重点是两个因素水平(单身与已婚)之间的成对比较。
我的首选方法是与二项式GLM的估计概率相匹配的方法。我的理解是,已婚博士生比单身博士生更有可能发表至少一篇论文。已婚学生有74%的机会至少完成一篇论文,而单身学生有67%的机会:这7%的变化是一个很小的差异,并不显著。
提前感谢!
library(emmeans)
library(pscl)
#> Warning: package 'pscl' was built under R version 4.2.3
#> Classes and Methods for R developed in the
#> Political Science Computational Laboratory
#> Department of Political Science
#> Stanford University
#> Simon Jackman
#> hurdle and zeroinfl functions by Achim Zeileis
data("bioChemists", package = "pscl")
# str(bioChemists)
hurdle <- hurdle(art ~ ., data = bioChemists, dist = "poisson", zero.dist = "binomial", link = "logit")
glm <- glm(art > 0 ~ ., data = bioChemists, family = binomial(link = "logit"))
# Same coefficients
coef(summary(hurdle))$zero
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.23679601 0.29551883 0.8012891 4.229643e-01
#> femWomen -0.25115113 0.15910522 -1.5785222 1.144457e-01
#> marMarried 0.32623358 0.18081823 1.8042074 7.119880e-02
#> kid5 -0.28524872 0.11113043 -2.5667921 1.026441e-02
#> phd 0.02221940 0.07955710 0.2792887 7.800233e-01
#> ment 0.08012135 0.01301763 6.1548321 7.515710e-10
coef(glm)
#> (Intercept) femWomen marMarried kid5 phd ment
#> 0.23679601 -0.25115113 0.32623358 -0.28524872 0.02221940 0.08012135
#######
## lin.pred = FALSE
emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = FALSE)
#> $emmeans
#> mar emmean SE df lower.CL upper.CL
#> Single 0.806 0.167 903 0.478 1.13
#> Married 0.858 0.122 903 0.619 1.10
#>
#> Results are averaged over the levels of: fem
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df t.ratio p.value
#> Single - Married -0.0522 0.213 903 -0.245 0.8066
#>
#> Results are averaged over the levels of: fem
#######
## lin.pred = TRUE, type = "response"
emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = TRUE, type = "response")
#> $emmeans
#> mar prob SE df lower.CL upper.CL
#> Single 0.677 0.0306 903 0.615 0.734
#> Married 0.744 0.0200 903 0.703 0.781
#>
#> Results are averaged over the levels of: fem
#> Confidence level used: 0.95
#> Intervals are back-transformed from the logit scale
#>
#> $contrasts
#> contrast odds.ratio SE df null t.ratio p.value
#> Single / Married 0.722 0.13 903 1 -1.804 0.0715
#>
#> Results are averaged over the levels of: fem
#> Tests are performed on the log odds ratio scale
emmeans::emmeans(glm, specs = pairwise ~ mar, regrid = "response")
#> $emmeans
#> mar response SE df asymp.LCL asymp.UCL
#> Single 0.677 0.0305 Inf 0.617 0.736
#> Married 0.743 0.0201 Inf 0.704 0.783
#>
#> Results are averaged over the levels of: fem
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Single - Married -0.0667 0.0377 Inf -1.772 0.0764
#>
#> Results are averaged over the levels of: fem
#######
# Estimates and contrasts at the logit scale appear to match
emmeans::emmeans(hurdle, specs = pairwise ~ mar, mode = "zero", lin.pred = TRUE)
#> $emmeans
#> mar emmean SE df lower.CL upper.CL
#> Single 0.741 0.140 903 0.466 1.02
#> Married 1.068 0.105 903 0.862 1.27
#>
#> Results are averaged over the levels of: fem
#> Results are given on the logit (not the response) scale.
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df t.ratio p.value
#> Single - Married -0.326 0.181 903 -1.804 0.0715
#>
#> Results are averaged over the levels of: fem
#> Results are given on the log odds ratio (not the response) scale.
emmeans::emmeans(glm, specs = pairwise ~ mar)
#> $emmeans
#> mar emmean SE df asymp.LCL asymp.UCL
#> Single 0.741 0.140 Inf 0.467 1.02
#> Married 1.068 0.105 Inf 0.862 1.27
#>
#> Results are averaged over the levels of: fem
#> Results are given on the logit (not the response) scale.
#> Confidence level used: 0.95
#>
#> $contrasts
#> contrast estimate SE df z.ratio p.value
#> Single - Married -0.326 0.181 Inf -1.804 0.0712
#>
#> Results are averaged over the levels of: fem
#> Results are given on the log odds ratio (not the response) scale.
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 22000)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
#> [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Spanish_Spain.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] pscl_1.5.5.1 emmeans_1.8.5-9000004
#>
#> loaded via a namespace (and not attached):
#> [1] compiler_4.2.0 highr_0.9 tools_4.2.0
#> [4] digest_0.6.29 evaluate_0.15 lifecycle_1.0.3
#> [7] lattice_0.20-45 rlang_1.1.1 reprex_2.0.2
#> [10] Matrix_1.4-1 cli_3.6.1 rstudioapi_0.13
#> [13] yaml_2.3.5 mvtnorm_1.1-3 xfun_0.40
#> [16] fastmap_1.1.0 coda_0.19-4 withr_2.5.0
#> [19] stringr_1.5.0 knitr_1.39 fs_1.5.2
#> [22] vctrs_0.6.3 grid_4.2.0 glue_1.6.2
#> [25] survival_3.3-1 rmarkdown_2.14 multcomp_1.4-19
#> [28] TH.data_1.1-1 magrittr_2.0.3 codetools_0.2-18
#> [31] htmltools_0.5.6 splines_4.2.0 MASS_7.3-57
#> [34] xtable_1.8-4 numDeriv_2016.8-1.1 sandwich_3.0-1
#> [37] estimability_1.4.1 stringi_1.7.6 zoo_1.8-10
创建于2023-09-06使用reprex v2.0.2
1条答案
按热度按时间wn9m85ua1#
Russell Lenth(emmeans包的开发者)在GitHub上提供了一个答案。我把它贴在这里,用emmeans和glmmTMB拟合的障碍模型之间的比较,显示了一致的结果。
创建于2023-09-14带有reprex v2.0.2