我拟合了两个GLMM模型(完整模型见下文),一个采用泊松分布(卵数),另一个采用二项分布(存活后代数/卵数),然后我检索了完整平均系数(总结见下文)。
我想为一些固定项(来自平均模型)绘制回归线和置信区间,但我真的不确定我是否做得正确。
下面是我所做的,我注意到#1和#2有点不同。鉴于它们来自同一个模型,有人能给我一些提示吗?
我还想知道,我的模型是标准化的这一事实是否会影响我绘制回归线的方式?
这是我的df的样子:
> dput(head(dataRS.dist, 10))
structure(list(MaleID = c("875626", "871031",
"871031", "877334", "871895", "877906", "870404", "861285", "861285", "873932"
), FemaleID = c("861255", "861255", "861255", "861255", "861503",
"861503", "861525", "861525", "861525", "861621"), Year = c(1996L, 1997L,
1998L, 1999L, 1997L, 2000L, 1996L, 1997L, 1998L, 1998L), AgeM = c(1L,
5L, 6L, 2L, 2L, 1L, 3L, 3L, 4L, 3L), AgeF = c(2L, 3L, 4L, 5L, 3L, 6L, 3L,
4L, 5L, 3L), LayingDate = c(165L, 122L, 99L, 102L, 116L, 112L, 106L, 162L,
106L, 98L), LD_QUA = c(27225L, 14884L, 9801L, 10404L, 13456L, 12544L,
11236L, 26244L, 11236L, 9604L), Breed_Suc = c(1, 0.833, 1, 1, 0.667, 0.875,
0.6, 0.8, 0.6, 0.667), NbE = c(7L, 6L, 4L, 3L, 6L, 8L, 5L, 5L, 5L, 6L),
M_distance = c(5.011, 14.146, 14.146, 12.011, 8.36, 11.236, 11.135, 11.236,
11.236, 6.989), M_distance_QUA = c(25.112, 200.111, 200.111, 144.27, 69.882,
126.247, 123.984, 126.247, 126.247, 48.843), F_distance = c(1, 1, 1, 1,
6.438, 6.438, 10.371, 10.371, 10.371, 14.079), F_distance_QUA = c(1, 1, 1, 1,
41.45, 41.45, 107.553, 107.553, 107.553, 198.208), scale.LD = structure(c(1.49006909403421,
0.231226641474467, -0.44210769361563, -0.354281475995183, 0.0555742062335718,
-0.061527417260358, -0.237179852501253, 1.40224287641377, -0.237179852501253,
-0.471383099489113), .Dim = c(10L, 1L)), scale.LD_Quad = structure(c(1.50346602030487,
0.0806268589946397, -0.505410857311973, -0.43588877528701, -0.084012499432636,
-0.189160324982829, -0.339964443206132, 1.39036293163739, -0.339964443206132,
-0.528123709980161), .Dim = c(10L, 1L))), row.names = c(9L, 10L, 11L, 12L, 19L,
21L, 22L, 23L, 24L, 28L), class = "data.frame")
泊松分布模型:
full.model.dist.nb = glmer(NbE ~ M_distance + M_distance_QUA + F_distance +
F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD +
scale.LD_Quad + (1|Year) + (1|MaleID) + (1|FemaleID), data = dataRS.dist,
na.action = na.fail, nAGQ = 0, family = "poisson"(link="log"))
慕敏::疏通()......
summary(nb.dist.avg.model)
Call:
model.avg(object = best.nb.dist.models)
Component model call:
glmer(formula = NbE ~ <5 unique rhs>, data = dataRS.dist, family =
poisson(link = "log"), nAGQ = 0, na.action = na.fail)
Component models:
df logLik AICc delta weight
1267 8 -1829.48 3675.12 0.00 0.27
12367 9 -1828.48 3675.16 0.05 0.26
123567 10 -1827.90 3676.05 0.93 0.17
12567 9 -1828.97 3676.14 1.02 0.16
123467 10 -1828.06 3676.37 1.26 0.14
Term codes:
z.AgeF z.AgeM z.F_distance z.F_distance_QUA
1 2 3 4
z.M_distance z.scale.LD z.scale.LD_Quad
5 6 7
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 1.827949 0.017495 0.017519 104.342 <2e-16 ***
z.AgeF 0.046835 0.028607 0.028646 1.635 0.1021
z.AgeM -0.053181 0.029231 0.029271 1.817 0.0692 .
z.scale.LD -0.173759 0.178627 0.178871 0.971 0.3313
z.scale.LD_Quad 0.230000 0.176936 0.177178 1.298 0.1942
z.F_distance -0.030771 0.046648 0.046679 0.659 0.5098
z.M_distance -0.009227 0.020272 0.020288 0.455 0.6492
z.F_distance_QUA 0.009579 0.036307 0.036336 0.264 0.7921
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 1.82795 0.01749 0.01752 104.342 <2e-16 ***
z.AgeF 0.04683 0.02861 0.02865 1.635 0.1021
z.AgeM -0.05318 0.02923 0.02927 1.817 0.0692 .
z.scale.LD -0.17376 0.17863 0.17887 0.971 0.3313
z.scale.LD_Quad 0.23000 0.17694 0.17718 1.298 0.1942
z.F_distance -0.05378 0.05065 0.05070 1.061 0.2888
z.M_distance -0.02808 0.02686 0.02689 1.044 0.2964
z.F_distance_QUA 0.06716 0.07331 0.07341 0.915 0.3603
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
对于这个模型,我画出了以下曲线:
1.这里我首先得到预测值,然后绘制预测值和我感兴趣的变量的回归线,以及CI。假设我使用预测值,使用stat_smooth
和method="glm"
似乎可以,特别是因为我也想绘制CI。这对吗?(de code下面的图显示了我分别为男性和女性所做的绘图)
Model.avg.pred.dist.nb <- c(predict(nb.dist.avg.model, type = "response" )) #get the fitted values from the average model AICc<2
DF_Model.avg.pred.dist.nb <- data.frame(Model.avg.pred.dist.nb, subset(dataRS.dist, select = c(M_distance, F_distance, NbE))) #create the dataframe with the variables to plot
NbE_M_dist = ggplot(DF_Model.avg.pred.dist.nb, aes(x=M_distance, y=Model.avg.pred.dist.nb)) +
geom_point(size=0.5) +
stat_smooth(method = "glm", method.args = list(family = poisson), se = T, alpha = .25, color = "black") +
labs(x="Male distance", y = "Number eggs") +
theme_bw() +
theme(text=element_text(size=11))
1.我在geom_abline
中输入了截距和斜率值。如何同时添加CI?
ggplot() +
scale_x_continuous(name="Distance", limits=c(0,20)) +
scale_y_continuous(name="Number eggs", limits=c(5,7.5)) +
scale_color_discrete(name="Sex") +
geom_abline(aes(intercept=exp(1.827949241), slope=-0.030771247, colour="Female")) +
geom_abline(aes(intercept=exp(1.827949241), slope=-0.009227452, colour="Male")) +
theme_bw()
对于二项式模型,我这样做(取自here):
full.model.dist.bs = glmer(Breed_Suc ~ M_distance + M_distance_QUA + F_distance +
F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD + scale.LD_Quad +
(1|Year) + (1|MaleID)+ (1|FemaleID), data = dataRS.dist, na.action = na.fail ,
nAGQ = 0, weights = dataRS.dist$NbE,family = "binomial"(link="logit"))
summary(bs.dist.avg.model)
Call:
model.avg(object = best.bs.dist.models)
Component model call:
glmer(formula = Breed_Suc ~ <6 unique rhs>, data = dataRS.dist, family =
binomial(link = "logit"), nAGQ = 0, weights = dataRS.dist$NbE, na.action =
na.fail)
Component models:
df logLik AICc delta weight
12356789 12 -1727.73 3479.82 0.00 0.25
125678 10 -1729.89 3480.03 0.21 0.22
123456789 13 -1726.96 3480.34 0.51 0.19
1278 8 -1732.43 3481.03 1.21 0.13
1235789 11 -1729.52 3481.33 1.51 0.12
12345789 12 -1728.72 3481.80 1.98 0.09
Term codes:
z.AgeF z.AgeM z.F_distance
1 2 3
z.F_distance_QUA z.M_distance z.M_distance_QUA
4 5 6
z.scale.LD z.scale.LD_Quad z.F_distance:z.M_distance
7 8 9
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 0.59571 0.08768 0.08780 6.785 <2e-16 ***
z.F_distance -0.08331 0.22438 0.22455 0.371 0.7106
z.M_distance 0.30106 0.35661 0.35686 0.844 0.3989
z.M_distance_QUA -0.38521 0.37397 0.37420 1.029 0.3033
z.AgeF 0.23875 0.11164 0.11180 2.136 0.0327 *
z.AgeM 0.07179 0.10853 0.10868 0.661 0.5089
z.scale.LD -0.21892 0.51109 0.51179 0.428 0.6688
z.scale.LD_Quad -0.21826 0.50536 0.50605 0.431 0.6662
z.F_distance:z.M_distance 0.23584 0.22598 0.22610 1.043 0.2969
z.F_distance_QUA 0.10247 0.22963 0.22979 0.446 0.6556
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) 0.59571 0.08768 0.08780 6.785 <2e-16 ***
z.F_distance -0.12943 0.26879 0.26902 0.481 0.6304
z.M_distance 0.34787 0.36147 0.36176 0.962 0.3362
z.M_distance_QUA -0.58524 0.30888 0.30930 1.892 0.0585 .
z.AgeF 0.23875 0.11164 0.11180 2.136 0.0327 *
z.AgeM 0.07179 0.10853 0.10868 0.661 0.5089
z.scale.LD -0.21892 0.51109 0.51179 0.428 0.6688
z.scale.LD_Quad -0.21826 0.50536 0.50605 0.431 0.6662
z.F_distance:z.M_distance 0.36639 0.17748 0.17772 2.062 0.0392 *
z.F_distance_QUA 0.36356 0.30358 0.30400 1.196 0.2317
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coefs <- coef(bs.dist.avg.model)
x_plot <- seq(0, 20, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot) # coefs[1]= 0.5957089; coefs[2]=-0.08330994
y_plot2 <- plogis(coefs[1] + coefs[3] * x_plot) # coefs[1]= 0.5957089; coefs[3]=0.3010592
plot_data <- data.frame(x_plot, y_plot,y_plot2)
ggplot(plot_data) +
geom_line(aes(x_plot, y_plot), col = "red") +
geom_line(aes(x_plot, y_plot2), col = "blue") +
xlab("Distance") + ylab("Breeding success") +
scale_y_continuous(limits = c(0, 1)) +
theme_bw()
如果你有任何关于如何做不同的建议,这是非常欢迎的!提前感谢。
编辑:我希望是一个可复制的例子(如果不是,对不起,那么我不明白什么是可复制的例子)
1条答案
按热度按时间bnl4lu3b1#
方法1
我可以理解将预测值标绘为点,尽管显示原始数据点或 * 部分残差 *(例如remef package on GitHub)可能更有意义。通过预测值标绘泊松GLM有点奇怪:虽然通过预测值的GLM拟合可能与全模型的预测大致相同(尽管我很难证明这一点),但置信区间与平均模型预测的置信区间不同。
方法2
拟合对数链接(泊松)模型的预测值是
y = exp(a + b*x)
,而不是y=exp(a) + b*x
(看起来像您在这里所做的)。方法3
这看起来不错,尽管只有当所有其他协变量都是零中心时,它才能给出合理的预测(而且你不会得到置信区间,但有更简单的方法来做到这一点)。
您可以使用
ggeffects
包来获得预测值和置信区间,但需要注意的是,置信区间不包含随机效应中的任何不确定性(需要注意的是,我还没有研究它们是如何计算的,因此我不确定它们是如何传播模型平均不确定性的)。示例
设置数据和模型
我在模型中使用了交互作用项:据我所知,你为交互作用设置了显式虚拟变量。
预测和绘图
我尝试了
sjPlot
和ggeffects
包,但最终发现使用emmeans
更容易做到我想要的事情。