R语言 绘制回归线-全平均GLMM

yks3o0rb  于 2022-12-20  发布在  其他
关注(0)|答案(1)|浏览(281)

我拟合了两个GLMM模型(完整模型见下文),一个采用泊松分布(卵数),另一个采用二项分布(存活后代数/卵数),然后我检索了完整平均系数(总结见下文)。
我想为一些固定项(来自平均模型)绘制回归线和置信区间,但我真的不确定我是否做得正确。
下面是我所做的,我注意到#1和#2有点不同。鉴于它们来自同一个模型,有人能给我一些提示吗?
我还想知道,我的模型是标准化的这一事实是否会影响我绘制回归线的方式?
这是我的df的样子:

> dput(head(dataRS.dist, 10)) 
structure(list(MaleID = c("875626", "871031", 
"871031", "877334", "871895", "877906", "870404", "861285", "861285", "873932"
), FemaleID = c("861255", "861255", "861255", "861255", "861503", 
"861503", "861525", "861525", "861525", "861621"), Year = c(1996L, 1997L, 
1998L, 1999L, 1997L, 2000L, 1996L, 1997L, 1998L, 1998L), AgeM = c(1L, 
5L, 6L, 2L, 2L, 1L, 3L, 3L, 4L, 3L), AgeF = c(2L, 3L, 4L, 5L, 3L, 6L, 3L, 
4L, 5L, 3L), LayingDate = c(165L, 122L, 99L, 102L, 116L, 112L, 106L, 162L, 
106L, 98L), LD_QUA = c(27225L, 14884L, 9801L, 10404L, 13456L, 12544L, 
11236L, 26244L, 11236L, 9604L), Breed_Suc = c(1, 0.833, 1, 1, 0.667, 0.875, 
0.6, 0.8, 0.6, 0.667), NbE = c(7L, 6L, 4L, 3L, 6L, 8L, 5L, 5L, 5L, 6L), 
M_distance = c(5.011, 14.146, 14.146, 12.011, 8.36, 11.236, 11.135, 11.236, 
11.236, 6.989), M_distance_QUA = c(25.112, 200.111, 200.111, 144.27, 69.882, 
126.247, 123.984, 126.247, 126.247, 48.843), F_distance = c(1, 1, 1, 1, 
6.438, 6.438, 10.371, 10.371, 10.371, 14.079), F_distance_QUA = c(1, 1, 1, 1, 
41.45, 41.45, 107.553, 107.553, 107.553, 198.208), scale.LD = structure(c(1.49006909403421, 
0.231226641474467, -0.44210769361563, -0.354281475995183, 0.0555742062335718, 
-0.061527417260358, -0.237179852501253, 1.40224287641377, -0.237179852501253, 
-0.471383099489113), .Dim = c(10L, 1L)), scale.LD_Quad = structure(c(1.50346602030487, 
0.0806268589946397, -0.505410857311973, -0.43588877528701, -0.084012499432636, 
-0.189160324982829, -0.339964443206132, 1.39036293163739, -0.339964443206132, 
-0.528123709980161), .Dim = c(10L, 1L))), row.names = c(9L, 10L, 11L, 12L, 19L, 
21L, 22L, 23L, 24L, 28L), class = "data.frame")

泊松分布模型:

full.model.dist.nb = glmer(NbE ~ M_distance + M_distance_QUA + F_distance + 
    F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD +
    scale.LD_Quad + (1|Year) + (1|MaleID) + (1|FemaleID), data = dataRS.dist, 
    na.action = na.fail, nAGQ = 0, family = "poisson"(link="log"))

慕敏::疏通()......

summary(nb.dist.avg.model)

Call:
model.avg(object = best.nb.dist.models)

Component model call: 
glmer(formula = NbE ~ <5 unique rhs>, data = dataRS.dist, family = 
     poisson(link = "log"), nAGQ = 0, na.action = na.fail)

Component models: 
       df   logLik    AICc delta weight
1267    8 -1829.48 3675.12  0.00   0.27
12367   9 -1828.48 3675.16  0.05   0.26
123567 10 -1827.90 3676.05  0.93   0.17
12567   9 -1828.97 3676.14  1.02   0.16
123467 10 -1828.06 3676.37  1.26   0.14

Term codes: 
           z.AgeF           z.AgeM     z.F_distance z.F_distance_QUA 
                1                2                3                4 
     z.M_distance       z.scale.LD  z.scale.LD_Quad 
                5                6                7 

Model-averaged coefficients:  
(full average) 
                  Estimate Std. Error Adjusted SE z value Pr(>|z|)    
(Intercept)       1.827949   0.017495    0.017519 104.342   <2e-16 ***
z.AgeF            0.046835   0.028607    0.028646   1.635   0.1021    
z.AgeM           -0.053181   0.029231    0.029271   1.817   0.0692 .  
z.scale.LD       -0.173759   0.178627    0.178871   0.971   0.3313    
z.scale.LD_Quad   0.230000   0.176936    0.177178   1.298   0.1942    
z.F_distance     -0.030771   0.046648    0.046679   0.659   0.5098    
z.M_distance     -0.009227   0.020272    0.020288   0.455   0.6492    
z.F_distance_QUA  0.009579   0.036307    0.036336   0.264   0.7921    
 
(conditional average) 
                  Estimate Std. Error Adjusted SE z value Pr(>|z|)    
(Intercept)       1.82795    0.01749     0.01752 104.342    <2e-16 ***
z.AgeF            0.04683    0.02861     0.02865   1.635    0.1021    
z.AgeM           -0.05318    0.02923     0.02927   1.817    0.0692 .  
z.scale.LD       -0.17376    0.17863     0.17887   0.971    0.3313    
z.scale.LD_Quad   0.23000    0.17694     0.17718   1.298    0.1942    
z.F_distance     -0.05378    0.05065     0.05070   1.061    0.2888    
z.M_distance     -0.02808    0.02686     0.02689   1.044    0.2964    
z.F_distance_QUA  0.06716    0.07331     0.07341   0.915    0.3603    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

对于这个模型,我画出了以下曲线:
1.这里我首先得到预测值,然后绘制预测值和我感兴趣的变量的回归线,以及CI。假设我使用预测值,使用stat_smoothmethod="glm"似乎可以,特别是因为我也想绘制CI。这对吗?(de code下面的图显示了我分别为男性和女性所做的绘图)

Model.avg.pred.dist.nb <- c(predict(nb.dist.avg.model, type = "response" )) #get the fitted values from the average model AICc<2
        DF_Model.avg.pred.dist.nb <- data.frame(Model.avg.pred.dist.nb, subset(dataRS.dist, select = c(M_distance, F_distance, NbE))) #create the dataframe with the variables to plot
        NbE_M_dist = ggplot(DF_Model.avg.pred.dist.nb, aes(x=M_distance, y=Model.avg.pred.dist.nb)) +
          geom_point(size=0.5) + 
          stat_smooth(method = "glm", method.args = list(family = poisson), se = T, alpha = .25, color = "black") +
          labs(x="Male distance", y = "Number eggs") + 
          theme_bw() +
          theme(text=element_text(size=11))

1.我在geom_abline中输入了截距和斜率值。如何同时添加CI?

ggplot() +
  scale_x_continuous(name="Distance", limits=c(0,20)) +
  scale_y_continuous(name="Number eggs", limits=c(5,7.5)) +
  scale_color_discrete(name="Sex") +
  geom_abline(aes(intercept=exp(1.827949241), slope=-0.030771247, colour="Female")) + 
  geom_abline(aes(intercept=exp(1.827949241), slope=-0.009227452, colour="Male")) +
  theme_bw()

对于二项式模型,我这样做(取自here):

full.model.dist.bs = glmer(Breed_Suc ~ M_distance + M_distance_QUA + F_distance + 
    F_distance_QUA + M_distance*F_distance + AgeF + AgeM + scale.LD + scale.LD_Quad + 
    (1|Year) + (1|MaleID)+ (1|FemaleID), data = dataRS.dist, na.action = na.fail , 
    nAGQ = 0, weights = dataRS.dist$NbE,family = "binomial"(link="logit"))
summary(bs.dist.avg.model)

Call:
model.avg(object = best.bs.dist.models)

Component model call: 
glmer(formula = Breed_Suc ~ <6 unique rhs>, data = dataRS.dist, family = 
     binomial(link = "logit"), nAGQ = 0, weights = dataRS.dist$NbE, na.action = 
     na.fail)

Component models: 
          df   logLik    AICc delta weight
12356789  12 -1727.73 3479.82  0.00   0.25
125678    10 -1729.89 3480.03  0.21   0.22
123456789 13 -1726.96 3480.34  0.51   0.19
1278       8 -1732.43 3481.03  1.21   0.13
1235789   11 -1729.52 3481.33  1.51   0.12
12345789  12 -1728.72 3481.80  1.98   0.09

Term codes: 
                   z.AgeF                    z.AgeM              z.F_distance
                        1                         2                         3
         z.F_distance_QUA              z.M_distance          z.M_distance_QUA
                        4                         5                         6 
               z.scale.LD           z.scale.LD_Quad z.F_distance:z.M_distance
                        7                         8                         9 

Model-averaged coefficients:  
(full average) 
                           Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept)                0.59571    0.08768     0.08780   6.785    <2e-16 ***
z.F_distance              -0.08331    0.22438     0.22455   0.371    0.7106
z.M_distance               0.30106    0.35661     0.35686   0.844    0.3989
z.M_distance_QUA          -0.38521    0.37397     0.37420   1.029    0.3033
z.AgeF                     0.23875    0.11164     0.11180   2.136    0.0327 *  
z.AgeM                     0.07179    0.10853     0.10868   0.661    0.5089   
z.scale.LD                -0.21892    0.51109     0.51179   0.428    0.6688 
z.scale.LD_Quad           -0.21826    0.50536     0.50605   0.431    0.6662   
z.F_distance:z.M_distance  0.23584    0.22598     0.22610   1.043    0.2969 
z.F_distance_QUA           0.10247    0.22963     0.22979   0.446    0.6556 
 
(conditional average) 
                           Estimate Std. Error Adjusted SE z value Pr(>|z|) 
(Intercept)                0.59571    0.08768     0.08780   6.785    <2e-16 ***
z.F_distance              -0.12943    0.26879     0.26902   0.481    0.6304  
z.M_distance               0.34787    0.36147     0.36176   0.962    0.3362    
z.M_distance_QUA          -0.58524    0.30888     0.30930   1.892    0.0585 .  
z.AgeF                     0.23875    0.11164     0.11180   2.136    0.0327 *  
z.AgeM                     0.07179    0.10853     0.10868   0.661    0.5089    
z.scale.LD                -0.21892    0.51109     0.51179   0.428    0.6688    
z.scale.LD_Quad           -0.21826    0.50536     0.50605   0.431    0.6662    
z.F_distance:z.M_distance  0.36639    0.17748     0.17772   2.062    0.0392 *  
z.F_distance_QUA           0.36356    0.30358     0.30400   1.196    0.2317                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
coefs <- coef(bs.dist.avg.model)
x_plot <- seq(0, 20, by = 0.1)
y_plot <- plogis(coefs[1] + coefs[2] * x_plot) # coefs[1]=  0.5957089; coefs[2]=-0.08330994
y_plot2 <- plogis(coefs[1] + coefs[3] * x_plot) # coefs[1]=  0.5957089; coefs[3]=0.3010592
plot_data <- data.frame(x_plot, y_plot,y_plot2)

ggplot(plot_data) + 
  geom_line(aes(x_plot, y_plot), col = "red") + 
  geom_line(aes(x_plot, y_plot2), col = "blue") + 
  xlab("Distance") + ylab("Breeding success") +
  scale_y_continuous(limits = c(0, 1)) + 
  theme_bw()

如果你有任何关于如何做不同的建议,这是非常欢迎的!提前感谢。
编辑:我希望是一个可复制的例子(如果不是,对不起,那么我不明白什么是可复制的例子)

bnl4lu3b

bnl4lu3b1#

方法1

我可以理解将预测值标绘为点,尽管显示原始数据点或 * 部分残差 *(例如remef package on GitHub)可能更有意义。通过预测值标绘泊松GLM有点奇怪:虽然通过预测值的GLM拟合可能与全模型的预测大致相同(尽管我很难证明这一点),但置信区间与平均模型预测的置信区间不同。

方法2

拟合对数链接(泊松)模型的预测值是y = exp(a + b*x),而不是y=exp(a) + b*x(看起来像您在这里所做的)。

方法3

这看起来不错,尽管只有当所有其他协变量都是零中心时,它才能给出合理的预测(而且你不会得到置信区间,但有更简单的方法来做到这一点)。
您可以使用ggeffects包来获得预测值和置信区间,但需要注意的是,置信区间不包含随机效应中的任何不确定性(需要注意的是,我还没有研究它们是如何计算的,因此我不确定它们是如何传播模型平均不确定性的)。

示例

设置数据和模型

我在模型中使用了交互作用项:据我所知,你为交互作用设置了显式虚拟变量。

set.seed(101)
n <- 1000
ng <- 20
dd <- data.frame(x = rnorm(n),
                 y = rnorm(n),
                 sex = factor(rep(c("F","M"), length.out = n)),
                 f = factor(rep(1:ng, length.out = n)))
library(lme4)
library(MuMIn)
dd$resp <- simulate(~ x*sex + y + (1|f),
                    newdata = dd,
                    newparams = list(beta = rep(1,5),
                                     theta = 1),
                    family = poisson)[[1]]

full_model <- glmer(resp  ~ x*sex + y + (1|f), data = dd,
                    family = poisson, na.action = na.fail)

dr <- dredge(full_model)
avg_model <- model.avg(dr, fit = TRUE)

预测和绘图

我尝试了sjPlotggeffects包,但最终发现使用emmeans更容易做到我想要的事情。

library(emmeans)
g1 <- emmeans(avg_model, specs = ~x*sex,
              type = "response",
              data = dd,
              at = list(x = seq(-4, 4, length.out = 101)))
g2 <- as.data.frame(g1)

library(ggplot2); theme_set(theme_bw())
gg0 <- ggplot(dd, aes(x, resp, colour = sex)) +
    ## scale_y_log10(limits = c(0.1, 1e4)) + 
    geom_point() +
    geom_line(data = g2, aes(y = rate)) +
    geom_ribbon(data = g2, aes(
                               y = rate, ymin = lower.CL, ymax = upper.CL, fill = sex),
                colour = NA, alpha = 0.3)
 
print(gg0)
print(gg0 + scale_y_log10(limits=c(0.1, 1e4), oob = scales::squish))

相关问题