我的问题是图上显示的stat_regline_equation()中的intercept和slope与模型拟合ldply(modelfit, coef)中的intercept和slope不匹配，后者是我在图上所期望的。
以下是我们重现此问题的方法。

首先，我从一个玩具数据集开始。

library(lme4)
   data(Soybean)
   library(plyr)
   library(nlme)
   library(ggplot2)
   library(ggpubr)
   library(lemon)
head(Soybean)
       Plot   Variety Year Time weight  
1988.1 1988F1       F 1988   14  0.106 
1988.2 1988F1       F 1988   21  0.261 
1988.3 1988F1       F 1988   28  0.666  
1988.4 1988F1       F 1988   35  2.110  
1988.5 1988F1       F 1988   42  3.560  
1988.6 1988F1       F 1988   49  6.230

第二个，我按年份对数据集的每个子集拟合模型，其中Variety作为随机截距。

table(Soybean$Year)
1988 1989 1990 
 156  128  128 
 
table(Soybean$Variety)
  F   P 
204 208 
# Fit LME model for each subset(Soybean)
MxM1 <- dlply(Soybean,
                    "Year",
                    function(x)
                      lme(fixed = weight ~ Time,
                          random = ~ 1 |Variety ,
                           data=x, na.action=na.exclude))

这些是模型中随机截距（品种）的系数。2个品种，3年，6个不同的截距，这是有意义的。

ldply(MxM1, coef)
  Year (Intercept)  Time
1 1988       -8.65 0.349
2 1988       -7.73 0.349
3 1989       -6.71 0.242
4 1989       -4.04 0.242
5 1990       -7.28 0.309
6 1990       -6.63 0.309

第三，我估计你的体重

Soybean <- by(Soybean, Soybean$Year, \(x) {
  fit <- lme(fixed = weight ~ Time,
             random = ~ 1 |Variety ,
             data=x, na.action=na.exclude)
  cbind(x, yhat=predict(fit, na.action=na.exclude))
}) |> do.call(what=rbind)
         Plot Variety Year Time weight  yhat
1988.1 1988F1       F 1988   14  0.106 -3.76
1988.2 1988F1       F 1988   21  0.261 -1.32
1988.3 1988F1       F 1988   28  0.666  1.12
1988.4 1988F1       F 1988   35  2.110  3.57
1988.5 1988F1       F 1988   42  3.560  6.01
1988.6 1988F1       F 1988   49  6.230  8.45

第四个：我正在绘制散点图，x轴：时间，y轴：权重，一条基于yhat值的散点图的回归线。我明白了

ggplot(data = Soybean, 
       aes (x=Time, y = weight))+
  geom_point(size =3)+
  geom_line(aes(y=yhat))+
  stat_regline_equation()+
  theme_classic()+
  facet_rep_grid(Variety~Year)

我期望一条具有斜率和截距的回归线

ldply(MxM1, coef)
  Year (Intercept)  Time
1 1988       -8.65 0.349
2 1988       -7.73 0.349
3 1989       -6.71 0.242
4 1989       -4.04 0.242
5 1990       -7.28 0.309
6 1990       -6.63 0.309

因为yhat预测值基于ldply(MxM1, coef)的估计值，所以当我使用stat_regline_equation()函数时，我预期会出现ldply(MxM1, coef)的直线估计值，但这不是我看到的结果。我不确定为什么stat_regline_equation()忽略yhat的值，并使用lm函数绘制回归线？
任何关于如何绘制回归线与yhat和估计值（斜率和截距）从ldply(MxM1, coef)的建议是非常感谢。谢谢。

Soybean1988 <- Soybean[Soybean$Year == "1988",]
model_1988 <- lm(weight ~ Time * Variety, data = Soybean1988)
summary(model_1988)
Call:
lm(formula = weight ~ Time * Variety, data = Soybean1988)
Residuals:
Min 1Q Median 3Q Max
-4.5718 -2.1753 -0.6001 1.2478 10.3532
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.68423 0.71742 -10.711 <2e-16 ***
Time 0.32545 0.01474 22.085 <2e-16 ***
VarietyP -0.96067 1.00934 -0.952 0.3427
Time:VarietyP 0.04550 0.02052 2.218 0.0281 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.567 on 152 degrees of freedom
Multiple R-squared: 0.8855, Adjusted R-squared: 0.8833
F-statistic: 392 on 3 and 152 DF, p-value: < 2.2e-16

1条答案

按热度按时间

mctunoxg1#

这两个东西给你的信息不一样，这是有道理的，从stat_regline_equation()的文档来看，每一行都是用lm()估计的：

Add regression line equation and R^2 to a ggplot. Regression model is fitted using the function lm.

由于Variety和Year都是分面的，因此它会为六个分面中的每一个拟合一条单独的线。为了说明这一点，我们可以将1988年的数据进行子集化，并使用Time和Variety的交互作用拟合一个模型：

Soybean1988 <- Soybean[Soybean$Year == "1988",]
model_1988 <- lm(weight ~ Time * Variety, data = Soybean1988)
summary(model_1988)
Call:
lm(formula = weight ~ Time * Variety, data = Soybean1988)
Residuals:
    Min      1Q  Median      3Q     Max 
-4.5718 -2.1753 -0.6001  1.2478 10.3532 
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -7.68423    0.71742 -10.711   <2e-16 ***
Time           0.32545    0.01474  22.085   <2e-16 ***
VarietyP      -0.96067    1.00934  -0.952   0.3427    
Time:VarietyP  0.04550    0.02052   2.218   0.0281 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.567 on 152 degrees of freedom
Multiple R-squared:  0.8855,    Adjusted R-squared:  0.8833 
F-statistic:   392 on 3 and 152 DF,  p-value: < 2.2e-16

如果我们看一下图，品种F的截距为-7.7，斜率为3.3。这与(Intercept)和Time中报告的信息相匹配。VarietyP是两个品种的平均值之间的变化，它等于图左上角的-7.7和左下角的-8.6之差，同样，Time:VarietyP是斜率之差。
我希望这有助于澄清您在此处发现的差异。

展开查看全部

赞(0）回复(0）举报 2023-03-20

r stat_regline_equation()截距和斜率问题

1条答案

相关问题

热门标签

最新问答