How do I use predict.lm in a for loop?

eufgjt7s posted on 2023-07-31 in Other

I need to use linear regression. Because the predictors are added to the model one at a time, I fit the models in a for loop.

set.seed(98274)                          # Creating example data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.2 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.1 * y
x3 <- rnorm(1000) - 0.1 * x1 + 0.3 * x2 - 0.3 * y
data <- data.frame(y, x1, x2, x3)
head(data)                               # Head of data

mod_summaries <- list()                  # Create empty list

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  mod_summaries[[i - 1]] <- summary(     # Store regression model summary in list
    lm(y ~ ., data[ , c("y", predictors_i)]))
  
}

Then I tried to use these models in another for loop to predict on test data. My code is below.

## Test
set.seed(44)                          # Creating test data
y <- rnorm(1000)
x1 <- rnorm(1000) + 0.19 * y
x2 <- rnorm(1000) + 0.2 * x1 + 0.11 * y
x3 <- rnorm(1000) - 0.12 * x1 + 0.28 * x2 - 0.33 * y
test <- data.frame(y, x1, x2, x3)

predict_models <- matrix(nrow = nrow(test), ncol = 3)

for(i in 2:ncol(data)) {                 # Head of for-loop
  
  predictors_i <- colnames(data)[2:i]    # Create vector of predictor names
  predict_models[,i-1] <- predict.lm(mod_summaries[[i-1]], test[,2:i])
  
}
predict_models


But it throws the following error:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : 
  'data' must be a data.frame, environment, or list
In addition: Warning message:
In predict.lm(mod_summaries[[i - 1]], test[, 2:i]) :
  calling predict.lm(<fake-lm-object>) ...

w8f9ii69

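First, you want to store the models themselves, not their summaries. `summary()` returns a `summary.lm` object, not an `lm` object, which is why `predict.lm` warns about a `<fake-lm-object>`.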

mod_summaries <- vector('list', ncol(data) - 1L)  ## preallocate list of known length, it's way more efficient

for (i in seq_len(ncol(data))[-1]) {
  predictors_i <- colnames(data)[2:i]
  mod_summaries[[i - 1]] <- lm(y ~ ., data[, c("y", predictors_i)])
}
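
As a quick illustration of the difference between a model and its summary, here is a minimal sketch. The data frame `toy` and the names `fit` and `fit_summary` are made up for this example and are not part of the question's code.

```r
# Hypothetical toy data, only for illustration
set.seed(1)
toy <- data.frame(y = rnorm(20), x1 = rnorm(20))

fit <- lm(y ~ x1, data = toy)   # the fitted model
fit_summary <- summary(fit)     # a report about the fitted model

class(fit)          # "lm"
class(fit_summary)  # "summary.lm"

# predict() works on the model itself ...
n_pred <- length(predict(fit, newdata = toy))

# ... but the summary does not inherit from "lm"
is_lm <- inherits(fit_summary, "lm")  # FALSE
```

So anything you later want to pass to `predict()` should be stored as the `lm` object; you can still call `summary()` on it whenever you need the coefficient table.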

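Second, the data passed to `predict` does not need to change from model to model. `predict` only uses the columns that appear in the model, so passing `test` is enough.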

predict_models <- matrix(nrow=nrow(test), ncol=ncol(test) - 1L)
for (i in seq_len(ncol(data))[-1]) {
  predict_models[, i - 1] <- predict.lm(mod_summaries[[i - 1]], test)
}


That's it:

head(predict_models)
#              [,1]        [,2]       [,3]
# [1,] -0.115690784 -0.19149611 -0.4815419
# [2,] -0.004721430  0.03814865  0.1894562
# [3,] -0.110812904  0.02312155  0.2579051
# [4,]  0.004264032 -0.06147035 -0.2328833
# [5,]  0.320110168 -0.04145044 -0.3229186
# [6,] -0.040603638  0.01977484 -0.1090088
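
To check that extra columns in the new data are ignored, here is a small sketch on made-up data (`train_toy`, `test_toy` and `fit_x1` are hypothetical names). It also shows a likely cause of the `'data' must be a data.frame` error in the question: selecting a single column with `[ , 2:2]` drops the result to a plain vector unless `drop = FALSE` is given.

```r
# Hypothetical toy data, only for illustration
set.seed(2)
train_toy <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))
test_toy  <- data.frame(y = rnorm(10), x1 = rnorm(10), x2 = rnorm(10))

fit_x1 <- lm(y ~ x1, data = train_toy)

# Predict with all columns present, and with only the column the model needs
p_full <- predict(fit_x1, newdata = test_toy)
p_sub  <- predict(fit_x1, newdata = test_toy[, "x1", drop = FALSE])
same <- isTRUE(all.equal(p_full, p_sub))

# Single-column subsetting drops to a vector by default
vec_is_df <- is.data.frame(test_toy[, 2:2])                # FALSE
df_is_df  <- is.data.frame(test_toy[, 2:2, drop = FALSE])  # TRUE
```

Passing the whole test data frame avoids the dropping problem entirely, which is one reason it is the simpler choice here.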

**Alternatively,** and more R-ish, you can do the same thing in two lines of code, without for loops.

ms <- lapply(seq_along(data)[-1], \(i) lm(reformulate(names(data)[2:i], 'y'), data))
pm <- sapply(ms, predict, test)
head(pm)
#           [,1]        [,2]       [,3]
# 1 -0.115690784 -0.19149611 -0.4815419
# 2 -0.004721430  0.03814865  0.1894562
# 3 -0.110812904  0.02312155  0.2579051
# 4  0.004264032 -0.06147035 -0.2328833
# 5  0.320110168 -0.04145044 -0.3229186
# 6 -0.040603638  0.01977484 -0.1090088
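
For readers unfamiliar with `reformulate`, the sketch below shows the formulas it builds for the nested models. The vector `preds` and the list `formulas` are illustrative names only. Note that the `\(i)` shorthand used above requires R 4.1.0 or later; `function(i)` is equivalent and is used here.

```r
# reformulate() builds a formula from character vectors of term names
f <- reformulate(c("x1", "x2"), response = "y")   # y ~ x1 + x2

# The nested formulas used for the three models
preds <- c("x1", "x2", "x3")
formulas <- lapply(seq_along(preds), function(i) reformulate(preds[1:i], "y"))

# Turn each formula into a string for inspection
formula_text <- vapply(formulas,
                       function(fm) paste(deparse(fm), collapse = ""),
                       character(1))
formula_text
# "y ~ x1"  "y ~ x1 + x2"  "y ~ x1 + x2 + x3"
```

Because each formula names its predictors explicitly, the full `data` can be passed to every `lm` call instead of subsetting columns.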
