R Tidymodels troubleshooting: "Warning message: There are new levels in a factor: NA"

oknwwptz, posted on 2023-03-27 in Other

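I am trying out tidymodels on the penguins dataset. I want to build a recipe and then compare different imputation methods (knn in the example below). When I prep and bake the recipe, I get the following warning: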

Warning message:
There are new levels in a factor: NA

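I have tried different solutions (using step_novel(), step_unknown(), step_naomit()), but none of them seems to work. The only thing that works is removing or handling all the missing data before creating the recipe, but that defeats the purpose of using a recipe for preprocessing, right? The full code is below.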

# penguins dataset tidymodels

# libraries
library(tidyverse)
library(tidymodels)
library(workflowsets)
library(skimr)
library(DataExplorer)
library(SmartEDA)
library(dlookr)
library(dataMaid)
library(GGally)

# import
data <- penguins

# split
set.seed(1) 
data_split <- data %>% initial_split(prop = 0.75, strata = species)
data_train <- training(data_split)
data_test <- testing(data_split)

# model recipe
recipe <- recipe(species ~ ., data = data_train) %>% 
  step_log(all_numeric_predictors()) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  step_corr(all_numeric_predictors(), threshold = 0.9) %>%  
  step_zv(all_numeric_predictors()) %>% 
  step_nzv(all_numeric_predictors()) %>% 
  step_dummy(all_nominal_predictors())

# recipe with knn imputing
recipe_knn_impute <- recipe %>% 
  step_impute_knn(all_predictors())

# Apply processing to test and training data
baked_data_train <- recipe_knn_impute %>% prep() %>% bake(data_train)
baked_data_test <- recipe_knn_impute %>% prep() %>% bake(data_test)
Answer 1 (fruv7luv)

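We suggest doing imputation first; otherwise all the other operations are affected by the missing data. The warning most likely comes from the dummy variable creation, because the factor values are still missing at that point.
If you impute first, the warning goes away: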
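To see the dummy-variable step as the likely source in isolation, here is a minimal sketch on made-up data. The data frame `toy` and its columns `y` and `g` are hypothetical and not part of the question. The exact wording of the warning may differ between recipes versions.

```r
library(tidymodels)

# Hypothetical toy data: a factor predictor with one missing value
toy <- data.frame(
  y = c(1, 2, 3, 4),
  g = factor(c("a", "b", NA, "a"))
)

# step_dummy() is applied while `g` still contains NA
rec_toy <- recipe(y ~ g, data = toy) %>%
  step_dummy(g)

# Prepping and baking should emit a warning about new levels (NA);
# the dummy column for the row with the missing factor value is NA as well
baked_toy <- prep(rec_toy, training = toy) %>%
  bake(new_data = NULL)

baked_toy
```

The row whose factor value is missing ends up with a missing dummy variable, which is why handling the NA before step_dummy() matters.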

# penguins dataset tidymodels

# libraries
library(tidyverse)
library(tidymodels)
library(workflowsets)

# import
data <- penguins

# split
set.seed(1) 
data_split <- data %>% initial_split(prop = 0.75, strata = species)
data_train <- training(data_split)
data_test <- testing(data_split)

# model recipe
recipe_knn_impute <- 
  recipe(species ~ ., data = data_train) %>% 
  step_impute_knn(all_predictors()) %>% 
  step_log(all_numeric_predictors()) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  step_corr(all_numeric_predictors(), threshold = 0.9) %>%  
  step_zv(all_numeric_predictors()) %>% 
  step_nzv(all_numeric_predictors()) %>% 
  step_dummy(all_nominal_predictors())

# Apply processing to test and training data
baked_data_train <- recipe_knn_impute %>% prep() %>% bake(data_train)
baked_data_test <- recipe_knn_impute %>% prep() %>% bake(data_test)

Created on 2023-03-22 with reprex v2.0.2
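One way to check that imputing first behaves as described is to count missing values before and after baking. The sketch below uses a reduced version of the recipe (only the imputation and dummy steps) and assumes `penguins` is the copy shipped in the modeldata package. The names `rec`, `prepped`, `baked_train` and `baked_test` are illustrative. It also preps the recipe once and reuses the result for both sets, rather than calling prep() twice.

```r
library(tidymodels)

# Assumption: use the penguins data from modeldata explicitly
data(penguins, package = "modeldata")

set.seed(1)
data_split <- initial_split(penguins, prop = 0.75, strata = species)
data_train <- training(data_split)
data_test  <- testing(data_split)

# Missing values per column before any preprocessing
colSums(is.na(data_train))

# Reduced recipe: impute first, then create dummy variables
rec <- recipe(species ~ ., data = data_train) %>%
  step_impute_knn(all_predictors()) %>%
  step_dummy(all_nominal_predictors())

# Estimate the steps once on the training data
prepped <- prep(rec, training = data_train)

# new_data = NULL returns the processed training set
baked_train <- bake(prepped, new_data = NULL)
baked_test  <- bake(prepped, new_data = data_test)

# Count the missing values that remain after imputation
sum(is.na(baked_train))
sum(is.na(baked_test))
```

If either count is non-zero, some predictor was not imputed and a later step would still see missing values.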
