python: Dummy prediction failed with run state StatusType.CRASHED in auto-sklearn

Posted by x33g5p2x on 2023-02-07 in Python

I am trying to train a simple classification model on the iris dataset using auto-sklearn.
Whenever I try to fit the model, I get the following error:

ValueError: (' Dummy prediction failed with run state StatusType.CRASHED and additional output: {\'traceback\': \'Traceback (most recent call last):\\n  File "/home/minura/anaconda3/envs/auto-sklearn/lib/python3.10/site-packages/autosklearn/evaluation/__init__.py", line 55, in fit_predict_try_except_decorator\\n    return ta(queue=queue, **kwargs)\\n  File "/home/minura/anaconda3/envs/auto-sklearn/lib/python3.10/site-packages/autosklearn/evaluation/train_evaluator.py", line 1407, in eval_cv\\n    evaluator.fit_predict_and_loss(iterative=iterative)\\n  File "/home/minura/anaconda3/envs/auto-sklearn/lib/python3.10/site-packages/autosklearn/evaluation/train_evaluator.py", line 597, in fit_predict_and_loss\\n    train_loss = {\\n  File "/home/minura/anaconda3/envs/auto-sklearn/lib/python3.10/site-packages/autosklearn/evaluation/train_evaluator.py", line 599, in <dictcomp>\\n    [train_losses[i][str(metric)] for i in range(self.num_cv_folds)],\\n  File "/home/minura/anaconda3/envs/auto-sklearn/lib/python3.10/site-packages/autosklearn/evaluation/train_evaluator.py", line 599, in <listcomp>\\n    [train_losses[i][str(metric)] for i in range(self.num_cv_folds)],\\nKeyError: \\\'average_precision\\\'\\n\', \'error\': "KeyError(\'average_precision\')", \'configuration_origin\': \'DUMMY\'}.',)

What exactly am I doing wrong?
Here is my complete code:

import pandas as pd
import category_encoders as ce
from autosklearn.classification import AutoSklearnClassifier
from sklearn.model_selection import train_test_split, StratifiedKFold
from autosklearn.metrics import (accuracy,
                                 f1,
                                 roc_auc,
                                 precision,
                                 average_precision,
                                 recall,
                                 log_loss)
  

df = pd.read_csv('iris.csv')

df['variety'] = df['variety'].astype('category')

y = df.pop('variety')
X = df.copy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=1, stratify=y)

skf = StratifiedKFold(n_splits=5)
  
clf = AutoSklearnClassifier(time_left_for_this_task=600,
                            max_models_on_disc=5,
                            memory_limit = 10240,
                            resampling_strategy=skf,
                            ensemble_size = 3,
                            metric = average_precision,
                            scoring_functions=[roc_auc, average_precision, accuracy, f1, precision, recall, log_loss])    

clf.fit(X = X_train, y = y_train)

Is there something wrong with the way I encode the target variable? I also tried the following:

df['variety'] = df['variety'].apply(pd.Categorical)
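
Before changing the encoding, it can help to confirm what the target column actually contains. The following is a minimal sketch using only the standard library; the label strings and the 50-per-class counts are assumptions standing in for the `variety` column, and `summarize_target` is a hypothetical helper, so check the values against your own iris.csv.

```python
from collections import Counter

# Hypothetical stand-in for the `variety` column; the real labels come from iris.csv.
labels = ["Setosa"] * 50 + ["Versicolor"] * 50 + ["Virginica"] * 50


def summarize_target(values):
    """Return (number of distinct classes, per-class counts) for a label sequence."""
    counts = Counter(values)
    return len(counts), dict(counts)


n_classes, per_class = summarize_target(labels)
print(n_classes, per_class)
```

With pandas, `y.value_counts()` gives the same per-class counts. More than two classes means the task is multiclass, which is worth keeping in mind when choosing `metric` and `scoring_functions`.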

Answer #1 (vawmfj5a)

I believe you have exceeded the memory limit.
Try changing your code to something like the following:

clf = AutoSklearnClassifier(time_left_for_this_task=600,
                            max_models_on_disc=5,
                            memory_limit = 102400,  # raised from 10240; auto-sklearn reads this value in MB
                            resampling_strategy=skf,
                            ensemble_size = 3,
                            metric = average_precision,
                            scoring_functions=[roc_auc, average_precision, accuracy, f1, precision, recall, log_loss])
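
Since `memory_limit` is given in MB, 102400 asks for roughly 100 GB, which may be more than the machine has. The sketch below shows one way to cap a requested limit at a share of physical RAM. It is an illustration only: `physical_memory_mb` and `clamp_memory_limit` are hypothetical helpers, the 0.8 fraction is an arbitrary assumption, and the `os.sysconf` lookup works on POSIX systems only.

```python
import os


def physical_memory_mb() -> int:
    """Return total physical RAM in MB (POSIX only; relies on os.sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count // (1024 * 1024)


def clamp_memory_limit(requested_mb: int, total_mb: int, fraction: float = 0.8) -> int:
    """Cap a requested limit (MB) at `fraction` of the machine's total RAM (MB)."""
    ceiling = int(total_mb * fraction)  # assumed headroom; tune for your machine
    return min(requested_mb, ceiling)
```

The result can then be passed to the constructor, for example `memory_limit=clamp_memory_limit(102400, physical_memory_mb())`. Whether memory is really what crashed the dummy prediction is not confirmed by the traceback, so treat this as one thing to rule out.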
