Error when using k-fold cross-validation (PyTorch Tabular)

Posted by zi8p0yeb on 2023-05-22 in Other

I'm using k-fold cross-validation, but I get an error. Here is the full code.
First, I split the data into training, test and validation sets:

from sklearn.model_selection import train_test_split

# Hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2, shuffle=True, random_state=8)

# Split the remaining 80% again for validation: 0.25 x 0.8 = 0.2 of the data
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
    test_size=0.25, random_state=8)

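PyTorch Tabular expects a single DataFrame that contains both the features and the target, rather than separate X_train and y_train, so I build train_data (and likewise the test and validation sets) as follows: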

train_data = X_train.copy()
train_data.loc[:, 'target'] = y_train

test_data = X_test.copy()
test_data.loc[:, 'target'] = y_test

val_data = X_val.copy()
val_data.loc[:, 'target'] = y_val

Here is the model configuration:

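from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, TrainerConfig
from pytorch_tabular.models import TabNetModelConfig

data_config = DataConfig(
    target=['target'], #target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    # Feature columns only: train_data.columns would also include 'target' itself
    continuous_cols=X_train.columns.tolist(),
    categorical_cols=[],
    normalize_continuous_features=True
)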
trainer_config = TrainerConfig(
    auto_lr_find=False,
    batch_size=512,
    max_epochs=50,
    # track_grad_norm=2,
    gradient_clip_val=10,
)
# experiment_config = ExperimentConfig(project_name="Tabular_test", log_logits=True)
optimizer_config = {'optimizer':'Adam', 'optimizer_params':{'weight_decay': 0, 'amsgrad': False}, 'lr_scheduler':None, 'lr_scheduler_params':{}, 'lr_scheduler_monitor_metric':'valid_loss'}

model_config = TabNetModelConfig(
    task="classification",
    n_d=10,
    n_a=15,
    n_steps=2,
    n_independent=2,
    n_shared=2,
    learning_rate=1e-3
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)

mymodel = tabular_model.fit(train= train_data, validation= val_data )

Now I want to use k-fold cross-validation, but running the following code gives an error:

from sklearn.model_selection import  cross_val_score
scores = cross_val_score (mymodel , train_data , scoring = 'r2' , cv = 10)
scores

This is the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-93504b57425a> in <module>
      1 from sklearn.model_selection import  cross_val_score
----> 2 scores = cross_val_score (mymodel , train_data , scoring = 'r2' , cv = 10)
      3 scores

1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py in check_scoring(estimator, scoring, allow_none)
    448         raise TypeError(
    449             "estimator should be an estimator implementing 'fit' method, %r was passed"
--> 450             % estimator
    451         )
    452     if isinstance(scoring, str):

TypeError: estimator should be an estimator implementing 'fit' method, None was passed

When I run this instead:

from sklearn.model_selection import  cross_val_score
tabular_model.fit(train= train_data, validation= val_data )
scores = cross_val_score (tabular_model, train_data , scoring = 'r2' , cv = 10)
scores

I get the following error:

Empty                                     Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
    821             try:
--> 822                 tasks = self._ready_batches.get(block=False)
    823             except queue.Empty:

7 frames
Empty: 

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/sklearn/base.py in clone(estimator, safe)
     78                     "it does not seem to be a scikit-learn "
     79                     "estimator as it does not implement a "
---> 80                     "'get_params' method." % (repr(estimator), type(estimator))
     81                 )
     82 

TypeError: Cannot clone object '<pytorch_tabular.tabular_model.TabularModel object at 0x7f46dac439d0>' (type <class 'pytorch_tabular.tabular_model.TabularModel'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.
Answer 1# (njthzxwz)

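`tabular_model.fit` returns `None` (https://github.com/manujosephv/pytorch_tabular/blob/0544fba3c173c5d2bf5153ef189243ff6e0a186f/pytorch_tabular/tabular_model.py#L394), so `mymodel` in the first attempt is `None`, which is exactly what the first traceback reports.
To work with `cross_val_score`, an object has to follow the scikit-learn estimator interface: it needs `fit`, and it needs `get_params` (and, by convention, `set_params`), because `sklearn.base.clone` calls `get_params` to build a fresh copy for every fold. `TabularModel` does not implement all of these methods, which is what the second traceback says. So to use it with `cross_val_score` you can either extend (or wrap) the class and add the required methods, or use a different approach for cross-validation.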
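A minimal sketch of the "extend the class" route, written in plain Python so it runs without any library installed. `SklearnStyleWrapper`, `model_factory`, `clone_like_sklearn` and `MajorityClassModel` are hypothetical names for illustration, not part of PyTorch Tabular or scikit-learn. The wrapper stores its constructor arguments unchanged, exposes them through `get_params`/`set_params` (what `clone` needs), builds a fresh model inside `fit`, and returns `self` from `fit` instead of `None`. `clone_like_sklearn` imitates the core step of `clone`: rebuilding an unfitted object from `get_params(deep=False)`. With scikit-learn installed you would normally subclass `sklearn.base.BaseEstimator`, which derives `get_params`/`set_params` from the `__init__` signature for you. For PyTorch Tabular, `model_factory` would be your own function that creates a new `TabularModel`, and `fit` would join `X` and `y` into one DataFrame before calling `TabularModel.fit(train=...)`; those library-specific details are assumptions about your setup and are left out here.

```python
import inspect


class SklearnStyleWrapper:
    """Hypothetical adapter exposing the methods sklearn's clone() and
    cross_val_score() rely on."""

    def __init__(self, model_factory=None, max_epochs=50):
        # sklearn convention: __init__ only stores its arguments, unchanged.
        self.model_factory = model_factory
        self.max_epochs = max_epochs

    def get_params(self, deep=True):
        # Return every constructor argument so clone() can rebuild the object.
        # (deep=True would also expand nested estimators; not needed here.)
        names = inspect.signature(self.__init__).parameters
        return {name: getattr(self, name) for name in names}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Build a fresh underlying model on every call, so folds share no state.
        self.model_ = self.model_factory(max_epochs=self.max_epochs)
        self.model_.fit(X, y)
        return self  # sklearn expects fit to return the estimator, not None

    def predict(self, X):
        return self.model_.predict(X)


def clone_like_sklearn(estimator):
    # Core idea of sklearn.base.clone: a new, unfitted object built from get_params.
    return type(estimator)(**estimator.get_params(deep=False))


class MajorityClassModel:
    """Toy stand-in for a real model, used only to exercise the wrapper."""

    def __init__(self, max_epochs):
        self.max_epochs = max_epochs

    def fit(self, X, y):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)

    def predict(self, X):
        return [self.label_] * len(X)


est = SklearnStyleWrapper(model_factory=MajorityClassModel, max_epochs=5)
fresh = clone_like_sklearn(est)
print(fresh.fit([[0], [1], [2]], [1, 1, 0]).predict([[3], [4]]))  # [1, 1]
```

Two more details in the question's call would fail regardless of the estimator: `cross_val_score` has to receive the labels as `y` (for example `cross_val_score(est, X_train, y_train, ...)`), and `'r2'` is a regression metric while the model is configured with `task="classification"`, so a classification scorer such as `'accuracy'` is the better fit.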

Answer 2# (vxbzzdmp)

Since v1.0, PyTorch Tabular also has a low-level API that can be used in cross-validation workflows. See the example here.
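Whichever PyTorch Tabular API you use, a manual cross-validation loop has the same outer shape: generate fold indices, then train and score a fresh model on each fold. Below is a dependency-free sketch of that loop. In practice scikit-learn's `KFold` (or `StratifiedKFold` for a classification target) would generate the indices; the fold-sizing rule here matches `KFold`'s, where the first `n_samples % n_splits` folds get one extra sample. `k_fold_indices`, `cross_validate` and the `train_and_score` callback are hypothetical names; in the question's setting the callback would build a new `TabularModel`, fit it on `train_data.iloc[train_idx]` and score it on `train_data.iloc[val_idx]`.

```python
import random


def k_fold_indices(n_samples, n_splits=5, shuffle=True, seed=8):
    """Yield (train_idx, val_idx) lists; each sample lands in exactly one val fold."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Same sizing rule as KFold: spread the remainder over the first folds.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(n_samples, train_and_score, n_splits=5, seed=8):
    """Call train_and_score(train_idx, val_idx) once per fold; return the scores.

    train_and_score is a hypothetical callback that should create and train a
    brand-new model for each fold, then return its validation score.
    """
    return [train_and_score(train_idx, val_idx)
            for train_idx, val_idx in k_fold_indices(n_samples, n_splits, seed=seed)]


# Toy callback that just reports the validation-fold size.
print(cross_validate(10, lambda tr, va: len(va), n_splits=3))  # [4, 3, 3]
```

Averaging the returned list (for example with `statistics.mean`) gives the same kind of summary that `cross_val_score` would.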
