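I'm still not very familiar with the mlxtend and Keras packages, so please bear with me. I have been trying to combine the predictions of several models, namely Random Forest, Logistic Regression, and Neural Network models, using `StackingCVClassifier`. I am trying to stack these classifiers, each of which operates on a different subset of the features. Please see the code below.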
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from mlxtend.classifier import StackingCVClassifier
from mlxtend.feature_selection import ColumnSelector
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.wrappers.scikit_learn import KerasClassifier
import numpy as np

X, y = make_classification()  # 100 samples, 20 features by default
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# defining neural network model (it receives the 10 columns selected by pipeline3)
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(10, input_dim=10, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(units=1, activation='sigmoid'))
    # compile model
    optimizer = keras.optimizers.RMSprop(learning_rate=0.001)
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer, metrics=[keras.metrics.AUC(), 'accuracy'])
    return model

# using KerasClassifier on the neural network model
NN_clf = KerasClassifier(build_fn=create_model, epochs=5, batch_size=5)
NN_clf._estimator_type = "classifier"

# stacking of classifiers that operate on different feature subsets
pipeline1 = make_pipeline(ColumnSelector(cols=np.arange(0, 5)), LogisticRegression())
pipeline2 = make_pipeline(ColumnSelector(cols=np.arange(5, 10)), RandomForestClassifier())
pipeline3 = make_pipeline(ColumnSelector(cols=np.arange(10, 20)), NN_clf)

# final stacking
clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
clf.fit(X_train, y_train)
print("Stacking model score: %.3f" % clf.score(X_val, y_val))
```
However, I get this error:
```
ValueError Traceback (most recent call last)
<ipython-input-11-ef342536824f> in <module>
42 # final stacking
43 clf = StackingCVClassifier(classifiers=[pipeline1, pipeline2, pipeline3], meta_classifier=MLPClassifier())
---> 44 clf.fit(X_train, y_train)
45
46 print("Stacking model score: %.3f" % clf.score(X_val, y_val))
~\anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py in fit(self, X, y, groups, sample_weight)
282 meta_features = prediction
283 else:
--> 284 meta_features = np.column_stack((meta_features, prediction))
285
286 if self.store_train_meta_features:
~\anaconda3\lib\site-packages\numpy\core\overrides.py in column_stack(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\lib\shape_base.py in column_stack(tup)
654 arr = array(arr, copy=False, subok=True, ndmin=2).T
655 arrays.append(arr)
--> 656 return _nx.concatenate(arrays, 1)
657
658
~\anaconda3\lib\site-packages\numpy\core\overrides.py in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 3 dimension(s)
```
Please help. Thanks!
2 answers
ohfgkhjo1#
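The error occurs because you are combining the predictions of traditional ML models with those of a DL model. The ML models give predictions of shape `(80, 1)`, while the DL model gives predictions of shape `(80, 1, 1)`, so there is a mismatch when the stacker tries to append all the predictions together. A common workaround is to remove the extra dimension from the DL model's predictions so that they have shape `(80, 1)` instead of `(80, 1, 1)`.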
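To see the mismatch in isolation, here is a small NumPy reproduction. It assumes, as the answer describes, that each base model's `predict` output gets an extra trailing axis before stacking: a scikit-learn prediction of shape `(80,)` becomes `(80, 1)`, while a Keras prediction that is already `(80, 1)` becomes `(80, 1, 1)`. The zero arrays are stand-ins for real predictions.

```python
import numpy as np

n_samples = 80  # training-set size in the question: 100 samples, test_size=0.2

# Stand-in predictions: scikit-learn's predict() returns shape (n,),
# while the Keras wrapper's predict() returns shape (n, 1).
sklearn_pred = np.zeros(n_samples)
keras_pred = np.zeros((n_samples, 1))

# Adding a trailing axis to each prediction before stacking
sklearn_meta = sklearn_pred[:, np.newaxis]  # shape (80, 1)
keras_meta = keras_pred[:, np.newaxis]      # shape (80, 1, 1)
print(sklearn_meta.shape, keras_meta.shape)

error_message = None
try:
    # Same call that fails inside the stacker: 2-D and 3-D arrays cannot be joined
    np.column_stack((sklearn_meta, keras_meta))
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Dropping the extra dimension makes the shapes compatible again.
fixed = np.column_stack((sklearn_meta, keras_meta.reshape(n_samples, -1)))
print(fixed.shape)  # (80, 2)
```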
So, open the .py file located at:
anaconda3\lib\site-packages\mlxtend\classifier\stacking_cv_classification.py
At lines 280 and 356, outside the `if` block, add the following, so that the code looks like this:
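The snippet this answer refers to was not preserved. As a minimal sketch of the same idea (not the answer's exact code), the hypothetical helper below keeps the sample axis and merges all remaining axes, which turns `(80, 1, 1)` into `(80, 1)` and leaves 2-D predictions unchanged. Inside the patched file, the equivalent would be a line such as `prediction = prediction.reshape(prediction.shape[0], -1)` placed before `prediction` is passed to `np.column_stack`.

```python
import numpy as np


def to_2d_meta_feature(prediction):
    """Hypothetical helper: keep axis 0 (samples) and merge all other axes.

    (n,) -> (n, 1), (n, 1) -> (n, 1), (n, 1, 1) -> (n, 1), (n, 2) -> (n, 2)
    """
    prediction = np.asarray(prediction)
    return prediction.reshape(prediction.shape[0], -1)


# Illustrative use right before stacking:
meta_features = np.zeros((80, 2))  # predictions already stacked for two models
prediction = np.ones((80, 1, 1))   # Keras prediction carrying the extra axis
meta_features = np.column_stack((meta_features, to_2d_meta_feature(prediction)))
print(meta_features.shape)  # (80, 3)
```

Keep in mind that edits inside `site-packages` are overwritten whenever mlxtend is upgraded or reinstalled.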
ekqde3dh2#
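Prakash's answer makes a very good point.
If you want the program to run without many changes, you can roll your own version of a scikit-learn `BaseEstimator`/`ClassifierMixin` object, or wrap the model in the recommended `KerasClassifier` object. That is, you can roll your own estimator like this: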
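The estimator code from this answer was not preserved. Below is a minimal sketch of such a hand-rolled estimator, assuming `build_fn` returns a compiled Keras model like `create_model` in the question, or anything else with `fit(X, y, epochs=..., batch_size=..., verbose=...)` and a `predict(X)` that returns sigmoid probabilities of shape `(n, 1)`. The class name `FlatKerasClassifier` and the `threshold` parameter are hypothetical. The key point is that `predict` returns a flat `(n,)` array of labels, like any scikit-learn classifier, so the stacker never sees the extra dimension.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin


class FlatKerasClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical scikit-learn-style wrapper around a Keras-like binary model."""

    def __init__(self, build_fn=None, epochs=5, batch_size=5, threshold=0.5):
        # scikit-learn convention: store constructor arguments unchanged
        self.build_fn = build_fn
        self.epochs = epochs
        self.batch_size = batch_size
        self.threshold = threshold

    def fit(self, X, y):
        self.classes_ = np.unique(y)  # assumes binary 0/1 labels
        self.model_ = self.build_fn()  # fresh model for every fit (e.g. per CV fold)
        self.model_.fit(X, y, epochs=self.epochs,
                        batch_size=self.batch_size, verbose=0)
        return self

    def predict_proba(self, X):
        # Flatten the (n, 1) sigmoid output to (n,) before building two columns
        p = np.asarray(self.model_.predict(X)).reshape(-1)
        return np.column_stack((1.0 - p, p))

    def predict(self, X):
        # 1-D label array of shape (n,), matching scikit-learn classifiers
        p = self.predict_proba(X)[:, 1]
        return self.classes_[(p >= self.threshold).astype(int)]
```

With the question's code, `pipeline3` would use `FlatKerasClassifier(build_fn=create_model, epochs=5, batch_size=5)` in place of `NN_clf`, and the `_estimator_type` assignment is no longer needed because `ClassifierMixin` marks the class as a classifier. Listing `ClassifierMixin` before `BaseEstimator` follows scikit-learn's recommended mixin order.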
Putting all the pieces together, you can then stack the predictions:
Output:
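The stacking code and its output were not preserved either. As a scikit-learn-only sketch of roughly what `StackingCVClassifier` does with these pipelines, the code below builds out-of-fold predictions with `cross_val_predict`, forces each one to 2-D before `np.column_stack`, and trains a meta-classifier on them. The swapped-in pieces are assumptions for illustration: `FunctionTransformer` stands in for mlxtend's `ColumnSelector`, `MLPClassifier` stands in for the Keras model (so the example runs without TensorFlow), and `LogisticRegression` is the meta-classifier.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def select_columns(X, cols):
    # Stand-in for mlxtend's ColumnSelector
    return X[:, cols]


def column_subset_pipeline(cols, model):
    return make_pipeline(FunctionTransformer(select_columns, kw_args={"cols": cols}), model)


def to_2d(pred):
    # Keep the sample axis, merge the rest: (n,) or (n, 1, ...) -> (n, k)
    pred = np.asarray(pred)
    return pred.reshape(pred.shape[0], -1)


X, y = make_classification(random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

base_models = [
    column_subset_pipeline(np.arange(0, 5), LogisticRegression()),
    column_subset_pipeline(np.arange(5, 10), RandomForestClassifier(random_state=0)),
    # MLPClassifier stands in for the Keras model in this sketch
    column_subset_pipeline(np.arange(10, 20), MLPClassifier(max_iter=1000, random_state=0)),
]

# Out-of-fold predictions on the training set become the meta-features.
meta_train = np.column_stack(
    [to_2d(cross_val_predict(m, X_train, y_train, cv=5)) for m in base_models])

# Refit each base model on the full training set to build validation meta-features.
meta_val = np.column_stack(
    [to_2d(m.fit(X_train, y_train).predict(X_val)) for m in base_models])

meta_clf = LogisticRegression().fit(meta_train, y_train)
score = meta_clf.score(meta_val, y_val)
print("Stacking model score: %.3f" % score)
```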