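I am trying to improve the accuracy of a random forest classifier using RandomizedSearchCV and GridSearchCV. The script produces output every time I run it, but running it again does not give the same output. For example, the first run prints this in my terminal: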
Accuracy for Random Forests without tuning: 0.7545454545454545
Accuracy for Random Forests by Randomized CV: 0.7272727272727273
Accuracy for Random Forests by Gridsearch CV: 0.7363636363636363
If I run the script again, I get different output:
Accuracy for Random Forests without tuning: 0.7636363636363637
Accuracy for Random Forests by Randomized CV: 0.7454545454545455
Accuracy for Random Forests by Gridsearch CV: 0.7454545454545455
Here is the code:
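from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# df1 is a DataFrame defined earlier in the script (not shown)
X = df1[['ANSWER', 'TEXT']]  # features (independent variables)
y = df1['ID']  # target (dependent variable)
# Split the dataset into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # 70% training and 30% test data
# Random Forest Classifier (no random_state is set on the estimator)
rf = RandomForestClassifier(n_estimators=100)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print('Accuracy for Random Forests without tuning:', accuracy_score(y_test, y_pred))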
Why do the results fluctuate every time I run the script?
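One likely factor is that the train/test split above is seeded with random_state=42 while the RandomForestClassifier is not, so each run draws different bootstrap samples and feature subsets. The following is only a minimal sketch of that effect, not a diagnosis of the actual script: it assumes synthetic data from make_classification as a stand-in for df1, and the helper name fit_and_score is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for df1 (assumption: the real data is not available here).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


def fit_and_score(seed):
    """Train a forest with the given seed and return its test accuracy."""
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(X_train, y_train)
    return accuracy_score(y_test, rf.predict(X_test))


# With a fixed seed, every fit builds the same trees.
same_seed_scores = [fit_and_score(0) for _ in range(3)]
print('Fixed seed:', same_seed_scores)

# With seed=None (the default), each fit uses fresh randomness,
# so the accuracies are free to differ from run to run.
unseeded_scores = [fit_and_score(None) for _ in range(3)]
print('No seed:', unseeded_scores)
```

If this is the cause, passing random_state to RandomForestClassifier in the script should make the untuned accuracy repeatable. Run-to-run differences of a few test samples on a small test set are otherwise expected.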
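I also tried to improve the accuracy with RandomizedSearchCV. It gives me the best parameters, but when I use those parameters in my model the accuracy goes down instead of up. For example, if the untuned accuracy is 0.7636, then with the best parameters I get 0.69 or 0.71 on a comparable run. Here is the RandomizedSearchCV code: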
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

n_estimators = [int(x) for x in range(200, 2000, 200)]
# Note: 'auto' means the same as 'sqrt' for classifiers in older
# scikit-learn releases and is rejected by recent ones.
max_features = ['auto', 'sqrt']
max_depth = [int(x) for x in np.linspace(10, 110, num=11)]
max_depth.append(None)
min_samples_split = [2, 5, 10]
min_samples_leaf = [1, 2, 4]
bootstrap = [True, False]
random_grid = {
    'n_estimators': n_estimators,
    'max_features': max_features,
    'max_depth': max_depth,
    'min_samples_split': min_samples_split,
    'min_samples_leaf': min_samples_leaf,
    'bootstrap': bootstrap,
}
print(random_grid)
# The base estimator has no random_state; only the search itself is seeded.
hyp_rf = RandomForestClassifier()
hyp_rf_random = RandomizedSearchCV(estimator=hyp_rf, param_distributions=random_grid, n_iter=100, cv=3, verbose=2, random_state=42, n_jobs=-1)
hyp_rf_random.fit(X_train, y_train)
prophecy = hyp_rf_random.predict(X_test)
print('Tuned Random Forests:', accuracy_score(y_test, prophecy))
print('Best parameters', hyp_rf_random.best_params_)
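RandomizedSearchCV picks best_params_ by the mean cross-validated score on folds of the training data, not by accuracy on X_test, so the two numbers can disagree, especially on a small test set. A minimal sketch for comparing them side by side follows. It assumes synthetic data from make_classification in place of df1 and a deliberately small, hypothetical grid (small_grid) so that it runs quickly. It is not the search from the script above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for df1 (assumption: the real data is not available here).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical reduced grid, chosen only to keep the example fast.
small_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, None],
    'min_samples_leaf': [1, 2, 4],
}

# Seeding both the estimator and the search makes the comparison repeatable.
search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions=small_grid,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

# Mean accuracy over the 3 CV folds for the winning parameter set.
cv_score = search.best_score_
# Accuracy of the refit best estimator on the held-out test set.
test_score = accuracy_score(y_test, search.predict(X_test))

print('Best parameters:', search.best_params_)
print('Mean CV accuracy of best parameters:', cv_score)
print('Test accuracy of best parameters:', test_score)
```

Printing best_score_ next to the test accuracy shows whether the tuned model is actually worse, or whether the gap is within the run-to-run noise seen with the untuned forest.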