This is my code:
import numpy as np
import pandas as pd
import unicodeit
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
from pandas import read_csv
df1 = pd.read_csv('data.csv')
# CATBOOST
from catboost import CatBoostClassifier
from sklearn.model_selection import KFold
X=df1.drop('N', axis = 1)
y=list(df1['N'])
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)
# Set up k-fold cross-validation
k = 3
kf = KFold(n_splits=k, shuffle=True, random_state=42)
# Set up parameter grid for max_depth and learning_rate
max_depth_values = [4, 6, 8, 10, 12]
learning_rate_values = [0.01, 0.05, 0.1, 0.2, 0.3]
# Create an empty dataframe to store results
results_df = pd.DataFrame(columns=['max_depth', 'learning_rate', 'mean_accuracy'])
# Perform k-fold cross-validation with CatBoost
for max_depth in max_depth_values:
    for learning_rate in learning_rate_values:
        print(f"Training model with max_depth={max_depth} and learning_rate={learning_rate}")
        accuracies = []
        for train_index, val_index in kf.split(X, y):
            X_train, X_val = X[train_index], X[val_index]
            y_train, y_val = y[train_index], y[val_index]
            model = CatBoostClassifier(max_depth=max_depth, learning_rate=learning_rate, iterations=100, random_seed=42, verbose=False)
            model.fit(X_train, y_train, eval_set=(X_val, y_val))
            accuracy = model.score(X_val, y_val)
            accuracies.append(accuracy)
        mean_accuracy = np.mean(accuracies)
        new_row = {'max_depth': max_depth, 'learning_rate': learning_rate, 'mean_accuracy': mean_accuracy}
        results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)
# Create a heatmap to visualize the results
results_pivot = results_df.pivot(index='max_depth', columns='learning_rate', values='mean_accuracy')
plt.figure(figsize=(10, 6))
sns.heatmap(results_pivot, annot=True, fmt=".3f", cmap="YlGnBu")
plt.title('Mean Accuracy Heatmap for CatBoost')
plt.show()
My data.csv:
D t L fy fc N
168.8 2.64 305 302.4
168.8 2.64 305 302.4
169.3 2.62 305 338.1 34.1 1330
168.3 3.6 305 288.4 27
168.3 3.6 305
168.3 3.6 305
168.8 5 305 200.2 33.4 1966
168.8 5 305 200.2 33.4 1970
168.8 5 305 200.2 27.9 1984
3 answers
iyr7buue #1
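The error occurs in the cross-validation loop, where the data is split into X_train, X_val, y_train and y_val. On a pandas DataFrame, X[train_index] looks the integers up as column labels; selecting rows by position requires iloc, so that part of the loop needs to use X.iloc[train_index] and X.iloc[val_index].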
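The answer's original snippet did not survive, so the following is only a minimal sketch of the positional split it describes. The synthetic DataFrame stands in for data.csv, and keeping y as a Series (instead of list(df1['N'])) is an assumption made here so that iloc can be used on the target as well.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Synthetic stand-in for data.csv (assumption: numeric columns, target named 'N').
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(12, 6)), columns=["D", "t", "L", "fy", "fc", "N"])

X = df.drop("N", axis=1)
y = df["N"]  # a Series, so positional indexing with .iloc is available

kf = KFold(n_splits=3, shuffle=True, random_state=42)

fold_shapes = []
for train_index, val_index in kf.split(X):
    # X[train_index] would treat the integers as column labels and raise KeyError;
    # .iloc selects rows by position instead.
    X_tr, X_val = X.iloc[train_index], X.iloc[val_index]
    y_tr, y_val = y.iloc[train_index], y.iloc[val_index]
    fold_shapes.append((X_tr.shape, X_val.shape, len(y_tr), len(y_val)))

print(fold_shapes)
```

The fold variables are named X_tr and y_tr here so that they do not overwrite the X_train and y_train produced earlier by train_test_split.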
Replace that part in the cross-validation loop. This should resolve the error "None of [Index([2, 3, 4, 6, 7, 9], dtype='int32')] are in the [columns]". Don't forget to do the same in the initial train_test_split setup section.
Make sure the data you use matches the input types that CatBoostClassifier expects.
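A quick way to inspect the inputs before fitting might look like the sketch below. The rows are taken from the sample in the question (one of them has no N value), and the checks are a suggestion rather than anything CatBoost prescribes: as far as I know, non-numeric feature columns have to be declared through cat_features, and rows without a target value cannot be used for training.

```python
import numpy as np
import pandas as pd

# Rows from the question's sample; the third one is missing its target N.
df = pd.DataFrame({
    "D": [168.8, 168.8, 168.3, 168.8],
    "t": [5.0, 5.0, 3.6, 5.0],
    "L": [305, 305, 305, 305],
    "fy": [200.2, 200.2, 288.4, 200.2],
    "fc": [33.4, 33.4, 27.0, 27.9],
    "N": [1966, 1970, np.nan, 1984],
})

# Columns that are not numeric would need special handling (e.g. cat_features).
non_numeric = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Count the gaps per column, then drop rows that have no target value.
missing_per_column = df.isna().sum()
clean = df.dropna(subset=["N"])

print("non-numeric columns:", non_numeric)
print(missing_per_column)
print("rows usable for training:", len(clean))
```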
sshcrbum #2
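You have two problems. One is the indexing problem that @aazizzailani mentioned; following his suggestion will resolve it. The other is that the data you have is not enough to fit the model.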
After I increased the number of sample data points, the code produced output.
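To illustrate why so few rows are a problem, here is a small sketch under two assumptions: only the three complete rows of the posted sample are usable, and N is treated as a class label, as it is by CatBoostClassifier in the question. With k = 3, each training fold then holds two rows, and the label of the validation row never appears in training.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Target values from the three complete rows in the question's sample.
y = pd.Series([1966, 1970, 1984], name="N")

k = 3
kf = KFold(n_splits=k, shuffle=True, random_state=42)

fold_sizes = []
unseen_labels = []
for train_index, val_index in kf.split(y):
    train_labels = set(y.iloc[train_index])
    val_labels = set(y.iloc[val_index])
    fold_sizes.append((len(train_index), len(val_index)))
    # Labels that occur in the validation fold but were never seen in training.
    unseen_labels.append(val_labels - train_labels)

print(fold_sizes)
print(unseen_labels)
```

Every validation fold contains a label the model was never trained on, so the accuracy in each fold cannot be meaningful until more rows are available.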
jckbn6z7 #3
You should load the data and split it into features (X) and the target variable (y); that is the change I would make to your code.
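The modified code from this answer was not preserved. As a rough sketch of the loading and splitting step it describes, the example below parses a few rows copied from the question's sample. The whitespace-separated layout and the choice to drop incomplete rows are assumptions; the real file would be read with pd.read_csv('data.csv').

```python
import io
import pandas as pd

# Rows copied from the sample in the question; the first data row has no N value.
raw = """D t L fy fc N
168.3 3.6 305 288.4 27
168.8 5 305 200.2 33.4 1966
168.8 5 305 200.2 33.4 1970
168.8 5 305 200.2 27.9 1984
"""

# Assumption: columns are separated by whitespace rather than commas.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Drop rows with missing fields before splitting.
df = df.dropna()

X = df.drop("N", axis=1)  # features
y = df["N"]               # target, kept as a Series

print(X.shape)
print(y.tolist())
```

Keeping y as a Series lets the same iloc-based indexing work for both X and y inside the cross-validation loop.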