How do I fix the pandas error "None of [Index([2, 3, 4, 6, 7, 9], dtype='int32')] are in the [columns]"?

Asked by dl5txlt9, 12 months ago
This is my code:
import numpy as np
import pandas as pd
import unicodeit
import seaborn as sns 
import matplotlib.pyplot as plt

from matplotlib.ticker import FormatStrFormatter
from pandas import read_csv
df1 = pd.read_csv('data.csv')
# CATBOOST
from catboost import CatBoostClassifier
from sklearn.model_selection import KFold

X=df1.drop('N', axis = 1)
y=list(df1['N'])

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)

# Set up k-fold cross-validation
k = 3
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Set up parameter grid for max_depth and learning_rate
max_depth_values = [4, 6, 8, 10, 12]
learning_rate_values = [0.01, 0.05, 0.1, 0.2, 0.3]

# Create an empty dataframe to store results
results_df = pd.DataFrame(columns=['max_depth', 'learning_rate', 'mean_accuracy'])

# Perform k-fold cross-validation with CatBoost
for max_depth in max_depth_values:
    for learning_rate in learning_rate_values:
        print(f"Training model with max_depth={max_depth} and learning_rate={learning_rate}")
        accuracies = []
        for train_index, val_index in kf.split(X, y):
            X_train, X_val = X[train_index], X[val_index]
            y_train, y_val = y[train_index], y[val_index]

            model = CatBoostClassifier(max_depth=max_depth, learning_rate=learning_rate, iterations=100, random_seed=42, verbose=False)
            model.fit(X_train, y_train, eval_set=(X_val, y_val))
            accuracy = model.score(X_val, y_val)
            accuracies.append(accuracy)

        mean_accuracy = np.mean(accuracies)
        new_row = {'max_depth': max_depth, 'learning_rate': learning_rate, 'mean_accuracy': mean_accuracy}
        results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)
       
# Create a heatmap to visualize the results
results_pivot = results_df.pivot(index='max_depth', columns='learning_rate', values='mean_accuracy')
plt.figure(figsize=(10, 6))
sns.heatmap(results_pivot, annot=True, fmt=".3f", cmap="YlGnBu")
plt.title('Mean Accuracy Heatmap for CatBoost')
plt.show()

My data.csv:
D t L fy fc N
168.8 2.64 305 302.4
168.8 2.64 305 302.4
169.3 2.62 305 338.1
34.1 1330
168.3 3.6 305 288.4 27
168.3 3.6 305
168.3 3.6 305
168.8 5 305 200.2 33.4 1966
168.8 5 305 200.2 33.4 1970
168.8 5 305 200.2 27.9 1984

iyr7buue1#

The error occurs in the cross-validation loop, where the data is split into X_train, X_val, y_train and y_val. Pandas indexes rows by position with iloc, so that part needs to be replaced. Fix the cross-validation loop section as follows:

for train_index, val_index in kf.split(X, y):
    X_train, X_val = X.iloc[train_index], X.iloc[val_index]
    y_train, y_val = y.iloc[train_index], y.iloc[val_index]
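
For context, here is a minimal sketch (with a small made-up DataFrame) of why the original line fails: passing a list of integers to df[...] is interpreted as a list of column labels, which raises the "None of [Index(...)] are in the [columns]" KeyError, whereas df.iloc[...] selects rows by position. Note that .iloc also assumes y is a pandas Series (y = df1['N']) rather than a plain Python list.

import pandas as pd

df = pd.DataFrame({'a': [10, 20, 30, 40], 'b': [1, 2, 3, 4]})
rows = [0, 2]           # integer positions, like the indices returned by kf.split

# df[rows]              # KeyError: pandas looks for columns named 0 and 2
print(df.iloc[rows])    # selects the rows at positions 0 and 2, as intended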

Replace that part inside the cross-validation loop. This should resolve the "None of [Index([2, 3, 4, 6, 7, 9], dtype='int32')] are in the [columns]" error. Don't forget to do the same where the initial train_test_split is set up.
Example:

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

#...

for max_depth in max_depth_values:
    for learning_rate in learning_rate_values:
        print(f"Training model with max_depth={max_depth} and learning_rate={learning_rate}")
        accuracies = []
        for train_index, val_index in kf.split(X, y):
            X_train, X_val = X.iloc[train_index], X.iloc[val_index]
            y_train, y_val = y.iloc[train_index], y.iloc[val_index]

            model = CatBoostClassifier(max_depth=max_depth, learning_rate=learning_rate, iterations=100, random_seed=42, verbose=False)
            model.fit(X_train, y_train, eval_set=(X_val, y_val))
            accuracy = model.score(X_val, y_val)
            accuracies.append(accuracy)

        mean_accuracy = np.mean(accuracies)
        new_row = {'max_depth': max_depth, 'learning_rate': learning_rate, 'mean_accuracy': mean_accuracy}
        results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)


Make sure the data you are using matches the input types CatBoostClassifier expects.
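
If you need to check that, here is a hedged sketch of one way to do it (reusing X, X_train, X_val, y_train and y_val from above; the cat_features argument only matters if some feature columns are non-numeric):

print(X.dtypes)   # CatBoost expects numeric columns, or explicitly declared categorical ones

# Columns that pandas read as text should be declared as categorical features,
# otherwise CatBoost will reject them as non-numeric values.
cat_cols = X.select_dtypes(include=['object', 'category']).columns.tolist()

model = CatBoostClassifier(iterations=100, random_seed=42, verbose=False)
model.fit(X_train, y_train, cat_features=cat_cols, eval_set=(X_val, y_val))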

sshcrbum2#

You have two problems. One is the indexing problem @aazizzailani mentioned; following his suggestion will fix it. The other is that you do not have enough data to fit the model.
When I increased the number of sample data points, I got the output below.
[output heatmap screenshot]
Code:

import numpy as np
import pandas as pd
import seaborn as sns 
import matplotlib.pyplot as plt

from matplotlib.ticker import FormatStrFormatter
from pandas import read_csv
df1 = pd.read_csv('data.csv', sep=" ")
# CATBOOST
from catboost import CatBoostClassifier
from sklearn.model_selection import KFold
X=df1.drop('N', axis = 1)
y=df1['N']

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 42)
k = 3
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Set up parameter grid for max_depth and learning_rate
max_depth_values = [4, 6, 8, 10, 12]
learning_rate_values = [0.01, 0.05, 0.1, 0.2, 0.3]

# Create an empty dataframe to store results
results_df = pd.DataFrame(columns=['max_depth', 'learning_rate', 'mean_accuracy'])
# Perform k-fold cross-validation with CatBoost
for max_depth in max_depth_values:
    for learning_rate in learning_rate_values:
        print(f"Training model with max_depth={max_depth} and learning_rate={learning_rate}")
        accuracies = []
        for train_index, val_index in kf.split(X, y):
            print(train_index, val_index)
            print(X, y)
            X_train, X_val = X.iloc[train_index], X.iloc[val_index]
            y_train, y_val = y.iloc[train_index], y.iloc[val_index]

            model = CatBoostClassifier(max_depth=max_depth, learning_rate=learning_rate, iterations=100, random_seed=42, verbose=False)
            model.fit(X_train, y_train, eval_set=(X_val, y_val))
            accuracy = model.score(X_val, y_val)
            accuracies.append(accuracy)

        mean_accuracy = np.mean(accuracies)
        new_row = {'max_depth': max_depth, 'learning_rate': learning_rate, 'mean_accuracy': mean_accuracy}
        results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)
       
# Create a heatmap to visualize the results
results_pivot = results_df.pivot(index='max_depth', columns='learning_rate', values='mean_accuracy')
plt.figure(figsize=(10, 6))
sns.heatmap(results_pivot, annot=True, fmt=".3f", cmap="YlGnBu")
plt.title('Mean Accuracy Heatmap for CatBoost')
plt.show()
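
The posted data.csv has only a handful of partly incomplete rows, so 3-fold CV leaves very few samples per fold. If you just want to sanity-check the pipeline before the full dataset is ready, a purely synthetic stand-in with the same column names (hypothetical value ranges, binary target just for testing) can be generated like this:

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 200  # arbitrary number of synthetic rows

df1 = pd.DataFrame({
    'D':  rng.uniform(30, 170, n),
    't':  rng.uniform(1, 5, n),
    'L':  rng.uniform(300, 1400, n),
    'fy': rng.uniform(200, 310, n),
    'fc': rng.uniform(25, 35, n),
    'N':  rng.integers(0, 2, n),   # binary target, just so the classifier has something to predict
})

X = df1.drop('N', axis=1)
y = df1['N']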


jckbn6z73#

You should load the data and split it into features (X) and the target variable (y). Here is how I would modify your code:

import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from catboost import CatBoostClassifier
import matplotlib.pyplot as plt
import seaborn as sns

# Load your data from the CSV file
data = pd.read_csv('your_data.csv')  

X = data.drop('fy', axis=1)  # features: every column except the target
y = data['fy']               # target variable

k = 3
kf = KFold(n_splits=k, shuffle=True, random_state=42)

max_depth_values = [4, 6, 8, 10, 12]
learning_rate_values = [0.01, 0.05, 0.1, 0.2, 0.3]

# Create an empty dataframe to store results
results_df = pd.DataFrame(columns=['max_depth', 'learning_rate', 'mean_accuracy'])

for max_depth in max_depth_values:
    for learning_rate in learning_rate_values:
        print(f"Training model with max_depth={max_depth} and learning_rate={learning_rate}")
        accuracies = []
        for train_index, val_index in kf.split(X, y):
            X_train, X_val = X.iloc[train_index], X.iloc[val_index]
            y_train, y_val = y.iloc[train_index], y.iloc[val_index]

            model = CatBoostClassifier(max_depth=max_depth, learning_rate=learning_rate, iterations=100, random_seed=42, verbose=False)
            model.fit(X_train, y_train, eval_set=(X_val, y_val))
            accuracy = model.score(X_val, y_val)
            accuracies.append(accuracy)

        mean_accuracy = np.mean(accuracies)
        new_row = {'max_depth': max_depth, 'learning_rate': learning_rate, 'mean_accuracy': mean_accuracy}
        results_df = pd.concat([results_df, pd.DataFrame([new_row])], ignore_index=True)
       

results_pivot = results_df.pivot(index='max_depth', columns='learning_rate', values='mean_accuracy')
plt.figure(figsize=(10, 6))
sns.heatmap(results_pivot, annot=True, fmt=".3f", cmap="YlGnBu")
plt.title('Mean Accuracy Heatmap for CatBoost')
plt.show()


  • Just make sure to replace 'fy' with your actual target variable name, and adjust the file path accordingly.
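
As a small follow-up (assuming the results_df built in the loop above), the best max_depth/learning_rate combination can also be read off directly instead of only from the heatmap:

# Row of results_df with the highest mean accuracy across folds
best = results_df.loc[results_df['mean_accuracy'].astype(float).idxmax()]
print(f"best max_depth={best['max_depth']}, "
      f"learning_rate={best['learning_rate']}, "
      f"mean_accuracy={best['mean_accuracy']:.3f}")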
