pandas: How do I parse a date field column effectively for a machine learning model?

Posted by cgvd09ve on 2023-05-15 in Other


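I am trying to predict IBM's stock price, but I am running into gotchas with the date column when preparing the data to train a linear regression model. My dataset looks like this: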

         Date      Open      High       Low     Close  Adj Close  Volume
0  1962-01-02  7.713333  7.713333  7.626667  7.626667   0.618153  387200
1  1962-01-03  7.626667  7.693333  7.626667  7.693333   0.623556  288000
2  1962-01-04  7.693333  7.693333  7.613333  7.616667   0.617343  256000
3  1962-01-05  7.606667  7.606667  7.453333  7.466667   0.605185  363200
4  1962-01-08  7.460000  7.460000  7.266667  7.326667   0.593837  544000

My code is:

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np

df = pd.read_csv('IBM.csv')

df['Date'] = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)

X = df.drop('Adj Close', axis='columns')
Y = df['Adj Close']
scaler = MinMaxScaler()

X = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

timesplit = TimeSeriesSplit(n_splits=10)
for train_index, test_index in timesplit.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = Y[train_index], Y[test_index]

I get an error:

KeyError: "None of [Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,\n            ...\n            1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330, 1331, 1332],\n           dtype='int64', length=1333)] are in the [columns]"
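
A note on where this KeyError most likely comes from: indexing a DataFrame with `X[train_index]` looks the integers up as column labels, while `TimeSeriesSplit.split` yields row positions. Selecting rows by position is done with `.iloc`. The sketch below is not the original script; it uses a small synthetic frame in place of `IBM.csv` (the values are made up, and only two of the columns are reproduced) to show the split working with a `DatetimeIndex` left in place.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Synthetic stand-in for IBM.csv (hypothetical values, reduced column set).
dates = pd.bdate_range("1962-01-02", periods=50)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "Open": rng.uniform(7, 8, len(dates)),
        "Adj Close": rng.uniform(0.5, 0.7, len(dates)),
    },
    index=pd.Index(dates, name="Date"),
)

X = df.drop(columns="Adj Close")
y = df["Adj Close"]

splits = []
for train_index, test_index in TimeSeriesSplit(n_splits=5).split(X):
    # .iloc selects rows by integer position; X[train_index] would be
    # interpreted as a list of column labels and raise the KeyError.
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    splits.append((len(X_train), len(X_test)))
```

The same applies to the target: once `Date` is the index, `Y[train_index]` no longer refers to positions, so `Y.iloc[train_index]` is the safer spelling.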

Even if I manage to get that working, I still would not be able to train my model.
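
One possible way to make the date usable for training, sketched under several assumptions: a linear model needs numbers, so the parsed `Date` is expanded into numeric features (days since the first row, month, day of week) rather than being set as the index and dropped. The feature names are hypothetical, the data is synthetic rather than the real `IBM.csv`, and `LinearRegression` is used in place of the imported `LogisticRegression`, because the target `Adj Close` is continuous and `LogisticRegression` is a classifier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for IBM.csv (hypothetical values).
dates = pd.bdate_range("1962-01-02", periods=120)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Date": dates.strftime("%Y-%m-%d"),  # dates arrive as strings, as in a CSV
    "Close": np.linspace(7.0, 9.0, len(dates)) + rng.normal(0, 0.05, len(dates)),
})
df["Adj Close"] = df["Close"] * 0.08

df["Date"] = pd.to_datetime(df["Date"])

# Turn the date into plain numbers that a linear model can consume.
features = pd.DataFrame({
    "days_since_start": (df["Date"] - df["Date"].min()).dt.days,
    "month": df["Date"].dt.month,
    "day_of_week": df["Date"].dt.dayofweek,
    "Close": df["Close"],
})
target = df["Adj Close"]

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(features):
    X_train, X_test = features.iloc[train_idx], features.iloc[test_idx]
    y_train, y_test = target.iloc[train_idx], target.iloc[test_idx]

    # Fit the scaler on the training fold only, so no test data leaks in.
    scaler = MinMaxScaler().fit(X_train)
    model = LinearRegression().fit(scaler.transform(X_train), y_train)
    scores.append(model.score(scaler.transform(X_test), y_test))
```

Fitting the scaler inside the loop is a deliberate choice: calling `fit_transform` on the full frame before splitting, as the question's code does, lets later prices influence how earlier rows are scaled.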

pandas

Source: https://stackoverflow.com/questions/76218324/how-do-i-parse-a-date-field-column-effectively-for-a-machine-learning-model