python-3.x Found input variables with inconsistent numbers of samples: [100, 300]

v6ylcynt posted on 2022-11-26 in Python

I'm a beginner in this field and am trying to model a dataset with logistic regression. Here is the code:

import numpy as np
from matplotlib import pyplot as plt
import pandas as pnd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler  # Imputer was removed in scikit-learn 0.22 (see sklearn.impute.SimpleImputer)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Import the dataset
data_set = pnd.read_csv("/Users/Siddharth/PycharmProjects/Deep_Learning/Classification Template/Social_Network_Ads.csv")
X = data_set.iloc[:, [2,3]].values
Y = data_set.iloc[:, 4].values

# Splitting the set into training set and testing set
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Scaling the variables
scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)
x_train = scaler_x.transform(x_test)

# Fitting Logistic Regression to training data
classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)

# Predicting the test set results
y_prediction = classifier.predict(x_test)

# Making the confusion matrix
conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction)
print(conMat)

The error is raised at classifier.fit(x_train, y_train). The error is:

Traceback (most recent call last):
  File "/Users/Siddharth/PycharmProjects/Deep_Learning/Logistic_regression.py", line 24, in <module>
    classifier.fit(x_train, y_train)
  File "/usr/local/lib/python3.6/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y
    check_consistent_length(X, y)
  File "/usr/local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [100, 300]

I don't understand why this happens. Any help would be greatly appreciated. Thanks!

Answer #1 by 6pp0gazn

There seems to be a typo here. You probably want:

x_test = scaler_x.transform(x_test)

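instead of:

x_train = scaler_x.transform(x_test)

In short, the error says that the x_train you pass to fit (which, after that line, actually holds the scaled x_test) and y_train have different numbers of rows: 100 test rows versus 300 training labels.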
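A minimal sketch of the corrected scaling step, with a row-count check before fitting. Since Social_Network_Ads.csv isn't available here, it assumes a hypothetical synthetic stand-in built with make_classification (400 rows, 2 features, which reproduces the 300/100 split from the error message); substitute your own X and Y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the CSV: 400 samples, 2 numeric features, binary target
X, Y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

scaler_x = StandardScaler()
x_train = scaler_x.fit_transform(x_train)  # learn mean/std from the training set only
x_test = scaler_x.transform(x_test)        # apply them to the test set (the typo assigned this to x_train)

# Sanity check: features and labels passed to fit must have the same number of rows
assert x_train.shape[0] == y_train.shape[0], (x_train.shape, y_train.shape)
print(x_train.shape, x_test.shape)

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_prediction = classifier.predict(x_test)

conMat = confusion_matrix(y_true=y_test, y_pred=y_prediction)
print(conMat)
```

Another way to avoid this kind of typo is to wrap both steps in sklearn.pipeline.make_pipeline(StandardScaler(), LogisticRegression()), so the fitted scaler is applied automatically to whatever data you later pass to predict.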

Answer #2 by 8ehkhllq

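Other situations in code can cause this error too. Someone suggested I put the cross-validation in a loop, but I don't know how to handle that in code (nor which part of the procedure should go inside the loop, or how to write a condition that ends the loop).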

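import numpy as np
import matplotlib.pyplot as plt
import scikitplot as skplt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import StratifiedKFold

# `train` (a DataFrame) and `feats` (a list of feature columns) are defined elsewhere
X = train[feats].values
y = train['Target'].values

cv = StratifiedKFold(n_splits=3, random_state=2021, shuffle=True)
model = LogisticRegression(solver='liblinear')

scores = []
for train_idx, test_idx in cv.split(X, y):
    model.fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])  # predictions for this fold's test rows only

    score = mean_absolute_error(y[test_idx], y_pred)
    scores.append(score)

print(np.mean(scores), np.std(scores))

fig = plt.figure(figsize=(15, 6))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)

skplt.metrics.plot_confusion_matrix(y, y_pred, ax=ax1)
skplt.metrics.plot_roc(y, y_pred, ax=ax2)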

ValueError: Found input variables with inconsistent numbers of samples: [32561, 10853]
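The two numbers point to the cause: 32561 is the full length of y, while 10853 is roughly a third of it, i.e. the size of one test fold. After the loop finishes, y_pred only holds the predictions from the last fold, so passing it together with the full y to the plotting calls fails. The loop itself needs no stopping condition: cv.split yields exactly n_splits train/test pairs and the for loop ends after the last one. Note also that scikit-plot's plot_roc expects predicted probabilities (e.g. from predict_proba), not class labels. A minimal sketch that collects out-of-fold predictions into arrays the same length as y, assuming hypothetical synthetic data from make_classification in place of train[feats] and train['Target'], and using scikit-learn's confusion_matrix and roc_auc_score instead of scikit-plot:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, mean_absolute_error, roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for train[feats] / train['Target'] (binary target)
X, y = make_classification(n_samples=3000, n_features=5, random_state=2021)

cv = StratifiedKFold(n_splits=3, random_state=2021, shuffle=True)
model = LogisticRegression(solver='liblinear')

# Out-of-fold arrays: one slot per row of X, filled in fold by fold
oof_pred = np.zeros(len(y), dtype=y.dtype)
oof_proba = np.zeros((len(y), 2))

scores = []
for train_idx, test_idx in cv.split(X, y):  # runs exactly n_splits times, then stops
    model.fit(X[train_idx], y[train_idx])
    oof_pred[test_idx] = model.predict(X[test_idx])
    oof_proba[test_idx] = model.predict_proba(X[test_idx])
    scores.append(mean_absolute_error(y[test_idx], oof_pred[test_idx]))

print(np.mean(scores), np.std(scores))

# Every row now has a prediction, so y and the predictions have matching lengths
cm = confusion_matrix(y, oof_pred)
auc = roc_auc_score(y, oof_proba[:, 1])  # uses the positive-class probability
print(cm)
print(auc)
```

If you prefer scikit-plot for the figures, passing y with oof_pred to plot_confusion_matrix and y with oof_proba to plot_roc keeps the lengths consistent. sklearn.model_selection.cross_val_predict(model, X, y, cv=cv) produces the same kind of out-of-fold predictions in a single call (with method='predict_proba' for probabilities).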
