matplotlib prediction curve does not appear correctly in sklearn polynomial regression [duplicate]

Posted by sc4hvdpw on 2023-10-24 in Other

This question already has answers here:

Given parallel lists, how can I sort one while permuting (rearranging) the other in the same way? (16 answers)
data points connected in wrong order in line graph (1 answer)
Closed 2 hours ago.
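I used sklearn's polynomial regression to build a model that predicts electricity demand from temperature. However, after training, when I plotted the result with matplotlib.pyplot, the prediction came out as a tangle of lines rather than a single curve.

I want a model with just one curve. What is the problem, and what should I do? Here is the complete code.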

import pandas as pd
dt = pd.read_csv("complete_dataset.csv")

dt.isnull().sum()

dt = dt.dropna()

dt.head()

dt = dt[["demand", "solar_exposure", "max_temperature","rainfall"]]

dt.head()

### Correlation between sun exposure and electricity demand --> weak

x = dt.iloc[:, 0].values
y = dt.iloc[:, 1].values

import matplotlib.pyplot as plt
plt.scatter(x, y, s = 2, color = "black")
plt.xlabel("demand")
plt.ylabel("solar exposure")

### Correlation between maximum temperature and electricity demand --> demand tends to increase as temperature either decreases or increases.

y = dt.iloc[:, 2]
plt.scatter(x, y, s = 1, color = "black")
plt.xlabel("demand")
plt.ylabel("max temperature")

### There appears to be no correlation between rainfall and electricity demand.

y = dt.iloc[:, 3].values
plt.scatter(x, y, s = 2, color = "black")
plt.xlabel("demand")
plt.ylabel("rainfall")

dt = dt[["demand", "max_temperature"]]
dt.rename(columns={'max_temperature': 'temp'}, inplace=True)

## model

x = dt["demand"].values.reshape(-1, 1)
y = dt["temp"].values

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(x)
X_poly[:5]

poly_reg.get_feature_names_out()

lin_reg = LinearRegression()
lin_reg.fit(X_poly,y)

plt.scatter(x, y, color = "black", s = 2)
plt.plot(x, lin_reg.predict(poly_reg.fit_transform(x)), color = 'red')
plt.xlabel("demand")
plt.ylabel("max temperature")
plt.show()

### Problem: The lines come out strangely because they are split sideways.
### Solution: Should we change the x-axis and y-axis to make a V-shape?

x = dt["demand"].values.reshape(-1, 1)
y = dt["temp"].values.reshape(-1, 1)

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
y_poly = poly_reg.fit_transform(y)
y_poly[:5]

poly_reg.get_feature_names_out()

lin_reg = LinearRegression()
lin_reg.fit(y_poly,x)

plt.scatter(y, x, color = "black", s = 2)
plt.plot(y, lin_reg.predict(poly_reg.fit_transform(y)), color = 'red')
plt.ylabel("demand")
plt.xlabel("max temperature")
plt.show()
Answer #1, by of1yzvn4

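This happens because plot() assumes the data points are already in order. A "curve" is really just a series of points joined by line segments, and matplotlib joins them in the order they appear in the array. Your temperatures are not sorted, so the segments jump back and forth across the x-axis, which produces the mess you see.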
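To make the cause concrete, here is a minimal sketch using made-up toy data (the arrays below are hypothetical, not from the question's dataset). It counts how often a line drawn in array order would step backwards along the x-axis, before and after applying a single argsort permutation to both arrays.

```python
import numpy as np

# Hypothetical toy data: x is deliberately out of order, and y = x**2.
x = np.array([3.0, 1.0, 4.0, 2.0])
y = x ** 2

# plt.plot(x, y) would draw segments in array order: 3 -> 1 -> 4 -> 2,
# so the line doubles back on itself along the x-axis.
backtracks_unsorted = int(np.sum(np.diff(x) < 0))

# Apply one permutation to both arrays so each (x, y) pair stays together.
order = np.argsort(x)
x_sorted, y_sorted = x[order], y[order]
backtracks_sorted = int(np.sum(np.diff(x_sorted) < 0))

print(backtracks_unsorted, backtracks_sorted)  # prints "2 0"
```

The important detail is that the same `order` indexes both arrays; sorting x and y independently would break the pairing between each input and its prediction.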
You just need to sort the data points after fitting the model:

# After fitting the model, sort the temperatures so the line is drawn left to right
sorted_indices = y.ravel().argsort()  # indices that would sort the flattened array
sorted_y = y.ravel()[sorted_indices]
# poly_reg is already fitted, so transform() is enough here
sorted_predictions = lin_reg.predict(poly_reg.transform(sorted_y.reshape(-1, 1)))

# Now, plot using these sorted values
plt.scatter(y, x, color="black", s=2)
plt.plot(sorted_y, sorted_predictions, color='red')
plt.ylabel("demand")
plt.xlabel("max temperature")
plt.show()

This will display a single smooth curve, as expected.
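An alternative that avoids sorting altogether is to evaluate the fitted model on an evenly spaced grid of temperatures rather than on the raw samples. The sketch below is not part of the original answer and uses synthetic stand-in data, since complete_dataset.csv is not available here; the names `temp`, `demand`, `grid`, and `curve` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for the temperature/demand data (hypothetical values).
rng = np.random.default_rng(0)
temp = rng.uniform(5, 40, size=200).reshape(-1, 1)
demand = 100 + 0.5 * (temp.ravel() - 20) ** 2 + rng.normal(0, 5, size=200)

# Same model as in the question: degree-2 features fed to a linear regression.
poly_reg = PolynomialFeatures(degree=2)
lin_reg = LinearRegression().fit(poly_reg.fit_transform(temp), demand)

# Evaluate the fitted model on an ordered grid instead of the raw samples.
grid = np.linspace(temp.min(), temp.max(), 200).reshape(-1, 1)
curve = lin_reg.predict(poly_reg.transform(grid))
```

Because `grid` is already increasing, `plt.plot(grid.ravel(), curve, color="red")` draws one clean line on top of the scatter plot. The grid also keeps the curve evenly sampled in regions where the data are sparse.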
