Keras: predicting a quantity from a graph, no accuracy at all [duplicate]

zzoitvuj, posted 2022-11-24 in Other

This question already has an answer here:

What function defines accuracy in Keras when the loss is mean squared error (MSE)? (3 answers)
Closed 2 months ago.
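I'm new to the world of neural networks, and I'm trying to write a prediction algorithm with tensorflow/keras. The code simply tries to predict Roc, which depends on Alt and Temp, from points read off a graph.
(I'm not allowed to show the graph here.)
After many attempts I got some accuracy, around 0.2 to 0.5. Not great, but at least something was working. After a while it dropped to 0, and no matter what I tweak, I get no accuracy at all. Any idea why I don't get any accuracy?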

#import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd
import sklearn.model_selection

#Data collection
factor = 10
data = pd.read_csv("roc_6800_ibf.csv", sep=",")
data = data.apply(pd.to_numeric, errors='coerce')
data = (data / factor) + 5

predict = "Roc"

x = np.array(data.drop([predict], axis=1))
y = np.array(data[predict])

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, 
test_size=0.2)

x_shape = int(x.ndim)
y_shape = int(y.ndim)

#Model

model = keras.Sequential([
keras.layers.Dense(units=(2), input_shape=(2,), activation="relu"),
keras.layers.Dense(4, activation="relu"),
keras.layers.Dense(1, activation="relu")
])

model.compile(optimizer="adam", loss="MeanSquaredError", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=20, batch_size=10, verbose=1)

results = model.evaluate(x_test, y_test)

print("- - - - - - - - - - - - - - - - - - - - - - - -")
print(results)

#Prediction

def dataPredict(inputvalues, outputvalues):
    print("- - - - - - - - - - - - - - - - - - - - - - - -")
    test_q = np.array([inputvalues])
    test_a = outputvalues
    prediction = model.predict((test_q / factor) + 5)

    print("Prediction " + str((prediction[0] - 5) * factor))
    print("Actual " + str(test_a[0]))
    print("Input " + str(test_q))

dataPredict([5.5,20.0],[3.6])
dataPredict([6.8,30.0],[0.4])

My input data is about 80 rows of points taken from the graph and looks like this. I want to use Alt and Temp to get Roc.
Updated dataset, 72 rows:

Alt,Temp,Roc
-1.0,-40.0,9.6
0.0,-40.0,9.6
1.0,-40.0,9.6
2.0,-40.0,9.6
3.0,-40.0,9.6
4.0,-40.0,9.6
5.0,-40.0,9.6
6.0,-40.0,9.6
7.0,-40.0,8.1
8.0,-40.0,7.9
7.5,-40.0,9.1
-1.0,0.0,9.6
0.0,0.0,9.6
1.0,0.0,9.6
2.0,0.0,9.6
2.1,0.0,9.6
3.0,0.0,9.0
4.0,0.0,8.0
5.0,0.0,6.6
6.0,0.0,5.5
7.0,0.0,4.2
8.0,0.0,3.2
-1.0,20.0,9.6
0.0,20.0,9.6
0.5,20.0,9.0
1.0,20.0,8.6
2.0,20.0,7.8
3.0,20.0,6.2
4.0,20.0,5.2
5.0,20.0,4.0
6.0,20.0,2.9
7.0,20.0,1.8
8.0,20.0,0.5
-1.0,40.0,7.5
0.0,40.0,6.8
1.0,40.0,5.6
2.0,40.0,4.2
3.0,40.0,3.2
4.0,40.0,2.2
5.0,40.0,1.0
-1.0,50.0,5.4
0.0,50.0,4.2
-0.5,-40.0,9.5
0.5,-40.0,9.5
1.5,-40.0,9.5
2.5,-40.0,9.5
3.5,-40.0,9.5
4.5,-40.0,9.5
5.5,-40.0,9.5
6.5,-40.0,9.1
7.5,-40.0,8.1
-0.5,-10.0,9.5
0.5,-10.0,9.5
1.5,-10.0,9.5
2.5,-10.0,9.5
3.5,-10.0,9.5
4.5,-10.0,8.3
5.5,-10.0,7.1
6.5,-10.0,6.0
7.5,-10.0,5.0
-0.5,30.0,8.4
0.5,30.0,7.6
1.5,30.0,6.4
2.5,30.0,5.5
3.5,30.0,4.2
4.5,30.0,3.1
5.5,30.0,1.9
6.5,30.0,0.8
7.5,30.0,-0.5
5.2,10.0,5.3
6.8,10.0,4.0

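I tried adjusting the dataset (indata) in code so that all the numbers are positive and divided by 10. That gave me my best result so far, but then it suddenly dropped to 0: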

Epoch 20/20
6/6 [==============================] - 0s 2ms/step - loss: 32.5049 - accuracy: 0.0000e+00
Answer 1 (qyswt5oh)

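OK, I tried a few things on your dataset (TL;DR: XGBoost works better in this case).
Having looked at the dataset: this is a regression task, so your output is a continuous number rather than a class label such as 0 or 1. The accuracy metric counts predictions that match the target, and a continuous prediction almost never matches it exactly, so the accuracy comes out as 0. A better way to evaluate this kind of task is with regression losses and metrics such as MAE, MSE, RMSE or MAPE. If you want a single score that plays the role of accuracy, you can use R-squared.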
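To make the point above concrete, here is a minimal sketch. The `y_true` values are Roc figures picked from the question's dataset, while the `y_pred` values are made up for illustration and do not come from any model in this thread. Treating accuracy as an exact-match rate is a simplification of what Keras actually computes, but it shows why the number carries no information for a continuous target.

```python
import numpy as np

# A few Roc targets from the dataset and hypothetical (made-up) predictions.
y_true = np.array([9.6, 8.1, 5.5, 3.2, 0.5])
y_pred = np.array([9.4, 8.3, 5.1, 3.6, 0.9])

# "Accuracy" in the exact-match sense: fraction of predictions equal to the target.
exact_match = float(np.mean(y_true == y_pred))

# Regression metrics measure how far off the predictions are instead.
mae = float(np.mean(np.abs(y_true - y_pred)))
mse = float(np.mean((y_true - y_pred) ** 2))
rmse = float(np.sqrt(mse))

# R-squared: 1 - (residual sum of squares / total sum of squares).
ss_res = float(np.sum((y_true - y_pred) ** 2))
ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot

print(f"exact-match accuracy: {exact_match:.2f}")
print(f"MAE: {mae:.3f}  MSE: {mse:.3f}  RMSE: {rmse:.3f}  R2: {r2:.3f}")
```

In this sketch every prediction is within 0.4 of its target, yet the exact-match accuracy is still zero. That is the same symptom as the `accuracy: 0.0000e+00` in the training log.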
Here is the code:

import pandas as pd
import numpy as np
import seaborn as sns
import collections
import xgboost
from sklearn.linear_model import LinearRegression

df = pd.read_csv("sample_data_1.csv") # Your dataset

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df[['Alt','Temp']], df['Roc'], test_size=0.3)

First I fitted a linear model to your data, since the number of rows and the complexity both look small:

lin_model = LinearRegression()
lin_model.fit(x_train, y_train)
preds = lin_model.predict(x_test)

from sklearn.metrics import r2_score
"Accuracy is " + str(r2_score(preds, y_test))
Output: 'Accuracy is 0.6826956688194117'

As you can see, the linear model's score is fairly low, but it does confirm that the inputs are related to the output in some way.
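One caveat when reading the score above: scikit-learn documents the signature as `r2_score(y_true, y_pred)`, and the metric is not symmetric in its arguments. The call above passes the predictions first, so the reported figure may differ from the conventional R-squared. A minimal sketch of the difference, using made-up predictions (the `y_pred` values are illustrative only):

```python
import numpy as np
from sklearn.metrics import r2_score

# Roc targets from the dataset and hypothetical (made-up) predictions.
y_true = np.array([9.6, 8.1, 5.5, 3.2, 0.5])
y_pred = np.array([8.0, 7.0, 5.5, 4.0, 2.5])

# Documented order: ground truth first, predictions second.
r2_documented = r2_score(y_true, y_pred)

# Reversed order: the variance of the predictions becomes the baseline.
r2_reversed = r2_score(y_pred, y_true)

# Manual R-squared for comparison with the documented order.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_documented, r2_reversed, r2_manual)
```

The two orderings share the same residuals but normalise by different variances, so they generally give different numbers. It is safer to keep the ground truth as the first argument.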
Next I tried a Keras model similar to yours. The code is below:

import tensorflow as tf
import tensorflow.keras.layers as layers

model = tf.keras.Sequential([
    layers.Dense(1000, activation = 'relu', input_shape = (2, )),
    layers.Dropout(0.2),
    layers.Dense(500, activation = 'relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation = 'relu')
])

model.compile(optimizer = 'adam', loss = 'mape', metrics=['mape','mae','mse'])
model.fit(x_train, y_train, epochs = 100, batch_size = 16)
model.evaluate(x_test, y_test)
Output: 1/1 [==============================] - 0s 130ms/step - loss: 53.3907 - mape: 53.3907 - mae: 2.6886 - mse: 15.3293

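The result here is poor, since the loss (MAPE) is a little over 50%, but if you look at the mean absolute error, it is not that large in magnitude.
This suggests the model would do better if you scale the data first, for example with MinMaxScaler() from scikit-learn's preprocessing module.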
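A minimal sketch of that scaling step, assuming the usual pattern of fitting the scaler on the training features only and reusing it for the test features. The rows are taken from the question's dataset, but this particular train/test split is hypothetical:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Columns are [Alt, Temp]. Rows come from the dataset; the split is illustrative.
x_train = np.array([
    [-1.0, -40.0],
    [2.0, 0.0],
    [5.0, 20.0],
    [8.0, 40.0],
])
x_test = np.array([[6.8, 10.0]])

# Fit on the training data only, so the test data does not leak into the scaling.
scaler = MinMaxScaler()
x_train_scaled = scaler.fit_transform(x_train)

# Apply the same min/max learned from the training data to the test data.
x_test_scaled = scaler.transform(x_test)

print(x_train_scaled)
print(x_test_scaled)
```

The scaled arrays can then be passed to `model.fit` and `model.evaluate` in place of the raw features. Any input passed to `model.predict` later needs the same `scaler.transform` applied first.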
Finally, I tried an XGBoost model, which performed much better than the others:

xgb_clf = xgboost.XGBRegressor(
    learning_rate=0.3,
    max_depth=6,
    n_estimators=1000
)
xgb_clf.fit(x_train, y_train)
preds = xgb_clf.predict(x_test)
"Accuracy is " + str(r2_score(preds, y_test))
Output: 'Accuracy is 0.8968514145069562'

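Almost 90%. Keep in mind that this is with the data in its raw state and minimal preprocessing. With proper preprocessing and augmentation, the XGBoost model could plausibly gain another 5 to 6%.
Cheers!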
