I have a question about the model.evaluate() and model.predict() functions in Keras. I built a simple LSTM model in Keras and want to test its performance on a test dataset. I considered the following two ways to compute metrics on the test set:
- using the model.evaluate() method
- using the model.predict() method to obtain predictions and computing the metric by hand
However, I ended up with different results. Moreover, the result of model.evaluate() also depends on the value of the batch_size argument. Based on my understanding (and this post), they should give the same result. Below is code that reproduces the issue:
from keras.models import Model
from keras.layers import Dense, LSTM, Input
import numpy as np
import keras.backend as K
from keras.callbacks import EarlyStopping
class VLSTM:
    def __init__(self, input_shape=(6, 1), nb_output_units=1, nb_hidden_units=128,
                 dropout=0.0, recurrent_dropout=0.0, nb_layers=1):
        self.input_shape = input_shape
        self.nb_output_units = nb_output_units
        self.nb_hidden_units = nb_hidden_units
        self.nb_layers = nb_layers
        self.dropout = dropout
        self.recurrent_dropout = recurrent_dropout

    def build(self):
        inputs = Input(shape=self.input_shape)
        outputs = LSTM(self.nb_hidden_units)(inputs)
        outputs = Dense(self.nb_output_units, activation=None)(outputs)
        return Model(inputs=[inputs], outputs=[outputs])
# Keras calls a custom loss/metric as fn(y_true, y_pred)
def RMSE(y_true, y_pred):
    return K.sqrt(K.mean((y_pred - y_true) ** 2))
n_train = 500
n_val = 100
n_test = 250
X_train = np.random.rand(n_train, 6, 1)
Y_train = np.random.rand(n_train, 1)
X_val = np.random.rand(n_val, 6, 1)
Y_val = np.random.rand(n_val, 1)
X_test = np.random.rand(n_test, 6, 1)
Y_test = np.random.rand(n_test, 1)
input_shape = (X_train.shape[1], X_train.shape[2])
model = VLSTM(input_shape=input_shape)
m = model.build()
m.compile(loss=RMSE,
optimizer='adam',
metrics=[RMSE])
callbacks = []
callbacks.append(EarlyStopping(patience=30))
# train model
hist = m.fit(X_train, Y_train,
             batch_size=32, epochs=10, shuffle=True,
             validation_data=(X_val, Y_val), callbacks=callbacks)
# Use evaluate method with default batch size
test_rmse = m.evaluate(X_test, Y_test)[1]
print("RMSE is {} using evaluate method with default batch size".format(test_rmse))
# Use evaluate method with batch size = 1
test_rmse = m.evaluate(X_test, Y_test, batch_size=1)[1]
print("RMSE is {} using evaluate method with batch size = 1".format(test_rmse))
# Use evaluate method with batch size = n_test
test_rmse = m.evaluate(X_test, Y_test, batch_size=n_test)[1]
print("RMSE is {} using evaluate method with batch size = n_test".format(test_rmse))
# Use predict method and compute RMSE manually
Y_test_pred = m.predict(X_test)
test_rmse = np.sqrt(((Y_test_pred - Y_test) ** 2).mean())
print("RMSE is {} using predict method".format(test_rmse))
After running the code, the results are as follows:
RMSE is 0.3068242073059082 using the evaluate method with the default batch size
RMSE is 0.26647186279296875 using the evaluate method with batch size = 1
RMSE is 0.30763307213783264 using the evaluate method with batch size = n_test
RMSE is 0.3076330596820157 using the predict method
It looks like model.predict() and model.evaluate() with batch size = n_test give the same result. Can anyone explain this? Thanks in advance!
1 Answer
Yes, your guess is correct: the RMSE computed from predict is indeed equal to the RMSE from evaluate with batch_size=len(dataset).
The reason is that evaluate computes the metric separately on each batch and then averages the per-batch values. Because the square root is nonlinear, the mean of per-batch RMSEs generally differs from the RMSE computed over the whole dataset in one go, which is what you get with predict (or with a single batch covering the entire test set).
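The effect is easy to see in isolation. Here is a minimal NumPy sketch (toy numbers, not the model above) where the mean of per-batch RMSEs differs from the whole-dataset RMSE:

```python
import numpy as np

squared_errors = np.array([0.0, 0.0, 4.0, 4.0])  # per-sample squared errors

# RMSE over the whole dataset at once (what predict + manual RMSE gives)
rmse_whole = np.sqrt(squared_errors.mean())       # sqrt(2) ~= 1.414

# RMSE per batch of 2, then averaged (how evaluate aggregates the metric)
rmse_batch1 = np.sqrt(squared_errors[:2].mean())  # 0.0
rmse_batch2 = np.sqrt(squared_errors[2:].mean())  # 2.0
rmse_batched = (rmse_batch1 + rmse_batch2) / 2    # 1.0, not 1.414
```

The two results agree only when every batch has the same mean squared error, or when there is a single batch.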
Of course, you can also reproduce the batched computation with predict by splitting the predictions into batches, computing the RMSE per batch, and averaging. (The code snippets from the original answer were not preserved here, only their outputs.) The manual batched computation printed 0.28436336682976376, while evaluate with the matching batch size printed 0.28436335921287537, so they are essentially the same.
Likewise, if you use np.split(Y_test_pred, 250, axis=0), which corresponds to a batch size of 1, the output in my case was 0.24441334738835332, while evaluate with batch_size=1 gave 0.244413360953331. So you can see they match there as well.
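The omitted snippets can be approximated as follows. This is a sketch with synthetic stand-ins for Y_test_pred and Y_test (the exact numbers above depend on the trained model and will not be reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-ins for the question's Y_test_pred / Y_test, shape (250, 1)
Y_test_pred = rng.random((250, 1))
Y_test = rng.random((250, 1))

# RMSE over the whole test set at once (predict + manual computation)
rmse_whole = np.sqrt(((Y_test_pred - Y_test) ** 2).mean())

def batched_rmse(pred, true, n_batches):
    """RMSE per batch, then averaged -- mirrors how evaluate aggregates."""
    rmses = [np.sqrt(((p - t) ** 2).mean())
             for p, t in zip(np.split(pred, n_batches, axis=0),
                             np.split(true, n_batches, axis=0))]
    return float(np.mean(rmses))

print(rmse_whole)                              # single-pass RMSE
print(batched_rmse(Y_test_pred, Y_test, 1))    # one batch of 250: identical
print(batched_rmse(Y_test_pred, Y_test, 250))  # batch size 1: differs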