Background: my data has shape (batch_size, ghost_dim, data_length), and I am learning to train a MultiHeadAttention-based model. After fitting the model, I pass a batch of data with those same dimensions to model(), model.predict(), or model.predict_on_batch() to get a prediction array back, and every one of them raises a similar shape-related error.
A quick overview of my model:
import numpy as np
import tensorflow
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

keras.mixed_precision.set_global_policy("mixed_float16")

x_train = np.random.normal(size=(32, 512))
y_train = np.random.normal(size=(32, 512))
batch_size = 32
H, W = x_train.shape
rows, cols = np.indices((H, W), sparse=True)
embed_dim = 512
ghost_dim = 1
dense_dim = 2048
num_heads = 2
x_train = np.expand_dims(x_train, 1)

shape = (batch_size, ghost_dim, embed_dim)  # (32, 1, 512)
padding_mask = None  # not defined in the original snippet; None disables masking
decoder_inputs = layers.Input(batch_input_shape=shape, dtype=tensorflow.float16)
mha_1 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
mha_2 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
layernorm_1 = layers.LayerNormalization()
Z = decoder_inputs
Z = mha_1(query=Z, value=Z, key=Z, use_causal_mask=True, attention_mask=padding_mask)
Z = layernorm_1(Z + decoder_inputs)
Z = mha_2(query=Z, value=decoder_inputs, key=decoder_inputs, attention_mask=padding_mask)
outputs = layers.TimeDistributed(keras.layers.Dense(embed_dim, activation="softmax"))(Z)
model = keras.Model(decoder_inputs, outputs)
model.compile(loss="mean_squared_error", optimizer="rmsprop", metrics=["accuracy"])
history = model.fit(x_train, y_train, epochs=200)
This raises:
/usr/local/lib/python3.10/dist-packages/keras/engine/training.py in tf__predict_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2169, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2155, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2143, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2111, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
**ValueError: Exception encountered when calling layer 'layer_normalization_1' (type LayerNormalization).
Cannot take the length of shape with unknown rank.
Call arguments received by layer 'layer_normalization_1' (type LayerNormalization):
• inputs=tf.Tensor(shape=<unknown>, dtype=float16)**
If I instead call the model directly with model(), I get a similar input-shape error inside the MultiHeadAttention layer: the input shows up as one-dimensional rather than the original single batch of data I passed in, i.e.:
x_predict = np.random.normal(size=(1, 1, 512))
predictions = model(x_predict)
predictions = model.predict_on_batch(x_predict)
I'm not sure where all these numpy array shape errors originate. Any suggestions/ideas?
1 Answer
The problem is that you specify the batch size in the input layer's shape via batch_input_shape=shape. Your model will then always assume that training and test data arrive in batches of exactly batch_size=32, so an x_predict with batch_size=1 raises an error. I suggest removing the batch size from the input layer, i.e. using shape=(ghost_dim, embed_dim) instead of batch_input_shape. Alternatively, sample your x_predict data so that it always has shape (32, 1, 512).
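A minimal sketch of the suggested fix, assuming the same layer setup as in the question (mixed precision and the padding mask omitted for brevity): declaring only the per-sample shape leaves the batch axis dynamic (None), so the fitted model accepts any batch size at prediction time.

```python
import numpy as np
import tensorflow.keras as keras
import tensorflow.keras.layers as layers

ghost_dim, embed_dim, num_heads = 1, 512, 2

# shape= declares only the per-sample dimensions; the batch axis stays None
decoder_inputs = layers.Input(shape=(ghost_dim, embed_dim))
Z = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
    query=decoder_inputs, value=decoder_inputs, key=decoder_inputs,
    use_causal_mask=True)
Z = layers.LayerNormalization()(Z + decoder_inputs)
outputs = layers.TimeDistributed(
    layers.Dense(embed_dim, activation="softmax"))(Z)
model = keras.Model(decoder_inputs, outputs)

# Both batch sizes now go through the same graph without shape errors
out_1 = model(np.random.normal(size=(1, 1, 512)).astype("float32"))
out_32 = model.predict_on_batch(
    np.random.normal(size=(32, 1, 512)).astype("float32"))
```

The key difference from the question's code is Input(shape=...) versus Input(batch_input_shape=...): the former builds a graph whose first dimension is unconstrained, the latter bakes 32 into every layer's expected input shape.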