I'm testing my LSTM encoder-decoder architecture on a simple task: identifying the vowels in a random string of characters. My TSV data looks like this:
molteyhpr 010011000
dlkz 0000
fabgovmgg 010010000
qgvowdykl 000100100
kgncpiot 00000110
pisvdf 010000
I've generated 100,000 samples.
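(For reference, a minimal sketch of how such pairs can be generated; the function name, the length bounds, and the file name are my own assumptions, not taken from the original code:)

import random
import string

VOWELS = set("aeiou")

def make_sample(min_len=4, max_len=10):
    # Generate one (string, vowel-mask) pair, e.g. ('pisvdf', '010000').
    length = random.randint(min_len, max_len)
    chars = "".join(random.choices(string.ascii_lowercase, k=length))
    mask = "".join("1" if c in VOWELS else "0" for c in chars)
    return chars, mask

with open("samples.tsv", "w") as f:
    for _ in range(100_000):
        chars, mask = make_sample()
        f.write(f"{chars}\t{mask}\n")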
My model code (a slightly modified version of the Keras example):
# Imports needed by this snippet:
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

self.latent_dim = 256
enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
enc_lstm_layer = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)
# We discard 'enc_outputs' and only keep the states.
enc_states = [state_h, state_c]
# Set up the decoder, using 'enc_states' as initial state.
dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)
dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
dec_outputs = dec_dense_layer(dec_outputs)
# Define the model that will turn
# 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'
model = Model([enc_input_layer, dec_input_layer], dec_outputs)
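(The comment above alludes to inference. The post doesn't show that part, but in the Keras example it is done with separate inference models built from the same layers, roughly like this sketch:)

# Sketch of the inference setup from the Keras seq2seq example,
# reusing the layers defined above; not shown in the original post.
encoder_model = Model(enc_input_layer, enc_states)

dec_state_input_h = Input(shape=(self.latent_dim,))
dec_state_input_c = Input(shape=(self.latent_dim,))
dec_states_inputs = [dec_state_input_h, dec_state_input_c]
dec_outputs, state_h, state_c = dec_lstm_layer(
    dec_input_layer, initial_state=dec_states_inputs)
dec_states = [state_h, state_c]
dec_outputs = dec_dense_layer(dec_outputs)
decoder_model = Model(
    [dec_input_layer] + dec_states_inputs,
    [dec_outputs] + dec_states)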
All data is converted to one-hot representations of equal length. It is generated like this:
def _generator(self, enc_data, dec_data, is_training):
    enc_oh_input_batch = None
    dec_oh_input_batch = None
    dec_oh_output_batch = None
    enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
    dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]
    current_idx = 0
    samples_len = len(enc_data)

    while True:
        # Create zero batch arrays
        enc_oh_input_batch = np.zeros(
            (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
        dec_oh_input_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
        dec_oh_output_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
        # Compile batch
        for i in range(self.batch_size):
            # When we get to the end of the samples, start over
            if i + current_idx >= samples_len:
                current_idx = 0
                if is_training:
                    self.epoch += 1
            tokens_in = enc_data[i + current_idx]
            tokens_out = dec_data[i + current_idx]
            # Vectorize encoder input
            for t, token in enumerate(tokens_in):
                enc_oh_input_batch[i, t, token] = 1
            # Pad the rest of the sequence with the empty token
            enc_oh_input_batch[i, t + 1:, enc_space_token] = 1
            # Vectorize decoder input and output
            for t, token in enumerate(tokens_out):
                dec_oh_input_batch[i, t, token] = 1
                if t > 0:
                    # dec_oh_output_batch will be ahead by one timestep
                    # and will not include the start character.
                    dec_oh_output_batch[i, t - 1, token] = 1
            # Pad the rest of the sequence with the empty token
            dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
        current_idx += self.batch_size
        yield [[enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch]
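(As a quick sanity check on what this generator emits, one can pull a single batch and inspect it; this probe is my own addition, not part of the original post:)

# Hypothetical sanity check: pull one batch and confirm the shapes.
[enc_batch, dec_batch], target_batch = next(self.source.train_generator())
print(enc_batch.shape)     # (batch_size, enc_max_seq_len, enc_vocab_len)
print(dec_batch.shape)     # (batch_size, dec_max_seq_len, dec_vocab_len)
print(target_batch.shape)  # (batch_size, dec_max_seq_len, dec_vocab_len)
# Every encoder timestep is exactly one-hot (real token or padding):
assert (enc_batch.sum(axis=-1) == 1).all()
# Note: the loop above never pads dec_oh_output_batch past the last
# shifted token, so the trailing timesteps of the target are all zeros.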
I train it like this:
h = self.model.fit(self.source.train_generator(),
                   batch_size = self.conf.batch_size,
                   epochs = self.conf.epochs,
                   initial_epoch = self.source.epoch,
                   steps_per_epoch = batches_per_epoch,
                   validation_steps = batches_per_epoch,
                   validation_data = self.source.validation_generator(),
                   validation_freq = self.conf.validation_freq
                   )
Using these settings:
epochs = 10
validation_freq = 10
validation_split = 0.2
batch_size = 30
loss = 'categorical_crossentropy'
metrics = ['accuracy']
optimizer = {
    'name' : 'Adam',
    'learning_rate' : 0.0001,
}
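(Presumably these settings end up in a model.compile call along these lines; the exact wiring is my assumption, since the question does not show it:)

from tensorflow.keras.optimizers import Adam

# Assumed mapping of the settings above onto Keras; not shown in the post.
model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)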
I've tried experimenting with the learning rate, the batch size, and different optimizer types, but no matter what, training gets stuck:
Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630
What am I doing wrong?
1 Answer
In the original character-by-character translation task, the decoder input and target data are shifted by one timestep, because the decoder has to predict the next character from the current and preceding characters.
In your task, however, each character of the input maps directly to a character of the output, so there is no need to shift the target data.
I have changed the for loop in which encoder_input_data and decoder_target_data are preprocessed.
Try this:
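(The answer's original code block did not survive. The following is my reconstruction of the described change, reusing the variable names from the generator in the question; treat it as a sketch, not the answerer's exact code:)

# Reconstructed from the answer's description: vectorize the decoder
# target WITHOUT the one-timestep shift, so each output token sits at
# the same timestep as the corresponding input token.
for t, token in enumerate(tokens_out):
    dec_oh_input_batch[i, t, token] = 1
    dec_oh_output_batch[i, t, token] = 1  # same timestep, no t - 1 shift
# Pad the remainder of both sequences with the empty token
dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
dec_oh_output_batch[i, t + 1:, dec_space_token] = 1

With the shift removed, every timestep of the target is a well-defined one-hot vector aligned with the corresponding input character, which is what this per-character tagging task calls for.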