Keras LSTM encoder-decoder stuck on a plateau and won't learn

mkh04yzy · asked 2023-08-06 (category: Other)

I'm testing my LSTM encoder-decoder architecture on a simple task: identifying the vowels in random character sequences. My TSV data looks like this:

molteyhpr   010011000
dlkz        0000
fabgovmgg   010010000
qgvowdykl   000100100
kgncpiot    00000110
pisvdf      010000

I've generated 100,000 samples.
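
A minimal sketch of one way such a file could be produced (the length range and the vowel set 'aeiouy' are assumptions inferred from the samples above, not the poster's actual script):

import random
import string

VOWELS = set("aeiouy")  # 'y' is labelled as a vowel in the samples above

def make_sample(min_len=4, max_len=10):
    # random lowercase string plus its per-character vowel mask
    word = "".join(random.choices(string.ascii_lowercase, k=random.randint(min_len, max_len)))
    labels = "".join("1" if c in VOWELS else "0" for c in word)
    return word, labels

with open("vowels.tsv", "w") as f:
    for _ in range(100_000):
        word, labels = make_sample()
        f.write(f"{word}\t{labels}\n")
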
My model code (a slightly modified version of the Keras example):

from keras.layers import Input, LSTM, Dense
from keras.models import Model

self.latent_dim = 256

enc_input_layer = Input(name="enc_input", shape=(None, self.source.enc_vocab_len))
enc_lstm_layer  = LSTM(self.latent_dim, name="enc_lstm", return_state=True)
enc_outputs, state_h, state_c = enc_lstm_layer(enc_input_layer)

# We discard 'enc_outputs' and only keep the states.
enc_states = [state_h, state_c]

# Set up the decoder, using 'enc_states' as the initial state.
dec_input_layer = Input(name="dec_input", shape=(None, self.source.dec_vocab_len))

# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# returned states in the training model, but we will use them in inference.
dec_lstm_layer = LSTM(self.latent_dim, name="dec_lstm", return_sequences=True, return_state=True)

dec_outputs, _, _ = dec_lstm_layer(dec_input_layer, initial_state=enc_states)
dec_dense_layer = Dense(self.source.dec_vocab_len, name="dec_dense", activation='softmax')
dec_outputs = dec_dense_layer(dec_outputs)

# Define the model that will turn
# 'encoder_input_data' & 'decoder_input_data' into 'decoder_target_data'.
model = Model([enc_input_layer, dec_input_layer], dec_outputs)
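
The comments above mention reusing the returned states at inference time. In the Keras example this model is adapted from, the corresponding inference models would be wired up roughly like this with the layer names used here (a sketch only; it is not involved in the training problem below):

# Encoder inference model: input sequence -> initial decoder states
encoder_model = Model(enc_input_layer, enc_states)

# Decoder inference model: one decoding step given the previous states
dec_state_input_h = Input(shape=(self.latent_dim,), name="dec_state_h")
dec_state_input_c = Input(shape=(self.latent_dim,), name="dec_state_c")
dec_states_inputs = [dec_state_input_h, dec_state_input_c]
dec_lstm_outputs, state_h, state_c = dec_lstm_layer(dec_input_layer, initial_state=dec_states_inputs)
decoder_model = Model([dec_input_layer] + dec_states_inputs,
                      [dec_dense_layer(dec_lstm_outputs)] + [state_h, state_c])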


All of the data is converted to fixed-length one-hot representations. It's generated like this:

import numpy as np

def _generator(self, enc_data, dec_data, is_training):
    enc_oh_input_batch  = None
    dec_oh_input_batch  = None
    dec_oh_output_batch = None

    enc_space_token = self.enc_vocab[self.TOKEN_EMPTY]
    dec_space_token = self.dec_vocab[self.TOKEN_EMPTY]

    current_idx = 0
    samples_len = len(enc_data)

    while True:
        # Create zeroed batch arrays
        enc_oh_input_batch = np.zeros(
            (self.batch_size, self.enc_max_seq_len, self.enc_vocab_len), dtype='int8')
        dec_oh_input_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')
        dec_oh_output_batch = np.zeros(
            (self.batch_size, self.dec_max_seq_len, self.dec_vocab_len), dtype='int8')

        # Compile the batch
        for i in range(self.batch_size):
            # when we reach the end of the samples, start over
            if i + current_idx >= samples_len:
                current_idx = 0
                if is_training:
                    self.epoch += 1

            tokens_in  = enc_data[i + current_idx]
            tokens_out = dec_data[i + current_idx]

            # vectorize the encoder input
            for t, token in enumerate(tokens_in):
                enc_oh_input_batch[i, t, token] = 1
            # pad the remaining encoder timesteps with the empty token
            enc_oh_input_batch[i, t + 1:, enc_space_token] = 1

            # vectorize the decoder input and output
            for t, token in enumerate(tokens_out):
                dec_oh_input_batch[i, t, token] = 1
                if t > 0:
                    # dec_oh_output_batch will be ahead by one timestep
                    # and will not include the start character.
                    dec_oh_output_batch[i, t - 1, token] = 1
            # pad the remaining decoder-input timesteps with the empty token
            dec_oh_input_batch[i, t + 1:, dec_space_token] = 1

        current_idx += self.batch_size

        yield [[enc_oh_input_batch, dec_oh_input_batch], dec_oh_output_batch]
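
A quick way to sanity-check a batch from this generator (hypothetical usage; 'source' stands for the data-source object referenced in the training code below):

(enc_batch, dec_in_batch), dec_out_batch = next(source.train_generator())
print(enc_batch.shape, dec_in_batch.shape, dec_out_batch.shape)
# every encoder timestep should be a valid one-hot vector
assert (enc_batch.sum(axis=-1) == 1).all()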


Here's how I train it:

h = self.model.fit(self.source.train_generator(),
            batch_size       = self.conf.batch_size,
            epochs           = self.conf.epochs,
            initial_epoch    = self.source.epoch,
            steps_per_epoch  = batches_per_epoch,
            validation_steps = batches_per_epoch,
            validation_data  = self.source.validation_generator(),
            validation_freq  = self.conf.validation_freq
        )


With these settings:

epochs           = 10
validation_freq  = 10
validation_split = 0.2
batch_size       = 30
loss             = 'categorical_crossentropy'
metrics          = ['accuracy']
optimizer = {
    'name'          : 'Adam',
    'learning_rate' : 0.0001,
}
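
These settings presumably feed a compile call along these lines (a sketch of the assumed wiring; the actual call lives elsewhere in the code):

from keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)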


I've tried tweaking the learning rate, the batch size, and different optimizer types, but no matter what, training gets stuck:

Training model ...
Epoch 1/10
33/33 [==============================] - 6s 90ms/step - loss: 0.1759 - accuracy: 0.4380
Epoch 2/10
33/33 [==============================] - 3s 91ms/step - loss: 0.1370 - accuracy: 0.4533
Epoch 3/10
33/33 [==============================] - 3s 90ms/step - loss: 0.1258 - accuracy: 0.4634
Epoch 4/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1220 - accuracy: 0.4602
Epoch 5/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4602
Epoch 6/10
33/33 [==============================] - 3s 92ms/step - loss: 0.1218 - accuracy: 0.4625
Epoch 7/10
33/33 [==============================] - 3s 94ms/step - loss: 0.1208 - accuracy: 0.4643
Epoch 8/10
33/33 [==============================] - 3s 93ms/step - loss: 0.1202 - accuracy: 0.4619
Epoch 9/10
33/33 [==============================] - 3s 95ms/step - loss: 0.1199 - accuracy: 0.4601
Epoch 10/10
33/33 [==============================] - 5s 149ms/step - loss: 0.1207 - accuracy: 0.4630 - val_loss: 0.1195 - val_accuracy: 0.4630


What am I doing wrong?

Answer 1, by gdrx4gfi:

In the original character-level translation task, the decoder input and target data are shifted by one timestep, because the decoder has to predict the next character from the current and past characters.
In your task, however, the goal is to map each character of the input directly to a character of the output, so there is no need to shift the target data.
I've changed the for loop where decoder_input_data and decoder_target_data are preprocessed.
Try using this:

for t, char in enumerate(target_text):
    # no shift: the target mirrors the decoder input at the same timestep
    decoder_input_data[i, t, target_token_index[char]] = 1.
    decoder_target_data[i, t, target_token_index[char]] = 1.
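
Translated into the variable names of the generator above, the same unshifted vectorization would look roughly like this (a hypothetical adaptation; padding the output batch with the empty token mirrors the input padding and is an added assumption):

# vectorize decoder input and output without the one-timestep shift
for t, token in enumerate(tokens_out):
    dec_oh_input_batch[i, t, token] = 1
    dec_oh_output_batch[i, t, token] = 1
dec_oh_input_batch[i, t + 1:, dec_space_token] = 1
dec_oh_output_batch[i, t + 1:, dec_space_token] = 1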

