Keras: loss at the start of fine-tuning is higher than the transfer-learning loss

4zcjmb1e · asked 12 months ago · in Other

Since I start fine-tuning from the weights obtained by transfer learning, I expect the loss to be the same or lower. However, it looks as if fine-tuning starts from a different set of weights.
The code that starts the transfer learning:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(units=3, activation='sigmoid')
])

model.compile(optimizer='adam', 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

epochs = 1000
callback = tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train_generator,
                    steps_per_epoch=len(train_generator), 
                    epochs=epochs,
                    validation_data=val_generator,
                    validation_steps=len(val_generator),
                    callbacks=[callback],)

Output of the last epoch:

Epoch 29/1000
232/232 [==============================] - 492s 2s/step - loss: 0.1298 - accuracy: 0.8940 - val_loss: 0.1220 - val_accuracy: 0.8937


The code that starts the fine-tuning:

model.trainable = True

# Fine-tune from this layer onwards
fine_tune_at = -20

# Freeze all the layers before the `fine_tune_at` layer
for layer in model.layers[:fine_tune_at]:
  layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])

history_fine = model.fit(train_generator,
                         steps_per_epoch=len(train_generator), 
                         epochs=epochs,
                         validation_data=val_generator,
                         validation_steps=len(val_generator),
                         callbacks=[callback],)


But this is what I see in the first few epochs:

Epoch 1/1000
232/232 [==============================] - ETA: 0s - loss: 0.3459 - accuracy: 0.8409/usr/local/lib/python3.7/dist-packages/PIL/Image.py:960: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  "Palette images with Transparency expressed in bytes should be "
232/232 [==============================] - 509s 2s/step - loss: 0.3459 - accuracy: 0.8409 - val_loss: 0.7755 - val_accuracy: 0.7262
Epoch 2/1000
232/232 [==============================] - 502s 2s/step - loss: 0.1889 - accuracy: 0.9066 - val_loss: 0.5628 - val_accuracy: 0.8881


Eventually the loss drops and goes below the transfer-learning loss:

Epoch 87/1000
232/232 [==============================] - 521s 2s/step - loss: 0.0232 - accuracy: 0.8312 - val_loss: 0.0481 - val_accuracy: 0.8563


Why is the loss at the start of fine-tuning higher than the final transfer-learning loss?

wko9yo5t 1#

According to the TensorFlow/Keras page on transfer learning and fine-tuning (link), the parameters of the BatchNorm layers should be left alone:
"Importantly, although the base model becomes trainable, it is still running in inference mode, since we passed training=False when calling it when the model was built. This means that the batch normalization layers inside won't update their batch statistics. If they did, they would wreck the representations that the model has learned so far."
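The key detail is how training=False is passed at call time. The question's Sequential model never passes it, so once base_model.trainable flips to True, the BatchNorm layers start updating their statistics during fit(). Below is a minimal sketch of the pattern the docs describe, written with the functional API and assuming IMG_SHAPE and base_model are defined as in the question:

import tensorflow as tf

# Sketch of the pattern from the TF transfer-learning guide, assuming
# IMG_SHAPE and base_model from the question above.
inputs = tf.keras.Input(shape=IMG_SHAPE)
# training=False pins the base model (and its BatchNorm layers) to
# inference mode, even after base_model.trainable is set to True later.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(units=3, activation='sigmoid')(x)
model = tf.keras.Model(inputs, outputs)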
Here is what I did to fix the sudden jump in loss after unfreezing the layers:

from tensorflow.keras import layers
from tensorflow.keras.applications import MobileNet

img_width, img_height, num_channel = 128, 128, 3
conv_base = MobileNet(
    include_top=False,
    input_shape=(img_width, img_height, num_channel),
    pooling="avg")
conv_base.trainable = False

# Unfreeze the top 50 layers, but skip every BatchNorm layer so its
# moving mean/variance statistics stay frozen during fine-tuning.
for layer in conv_base.layers[-50:]:
    if not isinstance(layer, layers.BatchNormalization):
        layer.trainable = True

conv_base.summary(show_trainable=True)  # check each layer's trainability
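One related pitfall: Keras captures each layer's trainable state when compile() is called, so flipping trainable afterwards has no effect until the model is compiled again. A minimal reminder sketch, where model stands for whichever model wraps conv_base and the low learning rate mirrors the question's fine-tuning setup:

# After changing trainable flags, recompile so they take effect;
# a small learning rate protects the pretrained weights early on.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])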

