Keras CNN autoencoder output images are white

Asked by 6yoyoihd on 2022-12-29

I am having trouble training an autoencoder CNN. My goal is to cluster document images (receipts, letters, etc.) in an unsupervised way (by the way, do you know of algorithms other than autoencoders for this?).
So I tried to build an autoencoder, but I keep getting strange decoded output and I cannot figure out what the problem is. I started with a very simple model without much compression:

Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_62 (Conv2D)           (None, 100, 76, 16)       448       
_________________________________________________________________
activation_62 (Activation)   (None, 100, 76, 16)       0         
_________________________________________________________________
conv2d_63 (Conv2D)           (None, 50, 38, 32)        4640      
_________________________________________________________________
activation_63 (Activation)   (None, 50, 38, 32)        0         
_________________________________________________________________
conv2d_64 (Conv2D)           (None, 50, 38, 32)        9248      
_________________________________________________________________
activation_64 (Activation)   (None, 50, 38, 32)        0         
_________________________________________________________________
up_sampling2d_26 (UpSampling (None, 100, 76, 32)       0         
_________________________________________________________________
conv2d_65 (Conv2D)           (None, 100, 76, 16)       4624      
_________________________________________________________________
activation_65 (Activation)   (None, 100, 76, 16)       0         
_________________________________________________________________
up_sampling2d_27 (UpSampling (None, 200, 152, 16)      0         
_________________________________________________________________
conv2d_66 (Conv2D)           (None, 200, 152, 3)       435       
_________________________________________________________________
activation_66 (Activation)   (None, 200, 152, 3)       0         
=================================================================
Total params: 19,395
Trainable params: 19,395
Non-trainable params: 0

I train on a small number of inputs (~200) so that training is fast and I can debug more quickly.
After 20 epochs with a batch size of 32, the model seems to converge:

Epoch 1/20
4/4 [==============================] - 5s 1s/step - loss: 0.4359
Epoch 2/20
4/4 [==============================] - 5s 1s/step - loss: 0.4290
Epoch 3/20
4/4 [==============================] - 4s 904ms/step - loss: 0.4192
Epoch 4/20
4/4 [==============================] - 5s 1s/step - loss: 0.4045
Epoch 5/20
4/4 [==============================] - 3s 783ms/step - loss: 0.3886
Epoch 6/20
4/4 [==============================] - 3s 797ms/step - loss: 0.3706
Epoch 7/20
4/4 [==============================] - 5s 1s/step - loss: 0.3393
Epoch 8/20
4/4 [==============================] - 3s 777ms/step - loss: 0.3165
Epoch 9/20
4/4 [==============================] - 3s 850ms/step - loss: 0.2786
Epoch 10/20
4/4 [==============================] - 3s 780ms/step - loss: 0.2436
Epoch 11/20
4/4 [==============================] - 3s 817ms/step - loss: 0.2036
Epoch 12/20
4/4 [==============================] - 3s 771ms/step - loss: 0.1745
Epoch 13/20
4/4 [==============================] - 5s 1s/step - loss: 0.1347
Epoch 14/20
4/4 [==============================] - 3s 820ms/step - loss: 0.1150
Epoch 15/20
4/4 [==============================] - 5s 1s/step - loss: 0.1017
Epoch 16/20
4/4 [==============================] - 3s 792ms/step - loss: 0.0886
Epoch 17/20
4/4 [==============================] - 3s 789ms/step - loss: 0.0868
Epoch 18/20
4/4 [==============================] - 3s 842ms/step - loss: 0.0844
Epoch 19/20
4/4 [==============================] - 3s 762ms/step - loss: 0.0797
Epoch 20/20
4/4 [==============================] - 3s 779ms/step - loss: 0.0768

But the output images all look like this:
[output of the autoencoder (example)]
For the loss I used mean absolute error with the SGD optimizer (the other optimizers I tried converged less well).
I tried increasing the number of epochs, but the loss plateaus at around 0.07 and does not go down.
What am I doing wrong? Any ideas for improvement? Thanks in advance.
Edit: here is the code

# Not shown in the question; values inferred from the model summary and the text above
image_dims = (200, 152)          # generator target_size (height, width)
image_rgb_dims = (200, 152, 3)   # model input shape
batch_size = 32
n_images = 200                   # "~200" per the question (the log above shows 4 steps per epoch)

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data pipeline: rescale images to [0, 1] with very light augmentation
datagen = ImageDataGenerator(rescale=1./255, zca_whitening=False, rotation_range=0.2,
                             width_shift_range=0.005, height_shift_range=0.005, zoom_range=0.005)
train_generator = datagen.flow_from_directory('fp_img', class_mode='input', target_size=image_dims,
                                               batch_size=batch_size, shuffle=True)

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Activation, Flatten, Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Reshape
import matplotlib.pyplot as plt
import numpy as np

input_shape = image_rgb_dims

# Define the model
model = Sequential()

# Encoder: two strided convolutions, 200x152 -> 100x76 -> 50x38
model.add(Conv2D(16, (3, 3), strides=2, padding='same', input_shape=image_rgb_dims))
model.add(Activation('relu'))

model.add(Conv2D(32, (3, 3), strides=2, padding='same'))
model.add(Activation('relu'))

# Decoder: convolutions + upsampling back to 200x152x3
model.add(Conv2D(32, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

model.add(Conv2D(16, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(UpSampling2D((2, 2)))

model.add(Conv2D(3, (3, 3), padding='same'))
model.add(Activation('sigmoid'))

model.summary()

# Compile the model
model.compile(optimizer='adagrad', loss='mean_absolute_error')

# Train the model
model.fit(
        train_generator,
        steps_per_epoch=n_images // batch_size,
        epochs=20)

v6ylcynt1#

A few suggestions:

  • You are building an autoencoder to cluster images, but your embedding is not one-dimensional. How are you going to cluster on it? You need a Dense layer, or more conv layers, to get down to a 1-D vector so that you can cluster on Euclidean distance (a sketch follows this list).
  • What happens when you train on a single image with no data augmentation? Does the network still produce a white image, or does it overfit and reconstruct that image? If it still produces white, the network really has collapsed to generating only white pixels; one fix is to penalize "not generating black pixels" in the loss function. If it reconstructs the image, you still have a bug somewhere, perhaps the images are not being scaled correctly.
  • 200 images is not many; I would not try to judge the performance of an autoencoder on a dataset that small.
  • Do all of your images contain text? You could preprocess them with OCR and cluster on the words that occur rather than on the raw pixels; the features could even be as simple as word counts or digit counts (a sketch of this also follows the list).
  • If you stick with a CNN, I would try resizing the images to something much smaller before clustering; right now this is a fairly large network that you want to train from scratch.
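
A minimal sketch of the 1-D embedding idea from the first bullet, assuming the same 200x152x3 inputs as in the question. The encoder is flattened into a small Dense bottleneck so that each image maps to a single vector, and those vectors are then clustered with k-means on Euclidean distance. The 64-dimensional code, the layer sizes and the choice of scikit-learn's KMeans are illustrative assumptions, not something this answer prescribes:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense, Reshape, UpSampling2D
from sklearn.cluster import KMeans

# Encoder: downsample, then flatten into a 1-D embedding that can be clustered
inp = Input(shape=(200, 152, 3))
x = Conv2D(16, (3, 3), strides=2, padding='same', activation='relu')(inp)  # 100x76x16
x = Conv2D(32, (3, 3), strides=2, padding='same', activation='relu')(x)    # 50x38x32
x = Flatten()(x)
embedding = Dense(64, activation='relu', name='embedding')(x)              # one 64-d vector per image

# Decoder: expand the code back into an image
x = Dense(50 * 38 * 32, activation='relu')(embedding)
x = Reshape((50, 38, 32))(x)
x = Conv2D(32, (3, 3), padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)                                                # 100x76
x = Conv2D(16, (3, 3), padding='same', activation='relu')(x)
x = UpSampling2D((2, 2))(x)                                                # 200x152
out = Conv2D(3, (3, 3), padding='same', activation='sigmoid')(x)

autoencoder = Model(inp, out)
encoder = Model(inp, embedding)
autoencoder.compile(optimizer='adam', loss='mean_absolute_error')
# autoencoder.fit(train_generator, epochs=20)                  # same generator as in the question

# After training, cluster the 1-D embeddings (the number of clusters is a guess)
# codes = encoder.predict(train_generator)
# labels = KMeans(n_clusters=5, n_init=10).fit_predict(codes)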
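
And a sketch of the OCR idea from the fourth bullet, assuming the pytesseract wrapper around the Tesseract OCR engine is installed and that the images are JPEGs under the fp_img folder used in the question; TF-IDF features plus k-means are just one possible choice of representation and clustering:

import glob
from PIL import Image
import pytesseract
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# OCR every document image into a text string (the .jpg extension is an assumption)
paths = sorted(glob.glob('fp_img/**/*.jpg', recursive=True))
texts = [pytesseract.image_to_string(Image.open(p)) for p in paths]

# Turn the extracted words into TF-IDF features and cluster on those
features = TfidfVectorizer(max_features=500).fit_transform(texts)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(features)

for path, label in zip(paths, labels):
    print(label, path)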

w6mmgewl2#

Are the images you are trying to display all matrices of uint8 values in the 0-255 range?
I have seen white output like this before when displaying images: the values were in the right range, but the arrays were still float32.
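
For context: matplotlib's imshow clips float RGB data to the [0, 1] range, so a float32 array that holds 0-255 values is rendered almost entirely white. A minimal sketch of the check and the fix, using a random array as a stand-in for the model's predictions:

import numpy as np
import matplotlib.pyplot as plt

# Stand-in for model.predict(...): float32 values in the 0-255 range
decoded = np.random.rand(200, 152, 3).astype(np.float32) * 255.0
print(decoded.dtype, decoded.min(), decoded.max())   # float32 with a max near 255 is the warning sign

# plt.imshow(decoded) would clip almost everything to white here.
# Fix: rescale to [0, 1] (or cast to uint8) before displaying.
if decoded.dtype != np.uint8 and decoded.max() > 1.0:
    decoded = decoded / 255.0          # alternatively: decoded.astype(np.uint8)
plt.imshow(decoded)
plt.show()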
