I am trying to train a deep learning model with tf.keras. My image directory contains 67 classes of images, such as airport, bookstore, and casino, with at least 100 images per class. The data comes from the MIT Indoor Scene dataset, but whenever I try to train the model I keep getting this error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_7]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_1570]
Function call stack:
train_function -> train_function
I tried to fix this by resizing the images with a Resizing layer, by passing labels='inferred' and label_mode='categorical' to the image_dataset_from_directory method, and by using loss='categorical_crossentropy' in the model's compile method. Previously I had not set labels or label_mode, and the loss was sparse categorical cross-entropy, which I believed was incorrect, so I changed everything as described above. But I still have the same problem.
There is a related question on Stack Overflow, but the author never explained how he solved it; he only updated the question with: "My suggestion is to check the metadata of the dataset. It helped to fix my problem." He did not say what metadata to look for or how he actually resolved the issue.
The code I am using to train the model -
import os
import PIL
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Dense, MaxPooling2D, GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten, Dropout, BatchNormalization, Rescaling
from tensorflow.keras.models import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.regularizers import l1, l2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# define directory paths
PROJECT_PATH = Path.cwd()
DATA_PATH = PROJECT_PATH.joinpath('data', 'Images')
# create a dataset
batch_size = 32
img_height = 180
img_width = 180
train = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
class_names = train.class_names
for image_batch, label_batch in train.take(1):
    print("\nImage shape:", image_batch.shape)
    print("Label Shape", label_batch.shape)
# resize image
resize_layer = tf.keras.layers.Resizing(img_height, img_width)
train = train.map(lambda x, y: (resize_layer(x), y))
valid = valid.map(lambda x, y: (resize_layer(x), y))
# standardize the data
normalization_layer = tf.keras.layers.Rescaling(1./255)
train = train.map(lambda x, y: (normalization_layer(x), y))
valid = valid.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(train))
first_image = image_batch[0]
print("\nImage (min, max) value:", (np.min(first_image), np.max(first_image)))
print()
# configure the dataset for performance
AUTOTUNE = tf.data.AUTOTUNE
train = train.cache().prefetch(buffer_size=AUTOTUNE)
valid = valid.cache().prefetch(buffer_size=AUTOTUNE)
# create a basic model architecture
num_classes = len(class_names)
# initiate a sequential model
model = Sequential()
# CONV1
model.add(Conv2D(filters=64, kernel_size=3, activation="relu",
input_shape=(img_height, img_width, 3)))
model.add(BatchNormalization())
# CONV2
model.add(Conv2D(filters=64, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# Pool + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# CONV3
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# CONV4
model.add(Conv2D(filters=128, kernel_size=3,
activation="relu"))
model.add(BatchNormalization())
# POOL + Dropout
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.3))
# FC5
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dense(num_classes, activation="softmax"))
# compile the model
model.compile(loss="categorical_crossentropy",
optimizer="adam", metrics=['accuracy'])
# train the model
epochs = 25
early_stopping_cb = EarlyStopping(patience=10, restore_best_weights=True)
history = model.fit(train, validation_data=valid, epochs=epochs,
callbacks=[early_stopping_cb], verbose=2)
result = pd.DataFrame(history.history)
print()
print(result.head())
**Note -** I have only modified the code to keep it as simple as possible and narrow down the error. The model runs for a number of batches and then fails with the above error again.
Epoch 1/10
732/781 [===========================>..] - ETA: 22s - loss: 3.7882Traceback (most recent call last):
File ".\02_model1.py", line 139, in <module>
model.fit(train, epochs=10, validation_data=valid)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\keras\engine\training.py", line 1184, in fit
tmp_logs = self.train_function(iterator)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\Users\BHOLA\anaconda3\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
(1) Invalid argument: Input size should match (header_size + row_size * abs_height) but they differ by 2
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_11840]
Function call stack:
train_function -> train_function
Modified code -
# create a dataset
batch_size = 16
img_height = 256
img_width = 256
train = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="training",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
valid = tf.keras.utils.image_dataset_from_directory(
DATA_PATH,
validation_split=0.2,
subset="validation",
labels="inferred",
label_mode="categorical",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
model = tf.keras.applications.Xception(
weights=None, input_shape=(img_height, img_width, 3), classes=67)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit(train, epochs=10, validation_data=valid)
2 Answers
Answer 1 (wlsrxk51)
This is indeed a corrupted-file problem, but the underlying issue is a little more subtle. Below is an explanation of what is going on and how to get around it. I ran into the same problem on the same MIT Indoor Scene Classification dataset. All of the images are JPEG files (*spoiler alert: or are they?*).
It has already been correctly noted that the exception is raised right there, in a C++ file backing the tf.io.decode_image() function. The problem lies in decode_image(), which is what tf.keras.utils.image_dataset_from_directory() calls. tf.keras.preprocessing.image.ImageDataGenerator().flow_from_directory(), on the other hand, relies on Pillow under the hood (as shown here, called from here), which is why falling back on the ImageDataGenerator class sidesteps the problem. A closer inspection of the corresponding C++ source shows that the function actually being invoked is DecodeBmpV2(...), defined here. That raises the question of why a JPEG image is being treated as a BMP in the first place. The function above is called here, inside a plain switch statement that dispatches the decoding according to the detected file type, so the code that determines the file type deserves a closer look. The type is determined from the values of the leading bytes (see here): long story short, a simple comparison is made against the so-called magic bytes that identify each format. The corresponding magic bytes are shown below.
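As a rough illustration only (the actual check is implemented in C++ inside decode_image_op.cc; the dictionary below merely restates the standard file-format signatures, it is not TensorFlow's exact code):

```python
# Standard magic numbers for the formats tf.io.decode_image() understands.
# Illustrative Python rendering of the dispatch, not the real C++ kernel code.
MAGIC_BYTES = {
    "jpeg": b"\xff\xd8\xff",       # JPEG / JFIF
    "png": b"\x89PNG\r\n\x1a\n",   # PNG
    "gif": b"GIF8",                # GIF87a / GIF89a
    "bmp": b"BM",                  # Windows bitmap
}

def classify_file_format(data: bytes) -> str:
    """Mimic the magic-byte comparison: return the format the decoder would pick."""
    for fmt, magic in MAGIC_BYTES.items():
        if data.startswith(magic):
            return fmt
    return "unknown"
```

A file whose name ends in .jpg but whose contents start with b"BM" is therefore routed to the BMP decoder, which is exactly what happens here.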
After identifying the files that raise the exception, I found that they are supposed to be JPEG files, yet their leading bytes indicate the BMP format instead.
Below is an example of three such files together with their first 10 bytes.
Look at the last one: it even has the word bmp in its file name. Why is that? I don't know. The dataset definitely contains corrupted image files; perhaps someone converted them from BMP to JPEG with a tool that did not work properly. We can only guess at the real cause, but that hardly matters now.
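To reproduce that kind of inspection on your own copy of the dataset, a minimal sketch (the data/Images path and the *.jpg pattern are taken from the question and may need adjusting):

```python
from pathlib import Path

# Print the leading bytes of every .jpg whose header is not the JPEG
# signature ff d8 ff, i.e. files the decoder will treat as something else.
for path in sorted(Path("data/Images").rglob("*.jpg")):
    with open(path, "rb") as f:
        head = f.read(10)
    if not head.startswith(b"\xff\xd8\xff"):
        print(path, head.hex(" "))
```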
The way the file type is determined differs from what the Pillow package does, and there is nothing we can do about that. The recommendation is therefore to identify the corrupted files, which is actually quite easy, or to fall back on ImageDataGenerator. I would advise against the latter, though, as that class has been marked as deprecated. This is not a bug in your code; it is bad data that was unintentionally introduced into the dataset.
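One way to deal with the offending files, assuming (as the ImageDataGenerator behaviour above suggests) that Pillow can still read them, is simply to re-encode them as real JPEGs; this is a sketch, not part of the original answer, so back up the data first:

```python
from pathlib import Path
from PIL import Image

# Re-save every .jpg whose contents are not actually JPEG. Pillow sniffs the
# real format from the bytes, so it opens the mislabeled BMP files just fine.
for path in Path("data/Images").rglob("*.jpg"):
    with open(path, "rb") as f:
        if f.read(3) == b"\xff\xd8\xff":
            continue  # genuine JPEG, leave it alone
    Image.open(path).convert("RGB").save(path, format="JPEG")
    print("re-encoded:", path)
```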
Answer 2 (yizd12fk)
I think this is probably a corrupted file. The exception is thrown from the DecodeBMPv2 function (https://github.com/tensorflow/tensorflow/blob/0b6b491d21d6a4eb5fbab1cca565bc1e94ca9543/tensorflow/core/kernels/image/decode_image_op.cc#L594) after a data-integrity check. If that is indeed the problem and you want to find out which files raise the exception, you can try the following on the directory that contains them.
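The idea is to push every file through tf.io.decode_image() and see which ones are rejected; a minimal sketch (the data/Images path and the *.jpg pattern are assumptions based on the question, not part of the original answer):

```python
from pathlib import Path
import tensorflow as tf

# Decode every image the same way the input pipeline would; any file that
# raises InvalidArgumentError is one of the files crashing model.fit().
bad_files = []
for path in Path("data/Images").rglob("*.jpg"):
    data = tf.io.read_file(str(path))
    try:
        tf.io.decode_image(data)
    except tf.errors.InvalidArgumentError:
        bad_files.append(path)
        print("cannot decode:", path)

print(f"{len(bad_files)} problematic file(s) found")
```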
Delete or replace any files it reports, and training should then proceed normally.