Python UnidentifiedImageError: cannot identify image file

jhiyze9q  posted on 2023-02-07  in Python

Hi everyone, I am training a model with TensorFlow and Keras. The dataset was downloaded from https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765
It is a zip archive, which I split into the following directories:

.
├── test
│   ├── Cat
│   └── Dog
└── train
    ├── Cat
    └── Dog

The test/Cat and test/Dog folders each contain 1,000 JPG photos, and the train/Cat and train/Dog folders each contain 11,500 JPG photos.
The data is loaded with the following code:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

batch_size = 16

# Data augmentation and preprocess
train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.20) # set validation split

# Train dataset
train_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='training') # set as training data

# Validation dataset
validation_generator = train_datagen.flow_from_directory(
    'PetImages/train',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary',
    subset='validation') # set as validation data

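test_datagen = ImageDataGenerator(rescale=1./255)
# Test dataset (separate name so the ImageDataGenerator is not overwritten)
test_generator = test_datagen.flow_from_directory(
    'PetImages/test',
    target_size=(244, 244),
    batch_size=batch_size,
    class_mode='binary')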

The model is trained with the following code:

history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=5)

I get the following output:

Epoch 1/5
1150/1150 [==============================] - ETA: 0s - loss: 0.0505 - accuracy: 0.9906

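But when the epoch reaches this point, I get the following error:
UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f9e185347d0>
How can I fix this so that the training can finish?
Thanks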


jljoyd4f  #1

Try this snippet to check whether every image is in a valid format.

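import os
from PIL import Image

folder_path = r'data\img'  # raw string so the backslash is kept literally
extensions = []  # distinct file extensions seen so far
for fldr in os.listdir(folder_path):
    sub_folder_path = os.path.join(folder_path, fldr)
    for filee in os.listdir(sub_folder_path):
        file_path = os.path.join(sub_folder_path, filee)
        print('** Path: {}  **'.format(file_path), end="\r", flush=True)
        # Image.open raises on a file PIL cannot identify, so the last
        # path printed above is the offending file.
        with Image.open(file_path) as im:
            rgb_im = im.convert('RGB')
        # splitext copes with names that have no dot or more than one dot
        ext = os.path.splitext(filee)[1].lstrip('.')
        if ext not in extensions:
            extensions.append(ext)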

vpfxa7rd  #2

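I don't know whether this is still relevant, but for anyone who runs into the same problem in the future:
In this particular case, the dog_cat dataset contains two corrupted files:

  • cats/666.jpg
  • dogs/11702.jpg

Just remove them and it will work.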
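
As a rough sketch of that cleanup, the snippet below deletes the two named files from the directory layout shown in the question. It assumes that "cats" and "dogs" in the list above correspond to the Cat and Dog class folders, and that each file may have ended up in either the train or the test split. The helper name `remove_known_bad_files` and the constant `KNOWN_BAD` are made up for illustration.

```python
from pathlib import Path

# Assumption: "cats/666.jpg" and "dogs/11702.jpg" correspond to the Cat and Dog
# class folders of the layout shown in the question.
KNOWN_BAD = [("Cat", "666.jpg"), ("Dog", "11702.jpg")]


def remove_known_bad_files(root, splits=("train", "test")):
    """Delete the known-bad files from every split folder; return what was removed."""
    removed = []
    for split in splits:
        for class_name, file_name in KNOWN_BAD:
            path = Path(root) / split / class_name / file_name
            if path.is_file():  # the file lives in only one of the splits
                path.unlink()
                removed.append(path)
    return removed


# Example call, using the directory name from the question:
# removed = remove_known_bad_files("PetImages")
```

The function returns the paths it actually deleted, so an empty list means neither file was found under the given root.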


wnavrhmk  #3

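I have run into this problem before, so I wrote a Python script that tests whether the train and test directories contain valid image files. The file extension must be jpg, png, bmp or gif, so the script checks the extension first. It then tries to read the image with cv2; if the result is not a valid image, the file is treated as bad. In either case the name of the bad file is printed. At the end, the list named bad_list holds the paths of all bad files. Note that the directories must be named "test" and "train".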

import os
import cv2

bad_list = []
root_dir = r'c:\PetImages'
valid_exts = ['jpg', 'png', 'bmp', 'gif']
subdir_list = os.listdir(root_dir)  # sub directories of the root directory, i.e. train or test
for d in subdir_list:  # iterate through the sub directories train and test
    dpath = os.path.join(root_dir, d)  # path to sub directory
    if d in ['test', 'train']:
        class_list = os.listdir(dpath)  # list of classes, i.e. Dog or Cat
        for klass in class_list:  # iterate through the two classes
            class_path = os.path.join(dpath, klass)  # path to class directory
            file_list = os.listdir(class_path)  # list of files in class directory
            for f in file_list:  # iterate through the files
                fpath = os.path.join(class_path, f)
                ext = os.path.splitext(f)[1][1:].lower()  # extension without the dot
                if ext not in valid_exts:
                    print(f'file {fpath} has an invalid extension {ext}')
                    bad_list.append(fpath)
                else:
                    # cv2.imread returns None instead of raising when it cannot decode a file
                    img = cv2.imread(fpath)
                    if img is None:
                        print(f'file {fpath} is not a valid image file')
                        bad_list.append(fpath)

print(bad_list)

8wigbo56  #4

Your images may be corrupted. In the data preprocessing step, try Image.open() on every image to see whether they can all be opened.
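
A minimal sketch of such a check, assuming Pillow is installed and the dataset sits in nested folders like the ones in the question. The helper name `find_unreadable_images` is hypothetical. Since `Image.open()` is lazy and does not decode the pixel data, the sketch also calls `load()` to force a full decode, which additionally catches truncated files.

```python
from pathlib import Path

from PIL import Image


def find_unreadable_images(root):
    """Return the paths under `root` that Pillow cannot open and fully decode."""
    bad = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with Image.open(path) as im:
                im.load()  # force a full decode of the pixel data
        except Exception:
            bad.append(path)
    return bad


# Example call, using the directory name from the question:
# for p in find_unreadable_images("PetImages"):
#     print(p)
```

Note that this reports every file Pillow cannot read, including non-image files that happen to sit in the class folders, so the list is worth reviewing before anything is deleted.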


svmlkihl  #5

Instead of only appending the corrupted files to a list, we can also delete each one as soon as the error occurs...

import os
from PIL import Image

folder_path = r"C:\Users\ImageDatasets"
extensions = []  # distinct file extensions of the images that opened
corrupt_img_paths = []  # paths of the files that were deleted
for fldr in os.listdir(folder_path):
    sub_folder_path = os.path.join(folder_path, fldr)
    for filee in os.listdir(sub_folder_path):
        file_path = os.path.join(sub_folder_path, filee)
        print('** Path: {}  **'.format(file_path), end="\r", flush=True)
        try:
            # the with block closes the file before any os.remove call
            with Image.open(file_path) as im:
                rgb_im = im.convert('RGB')
        except Exception:
            print(file_path)
            corrupt_img_paths.append(file_path)
            os.remove(file_path)
            continue
        ext = os.path.splitext(filee)[1].lstrip('.')
        if ext not in extensions:
            extensions.append(ext)
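
Deleting files is irreversible, so a more cautious variant is to move the unreadable files aside instead. The following is only a sketch under a few assumptions: Pillow is installed, the hypothetical `quarantine_dir` lies outside the dataset root, and the helper name `quarantine_unreadable_images` is made up for illustration.

```python
import shutil
from pathlib import Path

from PIL import Image


def quarantine_unreadable_images(root, quarantine_dir):
    """Move files Pillow cannot decode from `root` into `quarantine_dir`.

    The sub-folder structure is preserved. Returns the new paths.
    Assumes `quarantine_dir` is located outside `root`.
    """
    root = Path(root)
    quarantine_dir = Path(quarantine_dir)
    moved = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            with Image.open(path) as im:
                im.convert("RGB")  # decodes the image, as in the script above
        except Exception:
            target = quarantine_dir / path.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


# Example call with hypothetical directory names:
# moved = quarantine_unreadable_images("PetImages", "PetImages_bad")
```

Because the files are moved rather than removed, a false positive can be restored by copying it back from the quarantine folder.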
