tensorflow: splitting a dataset into train/validation in Python when using a MapDataset

yjghlzjz · posted 2023-01-31 in Python

Hi everyone, I'm running into a problem after preparing my images and labels. To create a single dataset I use the zip function. After processing, the images and labels each contain 18k items, which is correct, but when I call zip(images, labels) the resulting dataset has only 563 items. Here is some code so you can follow:

# Map the load_and_preprocess_image function over the dataset of image paths
images = image_paths.map(load_and_preprocess_image)
# Map the extract_label function over the dataset of image paths
labels = image_paths.map(extract_label)    
# Zip the labels and images together to create a dataset of (image, label) pairs
#HERE SOMETHING STRANGE HAPPENS
data = tf.data.Dataset.zip((images,labels))
# Shuffle and batch the data
data = data.shuffle(buffer_size=1000).batch(32)
# Split the data into train and test sets
data = data.shuffle(buffer_size=len(data))
# Convert the dataset into a collection of data
num_train = int(0.8 * len(data))
train_data = image_paths.take(num_train)
val_data = image_paths.skip(num_train)

I can't see where I went wrong, could you help me? Thanks.
I expect a dataset containing 18k (image, label) pairs.


2ic8powd1#

1. tf.data's zip
tf.data.Dataset.zip does not behave like Python's built-in zip: its inputs must themselves be tf.data.Dataset objects. Check whether the images/labels returned by your map calls really are tf.data.Dataset objects; a small sketch follows below.
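A minimal sketch of the point above, using toy from_tensor_slices datasets (the shapes and values are purely illustrative): when both inputs are proper tf.data.Dataset objects, zipping preserves the element count.

import tensorflow as tf

# Two toy datasets with 4 elements each (illustrative shapes/values only)
images_ds = tf.data.Dataset.from_tensor_slices(tf.zeros([4, 32, 32, 3]))
labels_ds = tf.data.Dataset.from_tensor_slices([0, 1, 0, 1])

# Zipping two 4-element datasets yields 4 (image, label) pairs
pairs = tf.data.Dataset.zip((images_ds, labels_ds))
print(pairs.cardinality().numpy())  # 4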
2. Check the tf.data.Dataset objects
Make sure your images/labels really are proper tf.data.Dataset objects:
print("ele: ", images_dataset.element_spec)
print("num: ", images_dataset.cardinality().numpy())
print("ele: ", labels_dataset.element_spec)
print("num: ", labels_dataset.cardinality().numpy())

3. Workaround
In your case, merge the image and label processing into a single map function and return both, bypassing the need for tf.data.Dataset.zip:

# load_and_preprocess_image_and_label
def load_and_preprocess_image_and_label(image_path):
    """Load the image and its label, then apply any further preprocessing."""
    # Reuse the logic from your existing load_and_preprocess_image / extract_label
    image = load_and_preprocess_image(image_path)
    label = extract_label(image_path)
    return image, label

# Map the load_and_preprocess_image_and_label function over the dataset of image paths
train_list = tf.data.Dataset.list_files(str(PATH / 'train/*.jpg'))
data = train_list.map(load_and_preprocess_image_and_label,
                      num_parallel_calls=tf.data.AUTOTUNE)
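From here, the train/validation split from the question can be applied to data itself rather than to image_paths. A minimal sketch, assuming an 80/20 split done before batching:

# Split `data` from above into train/validation (assumed 80/20 split)
# cardinality may report UNKNOWN (-2) for some pipelines; count the files instead in that case
num_elements = data.cardinality().numpy()
num_train = int(0.8 * num_elements)

# Fix the shuffle order so take/skip do not overlap between epochs
data = data.shuffle(buffer_size=1000, reshuffle_each_iteration=False)
train_data = data.take(num_train).batch(32)
val_data = data.skip(num_train).batch(32)

Note that take and skip are called on the mapped dataset here, not on image_paths as in the original snippet.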
