Python: How to feed a large dataset to a Keras model? [duplicate]

Posted by lymgl2op on 2023-01-26 in Python


**This question already has answers here:**

Keras - data generator for datasets too large to fit into memory (1 answer)
Closed 17 hours ago.
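Basically, I have a training dataset of several hundred thousand labeled images that I can use to train an ML model, but (as expected) I cannot simply create a numpy array to hold all the images, like this:
all_images = np.zeros(shape=(500000, 256, 256, 3), dtype="uint8")
I don't think large companies simply have "huge" amounts of memory for training on huge datasets.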
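For scale, the memory such an array would need can be worked out directly from its shape and dtype. The snippet below is only a back-of-the-envelope check, not part of the original question; the variable names are illustrative.

```python
import math

import numpy as np

# Shape and dtype taken from the np.zeros call above.
shape = (500000, 256, 256, 3)
itemsize = np.dtype("uint8").itemsize  # 1 byte per element

# Total bytes = number of elements * bytes per element.
n_bytes = math.prod(shape) * itemsize

print(n_bytes)                      # 98304000000
print(f"{n_bytes / 1e9:.1f} GB")    # 98.3 GB
```

That is far more than a typical workstation's RAM, which is why the array cannot be allocated in one piece.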
So how can I train on the entire dataset without having to hold all of it in memory before calling model.fit()?
Here is the full loading function, in case it is needed:

import cv2
import numpy as np


def load_images(images: list) -> np.ndarray:
    # Create an empty np.ndarray to hold n images of size 256 x 256 with 3 channels
    resized_images = np.zeros(shape=(len(images), 256, 256, 3), dtype="uint8")

    for index, image_path in enumerate(images):
        print(index)

        # Load the image with cv2 (cv2 returns channels in BGR order)
        img = cv2.imread(image_path)

        # Resize the image to 256 width, 256 height
        img = cv2.resize(img, dsize=(256, 256))

        # Add the image to the ndarray 'resized_images'
        resized_images[index] = img

    return resized_images

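The purpose of this function is to resize the training images and load them into a numpy array to pass to the model in model.fit(). Note: I removed some np.transpose() calls to make the code more legible, so this might not work if copied and pasted.
So far I have tried saving the model and loading it again to continue training, but without success (loading the model does not retain all of its attributes). If that is nevertheless the best approach, please feel free to share how you do it.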

keras

Source: https://stackoverflow.com/questions/75228948/python-how-to-feed-large-dataset-to-keras-model

1 Answer


mnowg1ta1#

Consider using generators, which are well suited to this problem.
First, I suggest looking at the tf.keras.preprocessing.image.ImageDataGenerator class and its flow_from_directory() method.
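A minimal sketch of that first suggestion, assuming a TensorFlow version that still ships ImageDataGenerator and a directory layout with one subfolder per class (for example `data/train/cats/`, `data/train/dogs/`). The helper name `make_train_iterator`, the directory path, and the `rescale` choice are illustrative assumptions, not something taken from the question.

```python
import tensorflow as tf


def make_train_iterator(data_dir, batch_size=32):
    """Return an iterator that reads images from disk one batch at a time.

    `data_dir` is assumed to contain one subdirectory per class.
    """
    # Scale pixel values from [0, 255] to [0, 1]; this choice is an assumption.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

    # Only file names are indexed here; images are loaded and resized
    # lazily when a batch is requested.
    return datagen.flow_from_directory(
        data_dir,
        target_size=(256, 256),
        batch_size=batch_size,
        class_mode="categorical",
    )


# Typical use (hypothetical path and model):
# train_iter = make_train_iterator("data/train")
# model.fit(train_iter, epochs=10)
```

Because the iterator only holds one batch in memory at a time, the full 500000-image array never has to exist.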
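If you want to preprocess the images in some unusual way, I suggest creating your own generator by subclassing tf.keras.utils.Sequence, like this:
class CustomImageDataGen(tf.keras.utils.Sequence):
    ...  # implement __len__ and __getitem__
This article may be helpful.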

Answered on 2023-01-26
