Keras GPU out of memory: how can I call the garbage collector at each hyperparameter combination of a GridSearchCV to free GPU memory?

lc8prwob · asked 2023-10-19 · Other

I'm training my model on a remote server and using the GridSearchCV API to tune some hyperparameters, namely epochs, l_rate, batch_size and patience. Unfortunately, after a few iterations of the tuning I get the following error:

Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0
to /job:localhost/replica:0/task:0/device:GPU:0
in order to run _EagerConst: Dst tensor is not initialized.

It seems the GPU memory of the server is not enough: this error is raised when the GPU memory is full, and the usual advice is to reduce the dataset size and/or the batch_size.
First I reduced the batch_size to 2, 4, 8 and 16, but the error persists, since I then get:

W tensorflow/tsl/framework/bfc_allocator.cc:485] Allocator (GPU_0_bfc) ran 
out of memory trying to allocate 1.17GiB (rounded to 1258291200) requested 
by op _EagerConst
If the cause is memory fragmentation maybe the environment variable
'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation

Then, as suggested, I set os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async', but the problem remains.
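(Note: as far as I understand, this variable is only read when TensorFlow initializes the GPU, so it has to be set before anything touches the device, e.g.:)

import os

# Allocator choice is read when TensorFlow initializes the GPU,
# so it must be set before the first TensorFlow import / GPU op
os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'

import tensorflow as tf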
That said, the problem does seem to go away if I reduce the dataset size, but *I have to use the whole dataset* (I can't afford to waste data).
To work around this, my main ideas are:
1. Prevent a new model (and the associated loss/training objects) from being created for every combination. This would be the best solution, since it would always reuse the same model (obviously making sure it gets "reset" for each new hyperparameter combination) together with its loss and training state. It is probably also the most complex one, because I don't know whether the libraries I chose allow it (see the sketch after this list).
2. Verify that the problem is not caused by the data rather than the model (i.e. I don't want the same data to be re-allocated for every hyperparameter combination while the old copies stay in memory). This could also be a cause, and the fix would probably be simpler than the previous one, but I consider it a less likely culprit; in any case it is worth checking that it doesn't happen.
3. Reset the memory at each hyperparameter combination by calling the garbage collector (I don't know whether that works for GPU memory too). This is the simplest solution and probably the first thing I'll try, but it won't necessarily work: if the libraries keep references to objects that are no longer used, the garbage collector cannot free them.
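For idea 1, a minimal sketch of what "resetting" could look like (assuming the VAE, Encoder and Decoder classes shown below): snapshot the freshly-initialized weights once and restore them instead of rebuilding the model.

vae = VAE(Encoder(latent_dimension), Decoder())
vae.compile(Adam())
vae(x_train[:1])                     # one forward pass so the weights get created
initial_weights = vae.get_weights()  # snapshot of the initial state

# ...later, instead of re-creating the model for the next combination:
vae.set_weights(initial_weights)     # back to the initial weights
vae.compile(Adam(l_rate))            # fresh optimizer too (Adam keeps per-weight state);
                                     # l_rate = the next combination's learning rate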
On top of that, with the TensorFlow backend the current model is not destroyed on its own, so I would also need to clear the session.
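For idea 3, since scikit-learn clones the estimator (re-running __init__) for every hyperparameter combination, one place to hook the cleanup would be right before each new model is built; a sketch based on the VAEWrapper shown below:

import gc

from keras import backend as K


class VAEWrapper:
    def __init__(self, **kwargs):
        # Drop whatever graph state the previous grid-search clone
        # left behind before building the next model
        K.clear_session()
        gc.collect()
        self.vae = VAE(**kwargs)
        self.vae.compile(Adam())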
If you have any other ideas or suggestions, please feel free to share them. These are the functions involved:

def grid_search_vae(x_train, latent_dimension):
    param_grid = {
        'epochs': [2500],
        'l_rate': [10 ** -4, 10 ** -5, 10 ** -6, 10 ** -7],
        'batch_size': [32, 64],  # [2, 4, 8, 16] won't fix the issue
        'patience': [30]
    }

    ssim_scorer = make_scorer(my_ssim, greater_is_better=True)

    grid = GridSearchCV(
        VAEWrapper(encoder=Encoder(latent_dimension), decoder=Decoder()),
        param_grid, scoring=ssim_scorer, cv=5, refit=False
    )

    grid.fit(x_train, x_train)
    return grid

def refit(fitted_grid, x_train, y_train, latent_dimension):    
    best_epochs = fitted_grid.best_params_["epochs"]
    best_l_rate = fitted_grid.best_params_["l_rate"]
    best_batch_size = fitted_grid.best_params_["batch_size"]
    best_patience = fitted_grid.best_params_["patience"]

    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)

    encoder = Encoder(latent_dimension)
    decoder = Decoder()
    vae = VAE(encoder, decoder, best_epochs, best_l_rate, best_batch_size)
    vae.compile(Adam(best_l_rate))

    early_stopping = EarlyStopping("val_loss", patience=best_patience)
    history = vae.fit(x_train, x_train, batch_size=best_batch_size, epochs=best_epochs,
                      validation_data=(x_val, x_val), callbacks=[early_stopping])
    return history, vae

And here is the main code:

if __name__ == '__main__':
    x_train, x_test, y_train, y_test = load_data("data", "labels")

    # Reducing data set size will fix the issue 
    # new_size = 200
    # x_train, y_train = reduce_size(x_train, y_train, new_size)
    # x_test, y_test = reduce_size(x_test, y_test, new_size)

    latent_dimension = 25 
    grid = grid_search_vae(x_train, latent_dimension)
    history, vae = refit(grid, x_train, y_train, latent_dimension)

Can you help me?
In case it's useful, these are the GPUs:

2023-09-18 11:21:25.628286: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7347 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1
2023-09-18 11:21:25.629120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 7371 MB memory:  -> device: 1, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:03:00.0, compute capability: 6.1
2023-09-18 11:21:31.911969: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600

I'm using TensorFlow as the Keras backend, i.e.:

from keras import backend as K
K.backend()  # 'tensorflow'

I also tried adding:

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

at the beginning of the main code (as the very first instructions), but it didn't help.
If you need the code of the models, here it is:

import numpy as np
import tensorflow as tf
from keras.initializers import he_uniform
from keras.layers import Conv2DTranspose, BatchNormalization, Reshape, Dense, Conv2D, Flatten
from keras.optimizers.legacy import Adam
from keras.callbacks import EarlyStopping
from skimage.metrics import structural_similarity as ssim
from sklearn.base import BaseEstimator
from sklearn.metrics import mean_absolute_error, make_scorer
from sklearn.model_selection import train_test_split, GridSearchCV
from tensorflow import keras

class VAEWrapper:
    def __init__(self, **kwargs):
        self.vae = VAE(**kwargs)
        self.vae.compile(Adam())

    def fit(self, x, y, **kwargs):
        self.vae.fit(x, y, **kwargs)

    def get_config(self):
        return self.vae.get_config()

    def get_params(self, deep):
        return self.vae.get_params(deep)

    def set_params(self, **params):
        return self.vae.set_params(**params)

class VAE(keras.Model, BaseEstimator):
    def __init__(self, encoder, decoder, epochs=None, l_rate=None, batch_size=None, patience=None, **kwargs):
        super().__init__(**kwargs)
        self.encoder = encoder
        self.decoder = decoder
        self.epochs = epochs  # For grid search
        self.l_rate = l_rate  # For grid search
        self.batch_size = batch_size  # For grid search
        self.patience = patience  # For grid search
        self.total_loss_tracker = keras.metrics.Mean(name="total_loss")
        self.reconstruction_loss_tracker = keras.metrics.Mean(name="reconstruction_loss")
        self.kl_loss_tracker = keras.metrics.Mean(name="kl_loss")

    def call(self, inputs, training=None, mask=None):
        _, _, z = self.encoder(inputs)
        outputs = self.decoder(z)
        return outputs

    @property
    def metrics(self):
        return [
            self.total_loss_tracker,
            self.reconstruction_loss_tracker,
            self.kl_loss_tracker,
        ]

    def train_step(self, data):
        data, labels = data
        with tf.GradientTape() as tape:
            # Forward pass
            z_mean, z_log_var, z = self.encoder(data)
            reconstruction = self.decoder(z)

            # Compute losses
            reconstruction_loss = tf.reduce_mean(
                tf.reduce_sum(
                    keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)
                )
            )
            kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
            total_loss = reconstruction_loss + kl_loss

        # Compute gradient
        grads = tape.gradient(total_loss, self.trainable_weights)

        # Update weights
        self.optimizer.apply_gradients(zip(grads, self.trainable_weights))

        # Update metrics
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)

        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

    def test_step(self, data):
        data, labels = data
        # Forward pass
        z_mean, z_log_var, z = self.encoder(data)
        reconstruction = self.decoder(z)

        # Compute losses
        reconstruction_loss = tf.reduce_mean(
            tf.reduce_sum(
                keras.losses.binary_crossentropy(data, reconstruction), axis=(1, 2)
            )
        )
        kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
        kl_loss = tf.reduce_mean(tf.reduce_sum(kl_loss, axis=1))
        total_loss = reconstruction_loss + kl_loss

        # Update metrics
        self.total_loss_tracker.update_state(total_loss)
        self.reconstruction_loss_tracker.update_state(reconstruction_loss)
        self.kl_loss_tracker.update_state(kl_loss)

        return {
            "loss": self.total_loss_tracker.result(),
            "reconstruction_loss": self.reconstruction_loss_tracker.result(),
            "kl_loss": self.kl_loss_tracker.result(),
        }

@keras.saving.register_keras_serializable()
class Encoder(keras.layers.Layer):
    def __init__(self, latent_dimension):
        super(Encoder, self).__init__()
        self.latent_dim = latent_dimension

        seed = 42

        self.conv1 = Conv2D(filters=64, kernel_size=3, activation="relu", strides=2, padding="same",
                            kernel_initializer=he_uniform(seed))
        self.bn1 = BatchNormalization()

        self.conv2 = Conv2D(filters=128, kernel_size=3, activation="relu", strides=2, padding="same",
                            kernel_initializer=he_uniform(seed))
        self.bn2 = BatchNormalization()

        self.conv3 = Conv2D(filters=256, kernel_size=3, activation="relu", strides=2, padding="same",
                            kernel_initializer=he_uniform(seed))
        self.bn3 = BatchNormalization()

        self.flatten = Flatten()
        self.dense = Dense(units=100, activation="relu")

        self.z_mean = Dense(latent_dimension, name="z_mean")
        self.z_log_var = Dense(latent_dimension, name="z_log_var")

        self.sampling = sample  # reparameterization helper (not shown in the post)

    def call(self, inputs, training=None, mask=None):
        x = self.conv1(inputs)
        x = self.bn1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.flatten(x)
        x = self.dense(x)
        z_mean = self.z_mean(x)
        z_log_var = self.z_log_var(x)
        z = self.sampling(z_mean, z_log_var)
        return z_mean, z_log_var, z

@keras.saving.register_keras_serializable()
class Decoder(keras.layers.Layer):
    def __init__(self):
        super(Decoder, self).__init__()
        self.dense1 = Dense(units=4096, activation="relu")
        self.bn1 = BatchNormalization()

        self.dense2 = Dense(units=1024, activation="relu")
        self.bn2 = BatchNormalization()

        self.dense3 = Dense(units=4096, activation="relu")
        self.bn3 = BatchNormalization()

        seed = 42

        self.reshape = Reshape((4, 4, 256))
        self.deconv1 = Conv2DTranspose(filters=256, kernel_size=3, activation="relu", strides=2, padding="same",
                                       kernel_initializer=he_uniform(seed))
        self.bn4 = BatchNormalization()

        self.deconv2 = Conv2DTranspose(filters=128, kernel_size=3, activation="relu", strides=1, padding="same",
                                       kernel_initializer=he_uniform(seed))
        self.bn5 = BatchNormalization()

        self.deconv3 = Conv2DTranspose(filters=128, kernel_size=3, activation="relu", strides=2, padding="valid",
                                       kernel_initializer=he_uniform(seed))
        self.bn6 = BatchNormalization()

        self.deconv4 = Conv2DTranspose(filters=64, kernel_size=3, activation="relu", strides=1, padding="valid",
                                       kernel_initializer=he_uniform(seed))
        self.bn7 = BatchNormalization()

        self.deconv5 = Conv2DTranspose(filters=64, kernel_size=3, activation="relu", strides=2, padding="valid",
                                       kernel_initializer=he_uniform(seed))
        self.bn8 = BatchNormalization()

        self.deconv6 = Conv2DTranspose(filters=1, kernel_size=2, activation="sigmoid", padding="valid",
                                       kernel_initializer=he_uniform(seed))

    def call(self, inputs, training=None, mask=None):
        x = self.dense1(inputs)
        x = self.bn1(x)
        x = self.dense2(x)
        x = self.bn2(x)
        x = self.dense3(x)
        x = self.bn3(x)
        x = self.reshape(x)
        x = self.deconv1(x)
        x = self.bn4(x)
        x = self.deconv2(x)
        x = self.bn5(x)
        x = self.deconv3(x)
        x = self.bn6(x)
        x = self.deconv4(x)
        x = self.bn7(x)
        x = self.deconv5(x)
        x = self.bn8(x)
        decoder_outputs = self.deconv6(x)
        return decoder_outputs
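(The sample helper assigned in the Encoder is not included in the post; presumably it implements the standard VAE reparameterization trick, along these lines:)

import tensorflow as tf

def sample(z_mean, z_log_var):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = tf.random.normal(shape=tf.shape(z_mean))
    return z_mean + tf.exp(0.5 * z_log_var) * eps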

c86crjj0 · 1#

To resolve your memory issue, try the following.
Clear the GPU memory: TensorFlow holds on to GPU memory tenaciously. After each iteration, clear it like this:

from keras import backend as K
import gc
# After each iteration:
K.clear_session()
gc.collect()

This helps because in a grid search every hyperparameter combination may create a new instance of the model.
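GridSearchCV doesn't expose an "after each combination" hook, so if you want to guarantee when the cleanup runs, you can drive the loop yourself with scikit-learn's ParameterGrid; a sketch reusing param_grid, VAEWrapper, Encoder, Decoder, latent_dimension and x_train from the question (you lose the built-in cross-validation and scoring, but you control exactly when memory is released):

import gc

from keras import backend as K
from sklearn.model_selection import ParameterGrid

for params in ParameterGrid(param_grid):
    # Release the previous combination's graph state before building anew
    K.clear_session()
    gc.collect()

    model = VAEWrapper(encoder=Encoder(latent_dimension), decoder=Decoder())
    model.set_params(**params)
    model.fit(x_train, x_train)
    # ...score the model here and keep track of the best combination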
Use mixed precision: this saves memory and can speed things up. Set a policy at the top of your code:

from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy('mixed_float16')

In the model's final activation layer, use dtype='float32' to keep things numerically stable, for example:

tf.keras.layers.Activation('softmax', dtype='float32')

You can also try a tf.data pipeline, which makes data loading more efficient:

# Convert to batched TF datasets (Keras expects the dataset to be batched);
# the VAE is trained on (x, x) pairs, as in the question
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, x_train)).batch(best_batch_size).prefetch(tf.data.AUTOTUNE)
val_dataset = tf.data.Dataset.from_tensor_slices((x_val, x_val)).batch(best_batch_size)

# Fit the model
vae.fit(train_dataset, epochs=best_epochs, validation_data=val_dataset, callbacks=[early_stopping])
