Keras: saving and loading model optimizer state

vawmfj5a · posted 2023-02-12

I have a fairly complicated set of models that I am training, and I am looking for a way to save and load the model optimizer states. The "trainer models" consist of different combinations of several other "weight models", some of which have shared weights, some of which have frozen weights depending on the trainer, etc. It is a bit too complicated an example to share, but in short, I am not able to use model.save('model_file.h5') and keras.models.load_model('model_file.h5') when stopping and resuming training.
Using model.load_weights('weight_file.h5') works well for testing my model if training has finished, but if I attempt to continue training the model using this method, the loss does not come even close to returning to its last location. I have read that this is because the optimizer state is not saved with this method, which makes sense. However, I need a way to save and load the state of my trainer models' optimizers. It seems as though Keras once had model.optimizer.get_state() and model.optimizer.set_state() methods that accomplished what I am after, but that no longer seems to be the case (at least for the Adam optimizer). Are there other solutions with the current Keras?


eiee3dmh1#

You can extract the important lines from the load_model and save_model functions.
For saving the optimizer state, in save_model:

# Save optimizer weights.
symbolic_weights = getattr(model.optimizer, 'weights')
if symbolic_weights:
    optimizer_weights_group = f.create_group('optimizer_weights')
    weight_values = K.batch_get_value(symbolic_weights)

For loading the optimizer state, in load_model:

# Set optimizer weights.
if 'optimizer_weights' in f:
    # Build train function (to get weight updates).
    if isinstance(model, Sequential):
        model.model._make_train_function()
    else:
        model._make_train_function()

    # ...

    try:
        model.optimizer.set_weights(optimizer_weight_values)

Here is an example that puts the lines above together:
1. First, fit the model for 5 epochs.

import pickle

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense
from keras import backend as K

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 4ms/step - loss: 0.7716
Epoch 2/5
100/100 [==============================] - 0s 64us/step - loss: 0.7678
Epoch 3/5
100/100 [==============================] - 0s 82us/step - loss: 0.7665
Epoch 4/5
100/100 [==============================] - 0s 56us/step - loss: 0.7647
Epoch 5/5
100/100 [==============================] - 0s 76us/step - loss: 0.7638

2. Now save the weights and optimizer state.

model.save_weights('weights.h5')
symbolic_weights = getattr(model.optimizer, 'weights')
weight_values = K.batch_get_value(symbolic_weights)
with open('optimizer.pkl', 'wb') as f:
    pickle.dump(weight_values, f)

3. Rebuild the model in another python session, and load the weights.

import pickle

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense

X, y = np.random.rand(100, 50), np.random.randint(2, size=100)
x = Input((50,))
out = Dense(1, activation='sigmoid')(x)
model = Model(x, out)
model.compile(optimizer='adam', loss='binary_crossentropy')

model.load_weights('weights.h5')
model._make_train_function()
with open('optimizer.pkl', 'rb') as f:
    weight_values = pickle.load(f)
model.optimizer.set_weights(weight_values)

4. Resume training the model.

model.fit(X, y, epochs=5)

Epoch 1/5
100/100 [==============================] - 0s 674us/step - loss: 0.7629
Epoch 2/5
100/100 [==============================] - 0s 49us/step - loss: 0.7617
Epoch 3/5
100/100 [==============================] - 0s 49us/step - loss: 0.7611
Epoch 4/5
100/100 [==============================] - 0s 55us/step - loss: 0.7601
Epoch 5/5
100/100 [==============================] - 0s 49us/step - loss: 0.7594

7gyucuyw2#

For those who are not using model.compile and instead performing automatic differentiation to apply gradients manually with optimizer.apply_gradients, I think I have a solution.
First, save the optimizer weights: np.save(path, optimizer.get_weights())
Then, when you are ready to reload the optimizer, show the newly instantiated optimizer the sizes of the weights it will update by calling optimizer.apply_gradients on a list of tensors with the sizes of the variables for which you calculate gradients. It is extremely important to set the weights of the model after you set the weights of the optimizer, because momentum-based optimizers like Adam will update the weights of the model even if we give them gradients which are zero.

import tensorflow as tf
import numpy as np

model = ...  # instantiate model (functional or subclass of tf.keras.Model)

# Get saved weights
opt_weights = np.load('/path/to/saved/opt/weights.npy', allow_pickle=True)

grad_vars = model.trainable_weights
# This need not be model.trainable_weights; it must be a correctly-ordered list of 
# grad_vars corresponding to how you usually call the optimizer.

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # use your own learning rate

zero_grads = [tf.zeros_like(w) for w in grad_vars]

# Apply zero gradients: a no-op step that makes Adam create its slot variables
optimizer.apply_gradients(zip(zero_grads, grad_vars))

# Set the weights of the optimizer
optimizer.set_weights(opt_weights)

# NOW set the trainable weights of the model
model_weights = np.load('/path/to/saved/model/weights.npy', allow_pickle=True)
model.set_weights(model_weights)

Note that if we attempt to set the weights before calling apply_gradients for the first time, an error is thrown saying that the optimizer expects a weight list of length zero.
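The np.save/np.load round trip above works because the optimizer state is a ragged list of arrays of different shapes; numpy stores it as an object array, which is why allow_pickle=True is required on reload. A minimal numpy-only sketch (the shapes here are made up for illustration):

```python
import os
import tempfile

import numpy as np

# Stand-in for optimizer.get_weights(): the iteration counter plus
# per-variable moment slots of different (hypothetical) shapes.
opt_weights = [np.array(10), np.zeros((50, 1)), np.zeros((1,))]

path = os.path.join(tempfile.mkdtemp(), 'opt_weights.npy')
# A ragged list becomes a dtype=object array, which np.save pickles internally.
np.save(path, np.array(opt_weights, dtype=object))

restored = np.load(path, allow_pickle=True)
assert all(np.array_equal(a, b) for a, b in zip(opt_weights, restored))
```

The original np.save(path, optimizer.get_weights()) does the same object-array wrapping implicitly; newer numpy versions may require the explicit dtype=object shown here.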


ha5z0ras3#

Building on Alex Trevithick's answer, it is possible to avoid re-calling model.set_weights by simply saving the state of the variables before applying the gradients, and then reloading the variables afterwards. This is useful when loading the model from an h5 file, and looks cleaner (imo).
The save/load functions are as follows (thanks again, Alex):

def save_optimizer_state(optimizer, save_path, save_name):
    '''
    Save keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object.
    save_path --- Path to save location.
    save_name --- Name of the .npy file to be created.

    '''

    # Create folder if it does not exist
    if not os.path.exists(save_path):
        os.makedirs(save_path)
    
    # save weights
    np.save(os.path.join(save_path, save_name), optimizer.get_weights())

    return

def load_optimizer_state(optimizer, load_path, load_name, model_train_vars):
    '''
    Loads keras.optimizers object state.

    Arguments:
    optimizer --- Optimizer object to be loaded.
    load_path --- Path to save location.
    load_name --- Name of the .npy file to be read.
    model_train_vars --- List of model variables (obtained using Model.trainable_variables)

    '''

    # Load optimizer weights
    opt_weights = np.load(os.path.join(load_path, load_name)+'.npy', allow_pickle=True)

    # dummy zero gradients
    zero_grads = [tf.zeros_like(w) for w in model_train_vars]
    # save current state of variables
    saved_vars = [tf.identity(w) for w in model_train_vars]

    # Apply zero gradients: a no-op step that makes the optimizer create its slot variables
    optimizer.apply_gradients(zip(zero_grads, model_train_vars))

    # Reload variables
    [x.assign(y) for x,y in zip(model_train_vars, saved_vars)]

    # Set the weights of the optimizer
    optimizer.set_weights(opt_weights)

    return

ccrfmcuu4#

Upgrading Keras to 2.2.4 and using pickle solved this issue for me. As of Keras release 2.2.3, Keras models can now be safely pickled.
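A sketch of the idiom, with a plain dict standing in for the model object (the keys and shapes are hypothetical; with Keras >= 2.2.3 the compiled model itself can be dumped and loaded with the same two calls, optimizer state included):

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for a compiled model's state: layer weights plus optimizer slots.
state = {
    'model_weights': [np.random.rand(50, 1), np.zeros(1)],
    'optimizer_weights': [np.array(5), np.zeros((50, 1))],
}

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
with open(path, 'wb') as f:
    pickle.dump(state, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

assert np.array_equal(restored['model_weights'][0], state['model_weights'][0])
```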


zi8p0yeb5#

Anyone trying to use @Yu-Yang's solution in a distributed setting may run into the following error:

ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.distribute_lib._DefaultDistributionStrategy object at 0x7fdf357726d8>), which is different from the scope used for the original variable (MirroredVariable:{
  0: <tf.Variable 'conv2d_1/kernel:0' shape=(1, 1, 1, 1) dtype=float32, numpy=array([[[[-0.9592359]]]], dtype=float32)>
}). Make sure the slot variables are created under the same strategy scope. This may happen if you're restoring from a checkpoint outside the scope

or similar.
To solve it, you simply need to run the optimizer weight-setting on each replica, using the following:

import pickle

import tensorflow as tf

strat = tf.distribute.MirroredStrategy()

with strat.scope():
    model = tf.keras.models.Sequential([tf.keras.layers.Conv2D(1, 1, padding='same')])
    model.compile(optimizer='adam', loss='mse')
    model(tf.random.normal([1, 16, 16, 1]))

    model.load_weights('model_weights.hdf5')

def model_weight_setting():
    grad_vars = model.trainable_weights
    zero_grads = [tf.zeros_like(w) for w in grad_vars]
    model.optimizer.apply_gradients(zip(zero_grads, grad_vars))
    with open('optimizer.pkl', 'rb') as f:
        weight_values = pickle.load(f)
    model.optimizer.set_weights(weight_values)

strat.run(model_weight_setting)

For some reason this is not necessary when setting the model weights, but make sure that you create (via the call here) and load the weights of the model within the strategy scope, or you may get an error like ValueError: Trying to create optimizer slot variable under the scope for tf.distribute.Strategy (<tensorflow.python.distribute.collective_all_reduce_strategy.CollectiveAllReduceStrategy object at 0x14ffdce82c50>), which is different from the scope used for the original variable.
If you want a full example, I created a colab showcasing this solution.


6mw9ycah6#

The code below works for me (Tensorflow 2.5).
I am using the universal sentence encoder as the model, together with an Adam optimizer.
Basically what I do is: I use a dummy input to set the optimizer up correctly.
After that I set the weights.

Save the optimizer's weights:

np.save(f'{path}/optimizer.npy', optimizer.get_weights())

Load the optimizer:

# Load an optimizer
optimizer = tf.keras.optimizers.Adam()

# Load the optimizer weights
opt_weights = np.load(f'{path}/optimizer.npy', allow_pickle=True)

# Train a dummy record
# I'm using the universal sentence encoder which requires a string as input
with tf.GradientTape() as tape:
    # predict a dummy record
    tmp = model('')
    # create a dummy loss
    loss = tf.reduce_mean((tmp - tmp)**2)

# calculate the gradients and apply them
# (the gradients should be near 0)
gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# set the weights
optimizer.set_weights(opt_weights)

q5iwbnjs7#

As of version 2.11, optimizer.get_weights() is no longer available. You can fall back to the tf.optimizers.legacy classes, but that is not recommended.
Instead, the tf.train.Checkpoint class is designed specifically for saving model and optimizer weights:

checkpoint = tf.train.Checkpoint(model=model, optim=optim)
save_path = checkpoint.save('saved_model/ckpt')  # returns e.g. 'saved_model/ckpt-1'
...
checkpoint.restore(save_path)

Finally, the tf.train.CheckpointManager class manages multiple checkpoint versions, which makes this very simple:

checkpoint = tf.train.Checkpoint(model=model, optim=optim)
checkpoint_manager = tf.train.CheckpointManager(checkpoint, 'saved_model', max_to_keep=5)
checkpoint_manager.restore_or_initialize()
...
checkpoint_manager.save()
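An end-to-end sketch of the Checkpoint/CheckpointManager flow, assuming TF 2.x; a single tf.Variable stands in for a real model here, and the directory name is arbitrary:

```python
import tensorflow as tf

# One variable stands in for a model; Adam creates slot variables for it.
var = tf.Variable(1.0)
optim = tf.keras.optimizers.Adam(learning_rate=0.1)
optim.apply_gradients([(tf.constant(1.0), var)])  # one step builds the slots

checkpoint = tf.train.Checkpoint(var=var, optim=optim)
manager = tf.train.CheckpointManager(checkpoint, 'saved_model', max_to_keep=5)
manager.save()

trained_value = float(var.numpy())
var.assign(-123.0)                             # clobber the variable...
checkpoint.restore(manager.latest_checkpoint)  # ...then restore it, along
                                               # with the optimizer slots
assert float(var.numpy()) == trained_value
```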
