tf.keras.layers.BatchNormalization with trainable=False appears not to update its internal moving mean and variance

Posted by nom7f22z on 2023-10-23 in Other


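I am trying to work out the exact behavior of the BatchNormalization layer in TensorFlow. I came up with the code below, which as far as I can tell is a perfectly valid Keras model, yet the mean and variance of the BatchNormalization layer do not appear to be updated.
From the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization):
"In the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch)."
I expected the model to return different values on each successive predict call. What I see instead is exactly the same values returned 10 times. Can anyone explain why the BatchNormalization layer does not update its internal values?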

import tensorflow as tf
import numpy as np

if __name__ == '__main__':

    np.random.seed(1)
    x = np.random.randn(3, 5) * 5 + 0.3

    bn = tf.keras.layers.BatchNormalization(trainable=False, epsilon=1e-9)
    # "inputs" avoids shadowing the Python builtin input()
    z = inputs = tf.keras.layers.Input([5])
    z = bn(z)

    model = tf.keras.Model(inputs=inputs, outputs=z)

    for i in range(10):
        print(x)
        print(model.predict(x))
        print()

I am using TensorFlow 2.1.0.

tensorflow

Source: https://stackoverflow.com/questions/64203611/tf-keras-layers-batchnormalization-with-trainable-false-appears-to-not-update-it

1 answer


h43kikqp1#

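OK, I found the mistake in my assumptions. The moving averages are updated during training, not during inference as I had thought. This makes perfect sense, because updating the moving averages during inference could make a production model unstable (for example, a long run of highly pathological input samples [e.g. samples drawn from a distribution that differs significantly from the one the network was trained on] could bias the network and degrade its performance on valid input samples).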
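As a minimal sketch of this claim (not part of the original answer), the snippet below reads the layer's moving_mean before and after one inference-mode call and one training-mode call. It assumes TensorFlow 2.x running eagerly; all variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

np.random.seed(1)
x = (np.random.randn(10, 5) * 5 + 0.3).astype("float32")

bn = tf.keras.layers.BatchNormalization(momentum=0.99)
bn(x, training=False)  # first call only builds the layer's weights

before = bn.moving_mean.numpy().copy()

bn(x, training=False)  # inference mode: moving statistics are only read
after_inference = bn.moving_mean.numpy().copy()

bn(x, training=True)   # training mode: moving statistics are updated
after_training = bn.moving_mean.numpy().copy()

inference_changed = not np.array_equal(before, after_inference)
training_changed = not np.array_equal(before, after_training)
print(inference_changed, training_changed)
```

Under these assumptions the first flag should be False and the second True: only the training-mode call moves the stored statistics.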
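The trainable parameter is useful when you fine-tune a pretrained model and want to freeze some layers of the network even during training. It is not needed at inference time, because when you call model.predict(x) (or even model(x) or model(x, training=False)), the layer automatically uses the moving averages instead of the batch statistics.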
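The freezing behavior can be sketched as follows, assuming the TensorFlow 2.x semantics quoted from the documentation in the question: a layer created with trainable=False keeps using its moving statistics even when it is called with training=True. The names frozen and expected are illustrative, and the expected values rely on the default initial state (moving mean 0, moving variance 1, gamma 1, beta 0).

```python
import numpy as np
import tensorflow as tf

np.random.seed(1)
x = (np.random.randn(10, 5) * 5 + 0.3).astype("float32")

# A frozen layer: trainable=False forces inference mode in TF 2.x
frozen = tf.keras.layers.BatchNormalization(trainable=False)

# The training=True flag is overridden because the layer is not trainable
out = np.asarray(frozen(x, training=True))

# With the initial statistics the layer only rescales by 1 / sqrt(1 + epsilon)
expected = x / np.sqrt(1.0 + frozen.epsilon)

print(np.allclose(out, expected, atol=1e-4))
print(frozen.moving_mean.numpy())  # still at its initial zeros
```

If the layer had used batch statistics instead, each output column would have a mean close to zero, which is not the case here.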
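The following code demonstrates that the moving averages are updated during training and only read during inference: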

import tensorflow as tf
import numpy as np

if __name__ == '__main__':

    np.random.seed(1)
    x = np.random.randn(10, 5) * 5 + 0.3

    # "inputs" avoids shadowing the Python builtin input()
    z = inputs = tf.keras.layers.Input([5])
    z = tf.keras.layers.BatchNormalization(trainable=True, epsilon=1e-9, momentum=0.99)(z)

    model = tf.keras.Model(inputs=inputs, outputs=z)
    
    # a dummy loss function
    model.compile(loss=lambda x, y: (x - y) ** 2)

    # a dummy fit just to update the batchnorm moving averages
    model.fit(x, x, batch_size=3, epochs=10)
    
    # first predict uses the moving averages from training
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    
    # outputs the same thing as previous predict
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    
    # here calling the model with training=True results in update of moving averages
    # furthermore, it uses the batch mean and variance as in training, 
    # so the result is very different
    pred = model(x, training=True).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
    
    # here we see again that the moving averages are used but they differ slightly after
    # the previous call, as expected
    pred = model(x).numpy()
    print(pred.mean(axis=0))
    print(pred.var(axis=0))
    print()
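Why the inference outputs above are not zero-mean and unit-variance after the dummy fit can be illustrated with the update rule given in the Keras documentation, moving = moving * momentum + batch_statistic * (1 - momentum). The following is a NumPy-free sketch of that rule, not the Keras implementation; update_moving and the constant batch mean of 2.0 are hypothetical.

```python
def update_moving(moving, batch_stat, momentum=0.99):
    """Exponential moving average step as described in the Keras docs."""
    return moving * momentum + batch_stat * (1.0 - momentum)

momentum = 0.99
batch_mean = 2.0    # hypothetical: every batch has the same mean
moving_mean = 0.0   # Keras initialises moving_mean to zeros

# 10 rows with batch_size=3 give 4 batches per epoch; 10 epochs give 40 updates
n_updates = 10 * 4
for _ in range(n_updates):
    moving_mean = update_moving(moving_mean, batch_mean, momentum)

# Fraction of the true mean that the moving average has reached
fraction = moving_mean / batch_mean
print(fraction)
```

With momentum=0.99 the moving mean has covered only 1 - 0.99**40 of the distance to the batch mean, roughly a third, so inference-mode outputs remain far from normalized after such a short fit.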

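Finally, I found that the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization) does mention this:
"When performing inference using a model containing batch normalization, it is generally (though not always) desirable to use accumulated statistics rather than mini-batch statistics. This is accomplished by passing training=False when calling the model, or using model.predict."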
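The equivalence described in that passage can be checked with a short sketch, assuming TensorFlow 2.x: model.predict and a call with training=False should agree with each other and with a manual normalization that uses the layer's stored statistics. The manual formula and the variable names are illustrative.

```python
import numpy as np
import tensorflow as tf

np.random.seed(1)
x = (np.random.randn(10, 5) * 5 + 0.3).astype("float32")

inputs = tf.keras.layers.Input(shape=(5,))
bn = tf.keras.layers.BatchNormalization()
model = tf.keras.Model(inputs=inputs, outputs=bn(inputs))

# One training-mode call so the moving statistics leave their initial values
model(x, training=True)

via_predict = model.predict(x, verbose=0)
via_call = np.asarray(model(x, training=False))

# Manual inference-mode normalization from the stored statistics
gamma, beta = bn.gamma.numpy(), bn.beta.numpy()
mean, var = bn.moving_mean.numpy(), bn.moving_variance.numpy()
manual = gamma * (x - mean) / np.sqrt(var + bn.epsilon) + beta

print(np.allclose(via_predict, via_call, atol=1e-4))
print(np.allclose(via_predict, manual, atol=1e-4))
```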
I hope this helps anyone who runs into the same misunderstanding in the future.
