In Keras, is there a function similar to zero_grad() in PyTorch?

Posted by lvmkulzt on 2023-08-06 in Other

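In PyTorch, we can call zero_grad() to clear the gradients. Is there a similar function in Keras that achieves the same thing? For example, I want to accumulate gradients over several batches.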

keras

Source: https://stackoverflow.com/questions/60360704/in-keras-is-there-any-function-similar-to-the-zero-grad-in-pytorch

2 Answers

fcipmucu1#

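In PyTorch, gradients are accumulated per variable: each call to loss.backward() adds the newly computed gradients to the .grad attribute of every parameter that contributed to the loss. The optimizer is then responsible for updating the model parameters (the ones passed to it at initialization). Because the accumulated gradients stay in memory between iterations, you have to zero them at the start of each iteration.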

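import itertools
import torch

# param_list and opt come from the surrounding script (not shown here)
optimizer = torch.optim.Adam(itertools.chain(*param_list), lr=opt.lr, ...)
...
optimizer.zero_grad()  # clear gradients left over from the previous iteration
loss = ...
loss.backward()        # add this batch's gradients to each parameter's .grad
optimizer.step()       # update the parameters from the values stored in .grad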
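
To make the accumulation behaviour concrete, here is a minimal sketch. The toy parameter `w`, its values, and the choice of SGD are assumptions made for illustration only; they are not part of the original answer. It shows that a second backward() call without zero_grad() adds to the stored gradient rather than replacing it.

```python
import torch

# A single toy parameter; the values and the SGD optimizer are assumptions.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(w ** 2).sum().backward()   # gradient of sum(w^2) is 2 * w
first = w.grad.clone()

(w ** 2).sum().backward()   # second backward pass, no zero_grad() in between
second = w.grad.clone()     # .grad now holds the sum of both passes

optimizer.zero_grad()       # reset before the next accumulation window
# Depending on the PyTorch version, w.grad is now None or a tensor of zeros.
cleared = w.grad is None or bool((w.grad == 0).all())
```

Skipping zero_grad() on purpose for a few batches is exactly how gradient accumulation is usually done in PyTorch.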

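In Keras (TensorFlow 2), you wrap the operations whose gradients you want in a tf.GradientTape. You then call the tape's gradient method, passing the loss and the variables for which gradients are needed. The optimizer applies a single update to each parameter, for the whole list of (gradient, variable) pairs you give it. Since tape.gradient returns fresh gradient tensors on every call, nothing is accumulated implicitly and there is nothing to zero.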

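import tensorflow as tf

with tf.GradientTape() as tape:
    loss = ...  # forward pass and loss computation, recorded by the tape
# gradients of the loss with respect to each trainable variable
grads = tape.gradient(loss, model.trainable_variables)
# apply one update per (gradient, variable) pair
optimizer.apply_gradients(zip(grads, model.trainable_variables))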

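Alternatively, you can use the .fit() method, which does all of this under the hood.
If your goal is to accumulate several updates, there is no standard way to do it in Keras, but it is fairly easy with a tape: accumulate the gradient values and apply them later (see https://www.tensorflow.org/api_docs/python/tf/GradientTape#:~:text=To%20compute%20multiple%20gradients%20over%20the%20same%20computation).
A good solution for doing this with .fit() is explained here: How to accumulate gradients for large batch sizes in Keras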
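
As a rough illustration of accumulating with a tape, the sketch below keeps one non-trainable tf.Variable per model weight, adds each micro-batch's gradients to it, and applies an update every few batches. The toy model, the random data, `ACCUM_STEPS`, and the helper name `zero_grad` are all assumptions made for this example; `zero_grad` is a hypothetical helper, not a Keras or TensorFlow API.

```python
import tensorflow as tf

# Toy model and data; sizes, learning rate and ACCUM_STEPS are assumptions.
tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(8)  # 4 micro-batches

ACCUM_STEPS = 2  # apply one optimizer update every 2 micro-batches

# One non-trainable accumulator per trainable variable, initialised to zero.
accum = [tf.Variable(tf.zeros(v.shape, dtype=v.dtype), trainable=False)
         for v in model.trainable_variables]

def zero_grad():
    """Hypothetical helper: reset the accumulators, like optimizer.zero_grad()."""
    for a in accum:
        a.assign(tf.zeros_like(a))

num_updates = 0
for step, (xb, yb) in enumerate(dataset, start=1):
    with tf.GradientTape() as tape:
        # Divide so the accumulated gradient is the mean over the micro-batches.
        loss = loss_fn(yb, model(xb, training=True)) / ACCUM_STEPS
    grads = tape.gradient(loss, model.trainable_variables)
    for a, g in zip(accum, grads):
        a.assign_add(g)  # explicit accumulation; TensorFlow never does this implicitly
    if step % ACCUM_STEPS == 0:
        optimizer.apply_gradients(
            zip([tf.convert_to_tensor(a) for a in accum], model.trainable_variables))
        zero_grad()
        num_updates += 1
```

With 4 micro-batches and `ACCUM_STEPS = 2`, the optimizer steps twice. The sketch assumes every gradient is a dense tensor; models with unused variables (None gradients) or embedding layers (tf.IndexedSlices) would need extra handling.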

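If you want to learn more about how gradients are tracked efficiently so that the loss can be propagated back to the parameters, and to understand the whole process better, see Automatic differentiation (Wikipedia).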

bnlyeluc2#

In a custom training loop, this is easy to implement:

...
# Sketch of one iteration of a custom training loop.
# Assumes a boolean `flag` that is True when the accumulated gradients should be applied,
# and a list `buf = []` defined before the loop to hold the gradients of each batch.
with tf.GradientTape() as tape:
    loss = ...
grads = tape.gradient(loss, model.trainable_variables)
buf.append(grads)  # always keep the current batch's gradients
if flag:  # apply the accumulated gradients
    _grads = some_func(buf)  # combine the buffered gradients, e.g. sum or average them
    buf = []  # clear the buffer, the counterpart of zero_grad()
    optimizer.apply_gradients(zip(_grads, model.trainable_variables))
...
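
One possible shape for the `some_func` placeholder in the loop above is an element-wise average over the buffer. The name `average_grads` is hypothetical, and the sketch assumes every buffered gradient is a dense value (no None entries, no tf.IndexedSlices). It relies only on `+` and `/`, which tf.Tensor overloads, so the demo uses plain floats as stand-ins and runs without TensorFlow.

```python
def average_grads(buf):
    """Average buffered gradient lists element-wise, one result per variable.

    `buf` is a list of gradient lists, one list per batch, in the order
    returned by tape.gradient(loss, model.trainable_variables).
    """
    if not buf:
        raise ValueError("buf is empty: nothing to average")
    averaged = []
    for per_variable in zip(*buf):  # regroup: all buffered grads of one variable
        total = per_variable[0]
        for g in per_variable[1:]:
            total = total + g
        averaged.append(total / len(buf))
    return averaged

# Stand-in demo: two "variables", three buffered batches of scalar gradients.
buf = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
averaged = average_grads(buf)
```

Averaging keeps the update on the same scale as a single large batch; summing instead would make the effective step grow with the number of buffered batches.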

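As for the high-level Keras API (model.compile(), model.fit()), I don't know of a way. I use both TF2 and PyTorch, and I prefer custom training loops because they are the easiest way to narrow the gap between the two.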
