Paddle batch_norm produces NaN when training on CPU

xriantvc · asked on 2021-11-29

Single machine, single card.
Symptoms:
1. Training on CPU (CPUPlace) produces NaN, while the same configuration trained on GPU (CUDAPlace) is fine.
2. It appears related to batch_norm: after removing batch_norm, the CPU run no longer produces NaN.
3. The step at which the NaN first appears is random (a sketch of the probe used to spot it follows).
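
A minimal sketch of that probe, assuming the usual fluid training-loop objects (an Executor exe, a DataFeeder feeder, a batch reader, and an avg_loss variable like the ones in the reproducer further down):

import numpy as np

# Fetch the loss every step and stop at the first step that contains NaN.
for step, data in enumerate(batch_reader()):
    loss_val, = exe.run(feed=feeder.feed(data),
                        fetch_list=[avg_loss.name],
                        return_numpy=True)
    if np.isnan(loss_val).any():
        print("first NaN at step %d" % step)
        break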


q1qsirdb1#

Please confirm the following:

  1. Which version of Paddle are you using?
  2. What is the configuration of the training machine?
  3. Are the initial weights, input data, etc. identical between the CPU and GPU runs? Since the NaN position is random, could some inconsistent random quantity be the cause? (A seed-fixing sketch follows below.)
  4. Could the NaN be caused by overly large gradients? Do you have any gradient clip or weight decay operations?
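
A minimal sketch of how the random initialization and data order can be pinned down on both devices before building the network (the seed values here are arbitrary):

import random
import numpy as np
import paddle.fluid as fluid

# Fix every source of randomness so CPU and GPU runs start from the
# same initial weights and see the data in the same order.
random.seed(1)
np.random.seed(1)
fluid.default_startup_program().random_seed = 1
fluid.default_main_program().random_seed = 1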

huwehgph2#

  1. Paddle is a GPU build I compiled from source; the whl package is paddlepaddle_gpu-0.0.0-cp27-cp27mu-linux_x86_64.whl.
  2. Training machine: CentOS release 6.3, Tesla K40m GPU, cuDNN 7.
  3. The input data is forced to be identical. The parameters are randomly initialized, but over several runs the NaN on CPU reproduces consistently, while it never appears on GPU.
  4. There is gradient clip, but no weight decay.

nmpmafwu3#

I ran a further experiment: on GPU I exported the initialized parameters with save_persistables, and on CPU I loaded them with load_vars, so that both sides start from the same initialization. The outcome is unchanged: CPU produces NaN while GPU is fine. After removing batch_norm, the CPU and GPU results become exactly identical. A sketch of that save/load flow is below.
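
A minimal sketch of the flow, assuming two separate runs of the same script and using the ./gpu directory that also appears (commented out) in the reproducer below:

import paddle.fluid as fluid

# Run 1, on GPU: initialize and dump the freshly created parameters.
exe = fluid.Executor(fluid.CUDAPlace(0))
exe.run(fluid.default_startup_program())
fluid.io.save_persistables(exe, "./gpu/")

# Run 2, on CPU: initialize, then overwrite the parameters with the saved ones.
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
params = [v for v in fluid.default_main_program().list_vars()
          if isinstance(v, fluid.framework.Parameter)]
fluid.io.load_vars(exe, "./gpu/", vars=params)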


i2byvkas4#

Here is a hand-built network that reproduces the problem:

import paddle.fluid as fluid
import os
import sys
import numpy as np
import paddle
import random

big_n =99
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
def base_reader():
    def reader():
        for i in range(10000):
            seq = []
            for j in range(10):
                seq.append(random.randint(0,big_n))
            seq = np.array(seq).reshape((10,1))
            label = float(random.randint(0,1))
            yield seq, label

    return reader
batch_reader = paddle.batch(base_reader(), batch_size=32)

use_cuda = False
if len(sys.argv) >=2 and sys.argv[1] == "gpu":
    use_cuda = True

seq = fluid.layers.data(name="seq", shape=[10, 1], dtype="int64")  # matches the (10, 1) sequences the reader yields
label = fluid.layers.data(name="label", shape=[1], dtype="float32")
emb = fluid.layers.embedding(
                             input=seq,
                             size=[100,128])
emb_sum = fluid.layers.reduce_sum(emb, dim=1)

fc0 = fluid.layers.fc(name="fc0", input=emb_sum, size=64)

fc0_reshape = fluid.layers.reshape(x=fc0, shape=[0,1, 64])
fc_bn = fluid.layers.batch_norm(input=fc0_reshape, epsilon=0.001, momentum=0.99, data_layout="NHWC")

fc1 = fluid.layers.fc(name="fc1", input=fc_bn, size=64, num_flatten_dims=2)
fc1_bn = fluid.layers.batch_norm(input=fc1, epsilon=0.001, momentum=0.99, data_layout="NHWC")
fc1_reshape = fluid.layers.reshape(x=fc1_bn, shape=[0,64])

fc2 = fluid.layers.fc(name="fc2", input=fc1_reshape, size=1)

loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=fc2, label=label)
avg_loss = fluid.layers.mean(loss)
fluid.clip.set_gradient_clip(clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=5.0))
sgd_optimizer = fluid.optimizer.SGD(learning_rate=1.)
sgd_optimizer.minimize(avg_loss)

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# fluid.io.save_persistables(exe, "./gpu/")

"""
    def is_parameter(var):
    if isinstance(var, fluid.framework.Parameter):
    return isinstance(var, fluid.framework.Parameter)

    vars = filter(is_parameter, fluid.default_main_program().list_vars())
    fluid.io.load_vars(exe, "./gpu", vars=vars)
    """

feeder = fluid.DataFeeder(
                          feed_list=["seq", "label"], place=place)

for id in range(20):
    step = 0
    for data in batch_reader():
        results = exe.run(
                          feed=feeder.feed(data),
                          fetch_list=[avg_loss.name],
                          return_numpy=True)
        print "results:%.4lf" % (results[0].mean())

Run "python reshow.py gpu" and "python reshow.py" to get the GPU and CPU results, respectively.
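
If it helps to narrow things down further, the fetch list in the training loop above can also be extended to the batch_norm outputs, so the first tensor to go NaN is reported (a sketch against the variable names in the reproducer):

import numpy as np

# Fetch the two batch_norm outputs together with the loss and report
# the first batch in which any of them contains NaN.
fetch_vars = [fc_bn, fc1_bn, avg_loss]
for data in batch_reader():
    vals = exe.run(feed=feeder.feed(data),
                   fetch_list=[v.name for v in fetch_vars],
                   return_numpy=True)
    nan_names = [v.name for v, val in zip(fetch_vars, vals) if np.isnan(val).any()]
    if nan_names:
        print("NaN first seen in: %s" % ", ".join(nan_names))
        break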
