Single machine, single card. Symptoms: 1. Training on CPU (CPUPlace) produces NaN, but with the same parameters, training on GPU (CUDAPlace) works fine. 2. The problem appears related to batch_norm: after removing batch_norm, CPU training no longer produces NaN. 3. The position where the NaN appears is random.
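For context, confirming the NaN amounts to checking the fetched outputs on the host after each step; a minimal sketch (has_nan is an illustrative helper, not part of the training script):

import numpy as np

def has_nan(fetched):
    # True if any array returned by exe.run(..., return_numpy=True)
    # contains at least one NaN entry.
    return any(np.isnan(np.asarray(arr)).any() for arr in fetched)

Calling this on the fetch results after every step is what shows the NaN appearing at a random point in training.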
4 Answers

q1qsirdb1#
Please confirm the following:

huwehgph2#
4. Gradient clipping is used; there is no weight decay.

nmpmafwu3#
I ran a further experiment: on GPU, I exported the initialization parameters with save_persistables, then loaded them on CPU with load_vars (guaranteeing both start from the same initialization). The outcome was unchanged: CPU still produces NaN while GPU is fine. With batch_norm removed, the CPU and GPU results become exactly identical.
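For reference, a sketch of that parameter round-trip with the fluid I/O API, mirroring the commented-out block in the repro script below (the "./gpu" path is illustrative, and exe is the Executor from that script):

import paddle.fluid as fluid

# On the GPU run: dump all persistable variables (parameters plus the
# batch_norm moving mean/variance) right after initialization.
fluid.io.save_persistables(exe, "./gpu/")

# On the CPU run: load the Parameter variables back so both runs start
# from the identical initialization.
def is_parameter(var):
    return isinstance(var, fluid.framework.Parameter)

params = [v for v in fluid.default_main_program().list_vars()
          if is_parameter(v)]
fluid.io.load_vars(exe, "./gpu", vars=params)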
i2byvkas4#
Here is a hand-built network that reproduces the problem:
import paddle.fluid as fluid
import os
import sys
import numpy as np
import paddle
import random

big_n = 99
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

# Toy data: each sample is a random length-10 sequence of token ids
# in [0, big_n] plus a random binary label.
def base_reader():
    def reader():
        for i in range(10000):
            seq = []
            for j in range(10):
                seq.append(random.randint(0, big_n))
            seq = np.array(seq).reshape((10, 1))
            label = float(random.randint(0, 1))
            yield seq, label
    return reader

batch_reader = paddle.batch(base_reader(), batch_size=32)

# "python reshow.py gpu" runs on CUDAPlace, "python reshow.py" on CPUPlace.
use_cuda = False
if len(sys.argv) >= 2 and sys.argv[1] == "gpu":
    use_cuda = True

# The reader yields length-10 sequences, so the feature shape is [10, 1].
seq = fluid.layers.data(name="seq", shape=[10, 1], dtype="int64")
label = fluid.layers.data(name="label", shape=[1], dtype="float32")

emb = fluid.layers.embedding(input=seq, size=[100, 128])
emb_sum = fluid.layers.reduce_sum(emb, dim=1)
fc0 = fluid.layers.fc(name="fc0", input=emb_sum, size=64)
# Reshape to [batch, 1, 64] so batch_norm runs with NHWC layout.
fc0_reshape = fluid.layers.reshape(x=fc0, shape=[0, 1, 64])
fc_bn = fluid.layers.batch_norm(input=fc0_reshape, epsilon=0.001,
                                momentum=0.99, data_layout="NHWC")
fc1 = fluid.layers.fc(name="fc1", input=fc_bn, size=64, num_flatten_dims=2)
fc1_bn = fluid.layers.batch_norm(input=fc1, epsilon=0.001,
                                 momentum=0.99, data_layout="NHWC")
fc1_reshape = fluid.layers.reshape(x=fc1_bn, shape=[0, 64])
fc2 = fluid.layers.fc(name="fc2", input=fc1_reshape, size=1)

loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=fc2, label=label)
avg_loss = fluid.layers.mean(loss)

# Gradient clipping is enabled; there is no weight decay.
fluid.clip.set_gradient_clip(
    clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=5.0))
sgd_optimizer = fluid.optimizer.SGD(learning_rate=1.)
sgd_optimizer.minimize(avg_loss)

place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

# Uncomment on the GPU run to export the initialization:
# fluid.io.save_persistables(exe, "./gpu/")

# Uncomment on the CPU run to load the same initialization:
"""
def is_parameter(var):
    return isinstance(var, fluid.framework.Parameter)

vars = filter(is_parameter, fluid.default_main_program().list_vars())
fluid.io.load_vars(exe, "./gpu", vars=vars)
"""

feeder = fluid.DataFeeder(feed_list=["seq", "label"], place=place)

for id in range(20):
    for data in batch_reader():
        results = exe.run(feed=feeder.feed(data),
                          fetch_list=[avg_loss.name],
                          return_numpy=True)
        print("results:%.4lf" % results[0].mean())
Run python reshow.py gpu and python reshow.py to get the GPU and CPU results, respectively.
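Not part of the original report, but one way to narrow this down further is to run batch_norm alone on both places with identical input and compare the outputs; a sketch under the same fluid API as the repro script:

import numpy as np
import paddle.fluid as fluid

def run_bn(place, x_np):
    # Build a tiny program containing only batch_norm, run one forward
    # pass on the given place, and return the output array.
    main, startup = fluid.Program(), fluid.Program()
    with fluid.program_guard(main, startup):
        x = fluid.layers.data(name="x", shape=[1, 64], dtype="float32")
        y = fluid.layers.batch_norm(input=x, epsilon=0.001,
                                    momentum=0.99, data_layout="NHWC")
    exe = fluid.Executor(place)
    exe.run(startup)
    return exe.run(main, feed={"x": x_np}, fetch_list=[y])[0]

x_np = np.random.random((32, 1, 64)).astype("float32")
cpu_out = run_bn(fluid.CPUPlace(), x_np)
gpu_out = run_bn(fluid.CUDAPlace(0), x_np)
print("max abs diff: %g" % np.abs(cpu_out - gpu_out).max())

If the single-op outputs already diverge, the discrepancy is in the batch_norm kernel itself rather than in the surrounding network.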