Large difference in convergence speed between Paddle 1.8 and 2.0

bogh5gae  posted on 2022-11-05  in Other

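Problem description:
I wrote a multiple linear regression example, but it converges extremely slowly in the following environment:

  • Version and environment information:

   1) PaddlePaddle version: 2.0 and 2.2.1
   2) CPU and related environment: BML CodeLab default CPU and standard preset environment
   3) GPU: not enabled
   4) System environment: same as 2)
After 150,000 iterations the loss is still above 6000.
(Final result: epoch_id is 149000,avg_loss is [6082.4243])

I then retrained in the following environment, and the final loss differed by more than a factor of a hundred million.
   1) PaddlePaddle version: 1.8
   2) CPU and related environment: BML CodeLab default CPU and standard preset environment
   3) GPU: not enabled
   4) System environment: same as 2)
After 150,000 iterations the loss had converged to about 1.9e-10.
(Final result: epoch_id is 149000,avg_loss is [1.9255095e-10])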

  • Full code

# 1. Import libraries
import paddle
import paddle.fluid as fluid
from paddle.fluid.dygraph.nn import Linear
import numpy as np
import random

# 2. Prepare the data: b = 3a + 5c - 100
a=list(range(1000))
c=list(range(1000,0,-1))
random.shuffle(a)
random.shuffle(c)
a=np.array(a).reshape([-1,1])
c=np.array(c).reshape([-1,1])
b=a*3+c*5-100
b=np.reshape(b,[-1,1])
a_c_input=np.concatenate((a,c),axis=1)
a_c_input=np.array(a_c_input).astype("float32")
b=np.array(b).astype("float32")

# 3. Define the model: a single linear layer with 2 inputs and 1 output
class MNIST(fluid.dygraph.Layer):
    def __init__(self):
        super(MNIST,self).__init__()
        self.fc=Linear(input_dim=2,output_dim=1,act=None)
    def forward(self,inputs):
        x=self.fc(inputs)
        return x

# 4. Train
with fluid.dygraph.guard():
    model=MNIST()
    model.train()
    iterid=[]
    losses=[]
    image=fluid.dygraph.to_variable(a_c_input)
    label=fluid.dygraph.to_variable(b)
    optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=model.parameters())
    #optimizer = fluid.optimizer.SGDOptimizer(learning_rate=0.001, parameter_list=model.parameters())
    #optimizer = fluid.optimizer.AdamOptimizer(learning_rate=0.01, regularization=fluid.regularizer.L2Decay(regularization_coeff=0.1),
    #parameter_list=model.parameters())
    EPOCH_NUM=150000
    for epoch_id in range(EPOCH_NUM):
        predict=model(image)
        loss=fluid.layers.square_error_cost(predict,label)
        avg_loss=fluid.layers.mean(loss)
        iterid.append(epoch_id)
        losses.append(avg_loss.numpy())
        if epoch_id % 1000 ==0:
            print("epoch_id is {},avg_loss is {}".format(epoch_id,avg_loss.numpy()))
        avg_loss.backward()
        optimizer.minimize(avg_loss)
        # As posted: no parentheses, so this only references the method and does not call it.
        model.clear_gradients
    fluid.save_dygraph(model.state_dict(),"one_yuan")

# 5. Plot the loss curve (the %matplotlib line is a Jupyter notebook magic)
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(iterid,losses)
plt.grid()
plt.show()
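
One detail in the posted loop that may be worth checking: its last line is written as `model.clear_gradients`, without parentheses. In Python that expression only looks up the method and never calls it. The sketch below illustrates the difference with a hypothetical `TinyLayer` class (not a Paddle class), under the assumption that gradients accumulate across `backward()` calls. Whether this actually explains the difference between versions is not established in this thread.

```python
class TinyLayer:
    """Hypothetical stand-in for a layer whose gradients accumulate (an assumption)."""

    def __init__(self):
        self.grad = 0.0

    def backward(self, g):
        # Assumption: each backward pass adds to the stored gradient.
        self.grad += g

    def clear_gradients(self):
        self.grad = 0.0


layer = TinyLayer()

layer.backward(1.0)
layer.clear_gradients        # attribute access only: the method is NOT called
grad_after_reference = layer.grad

layer.backward(1.0)
layer.clear_gradients()      # the method is called and the gradient is reset
grad_after_call = layer.grad

print(grad_after_reference, grad_after_call)
```

If the real framework behaves like this stand-in, each optimizer step in the posted loop would use the sum of all earlier gradients rather than only the current one.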


ne5o7dgx #1

Hi! We've received your issue; please be patient while we arrange for a technician to respond as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and any error messages. You can also consult the official API documentation, the FAQ, past GitHub issues, and the AI community for answers. Have a nice day!

pkmbmrz7 #2

@VBPython Hello, is this the same code in the same environment, with the problem appearing only when the Paddle version changes?

r6hnlfcb #3

> @VBPython Hello, is this the same code in the same environment, with the problem appearing only when the Paddle version changes?

Yes.

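x8goxv8g #4

Hello, I reproduced your code on both Paddle 1.8 and 2.2.1. The 1.8 result is on the left and the 2.2 result is on the right. The convergence speed looks the same to me, and under 2.2.1 the loss even reached 0. The final model parameters are also identical.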

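k5ifujac #5

> Hello, I reproduced your code on both Paddle 1.8 and 2.2.1. The 1.8 result is on the left and the 2.2 result is on the right. The convergence speed looks the same to me, and under 2.2.1 the loss even reached 0. The final model parameters are also identical.

I did not enable GPU acceleration; both 1.8 and 2.2.1 were run on the BML CodeLab standard CPU environment.
I ran it again: under 2.2.1, avg_loss is still 4803.226, so it has not fully converged.
model.parameters() are [2.90440536, 5.08442593] and [-121.75109863].
The project is public; you can view it at https://aistudio.baidu.com/aistudio/projectdetail/3228519?contributionType=1&shared=1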
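
To put the parameters reported above in context, the following NumPy-only sketch rebuilds data of the same shape as in the question (b = 3a + 5c - 100), solves the regression in closed form, and evaluates the mean squared error at the reported values. The fixed seed and the use of `np.random.default_rng` in place of `random.shuffle` are assumptions made for repeatability, so the exact error will differ from the loss printed in the original run.

```python
import numpy as np

# Rebuild the inputs as in the question: two independently shuffled integer columns.
rng = np.random.default_rng(0)  # fixed seed is an assumption, for repeatability only
a = rng.permutation(1000).astype(np.float64)                    # values 0..999
c = rng.permutation(np.arange(1000, 0, -1)).astype(np.float64)  # values 1..1000
b = 3 * a + 5 * c - 100


def mse(w_a, w_c, bias):
    """Mean squared error of the linear model b ~ w_a*a + w_c*c + bias."""
    pred = w_a * a + w_c * c + bias
    return float(np.mean((pred - b) ** 2))


# Closed-form least-squares fit: columns are a, c and a constant 1 for the bias.
X = np.column_stack([a, c, np.ones_like(a)])
solution, *_ = np.linalg.lstsq(X, b, rcond=None)

mse_generating = mse(3.0, 5.0, -100.0)
mse_reported = mse(2.90440536, 5.08442593, -121.75109863)

print("least-squares solution:", solution)
print("MSE at generating coefficients:", mse_generating)
print("MSE at reported 2.2.1 parameters:", mse_reported)
```

The least-squares solution serves as a reference for what a fully converged run should reach: weights close to 3 and 5 and a bias close to -100.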

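iugsix8n #6

PaddlePaddle version information:

@VBPython Hello, I ran the training with the 2.2.1 CPU build and found that it did converge in the end, as shown in the figure. Could you check whether your versions match?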
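
One way to check that the versions match is to print the installed package version in each environment before training. The sketch below uses the standard-library `importlib.metadata` and assumes the CPU build is installed under the distribution name `paddlepaddle` (GPU builds use `paddlepaddle-gpu`); the helper `installed_version` is a hypothetical name introduced here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Check both distribution names, since only one of them is normally installed.
for name in ("paddlepaddle", "paddlepaddle-gpu"):
    print(name, "->", installed_version(name))
```

Inside a running notebook, `paddle.__version__` reports the version of the module that was actually imported, which is a useful cross-check when several environments exist.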
