Paddle (paper reproduction): model parameters converted to float16 cannot be loaded, although PyTorch loads them fine

f45qwnt8  posted on 2022-10-25  in  Other

Environment: PaddlePaddle 2.21, RTX 3090
When converting a PyTorch model to a Paddle model I saved the weights as float16, and loading that model fails with:
AssertionError: Variable dtype not match, Variable [ embedding_0.w_0 ] need tensor with dtype float32 but load tensor with dtype float16
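If I save the parameters as float32 instead, Paddle restores the PyTorch float16 parameters as float32, with more digits after the decimal point. The weight precision then differs between Paddle and PyTorch, so the values differ as well.
For example, the PyTorch weight of one linear layer is: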
tensor([[[ 0.9105, 2.0717, 0.4330, ..., 2.0011, 0.5822, 0.3467],
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.9105, 2.0717, 0.4330, ..., 2.0011, 0.5822, 0.3467],
...,
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.6593, 0.5001, 0.2291, ..., 0.0708, -0.4476, -0.5000]],

[[ 1.2242, -0.4303,  1.2490,  ..., -0.6407,  0.0060, -0.1868],
     [ 0.6266,  0.2603,  1.0437,  ...,  1.4394,  0.4805, -0.9743],
     [ 0.6593,  0.5001,  0.2291,  ...,  0.0708, -0.4476, -0.5000],
     ...,
     [ 0.1379, -0.3963, -0.3805,  ...,  0.2929,  0.8504, -0.5717],
     [ 0.1379, -0.3963, -0.3805,  ...,  0.2929,  0.8504, -0.5717],
     [ 0.6266,  0.2603,  1.0437,  ...,  1.4394,  0.4805, -0.9743]]],
   grad_fn=<AddBackward0>)

The Paddle weight is:
Tensor(shape=[2, 10, 1536], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
[[[ 0.91043848, 2.07156706, 0.43298256, ..., 2.00105762,
0.58228028, 0.34676933],
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.91043848, 2.07156706, 0.43298256, ..., 2.00105762,
0.58228028, 0.34676933],
...,
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.65930241, 0.50013089, 0.22916237, ..., 0.07050779,
-0.44762430, -0.50017548]],

[[ 1.22439563, -0.43040252,  1.24902594, ..., -0.64091873,
       0.00600347, -0.18667769],
     [ 0.62675941,  0.26044115,  1.04366195, ...,  1.43960536,
       0.48053461, -0.97446364],
     [ 0.65930241,  0.50013089,  0.22916237, ...,  0.07050779,
      -0.44762430, -0.50017548],
     ...,
     [ 0.13766873, -0.39638609, -0.38053572, ...,  0.29296547,
       0.85042048, -0.57174611],
     [ 0.13766873, -0.39638609, -0.38053572, ...,  0.29296547,
       0.85042048, -0.57174611],
     [ 0.62675941,  0.26044115,  1.04366195, ...,  1.43960536,
       0.48053461, -0.97446364]]])

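The error on the input was originally only about 1e-8, but after computing with these two weights of different precision the error is about 0.0001, and it keeps growing in later layers, so the two models cannot be aligned.
What should I do?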
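One way to judge whether a gap of about 1e-4 is meaningful is to compare it with what float16 can resolve. The sketch below uses NumPy only, and the three values are illustrative numbers copied from the PyTorch printout above, not the real checkpoint. It checks two facts: casting float16 to float32 is exact, and adjacent float16 values near 0.9 are about 4.9e-4 apart. A 1e-4 difference at that magnitude is therefore below float16 resolution and may come from float16 rounding rather than from the conversion itself.

```python
import numpy as np

# Illustrative values taken from the PyTorch printout, stored as float16.
w16 = np.array([0.9105, 2.0717, 0.4330], dtype=np.float16)

# Upcasting float16 -> float32 is exact: every float16 value is
# representable in float32, so a round trip returns the same array.
w32 = w16.astype(np.float32)
upcast_is_exact = bool(np.array_equal(w32.astype(np.float16), w16))

# Distance from each float16 value to its next representable neighbour.
gaps = np.spacing(w16).astype(np.float32)

print("upcast exact:", upcast_is_exact)
print("float16 values viewed as float32:", w32)
print("float16 spacing at each value:", gaps)
```

If the upcast is exact, the extra digits Paddle prints are the same stored numbers shown at higher display precision, so any real mismatch has to come from the arithmetic or from the conversion script.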

41ik7eoe #1

Hi! We have received your issue and will arrange for a technician to answer it as soon as possible; please be patient. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You can also consult the official API documentation, the FAQ, past GitHub issues, and the AI community for an answer. Have a nice day!

d5vmydt9 #2

Paddle has limited float16 support. For the list of ops that support float16, see https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/amp/Overview_cn.html
You can save the PyTorch model parameters as float32 and then load them with Paddle. Paddle supports automatic mixed precision to speed up training:
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html
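A minimal sketch of the "save as float32" step, assuming the checkpoint has already been turned into a dictionary of NumPy arrays. The helper name `to_float32_state_dict` and the key `position_ids` are hypothetical and used only for illustration. In a real script the arrays would come from the PyTorch checkpoint and the result would be written with something like `paddle.save`; both steps are left out so the example runs with NumPy alone.

```python
import numpy as np

def to_float32_state_dict(state_dict):
    """Return a copy where every floating-point array is float32.

    Integer arrays (for example index buffers) keep their dtype.
    """
    converted = {}
    for name, value in state_dict.items():
        array = np.asarray(value)
        if np.issubdtype(array.dtype, np.floating):
            array = array.astype(np.float32)
        converted[name] = array
    return converted

# Hypothetical float16 checkpoint with one float and one integer entry.
fp16_weights = {
    "embedding_0.w_0": np.array([[0.9105, 2.0717], [0.1379, -0.3963]],
                                dtype=np.float16),
    "position_ids": np.arange(4, dtype=np.int64),
}

fp32_weights = to_float32_state_dict(fp16_weights)
for name, array in fp32_weights.items():
    print(name, array.dtype, array.shape)
```

Casting only the floating-point entries keeps integer buffers intact, which avoids a second dtype mismatch at load time.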

h7wcgrx3 #3

But during alignment the error is already on the order of 1e-4.
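Since the error is reported to grow layer by layer, it can help to find the first layer where the two frameworks diverge. The sketch below assumes the intermediate outputs of both models have been dumped into dictionaries of NumPy arrays under matching names. The function `first_mismatch`, the layer names, the values, and the tolerance `atol=1e-5` are all hypothetical choices for illustration.

```python
import numpy as np

def first_mismatch(reference, candidate, atol=1e-5):
    """Compare same-named outputs; return (report, first name above atol)."""
    report = []
    first_bad = None
    for name, ref in reference.items():
        ref32 = np.asarray(ref, dtype=np.float32)
        cand32 = np.asarray(candidate[name], dtype=np.float32)
        max_abs_diff = float(np.max(np.abs(ref32 - cand32)))
        report.append((name, max_abs_diff))
        if first_bad is None and max_abs_diff > atol:
            first_bad = name
    return report, first_bad

# Hypothetical per-layer outputs, listed in forward order.
torch_outputs = {
    "embedding": np.array([0.5, -0.25], dtype=np.float32),
    "linear_0": np.array([1.0, 2.0], dtype=np.float32),
}
paddle_outputs = {
    "embedding": np.array([0.5, -0.25], dtype=np.float32),
    "linear_0": np.array([1.0, 2.0002], dtype=np.float32),
}

report, first_bad = first_mismatch(torch_outputs, paddle_outputs)
for name, diff in report:
    print(f"{name}: max abs diff = {diff:.2e}")
print("first layer above tolerance:", first_bad)
```

Comparing both sides in float32 avoids a second rounding step inside the comparison itself.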

jqjz2hbq #4

Save and load the model in float32; float16 can then be used to speed up training.

2wnc66cl #5

Are you using X2Paddle, or a conversion method you wrote yourself?

qvk1mo1f #6

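A conversion method I wrote myself, following the paper-reproduction course: the weight of every fully connected layer is transposed.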
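For readers unfamiliar with the transpose step: PyTorch's `nn.Linear` stores its weight as `[out_features, in_features]`, while Paddle's `nn.Linear` stores it as `[in_features, out_features]`. The sketch below is not the poster's actual script. It is an assumed minimal version working on NumPy arrays, where `convert_linear_weights` and the key names `fc.weight` and `fc.bias` are hypothetical.

```python
import numpy as np

def convert_linear_weights(torch_weights, linear_weight_names):
    """Transpose the listed 2-D linear weights; copy everything else as is."""
    paddle_weights = {}
    for name, value in torch_weights.items():
        array = np.asarray(value)
        if name in linear_weight_names:
            if array.ndim != 2:
                raise ValueError(f"{name} is not a 2-D linear weight")
            array = array.T  # [out, in] -> [in, out]
        paddle_weights[name] = np.ascontiguousarray(array)
    return paddle_weights

# Hypothetical layer with in_features=2 and out_features=3.
torch_weights = {
    "fc.weight": np.arange(6, dtype=np.float32).reshape(3, 2),
    "fc.bias": np.zeros(3, dtype=np.float32),
}
paddle_weights = convert_linear_weights(torch_weights, {"fc.weight"})

# Both layouts should give the same forward result for the same input.
x = np.array([[1.0, 2.0]], dtype=np.float32)
y_torch_layout = x @ torch_weights["fc.weight"].T + torch_weights["fc.bias"]
y_paddle_layout = x @ paddle_weights["fc.weight"] + paddle_weights["fc.bias"]
print(paddle_weights["fc.weight"].shape)
print(np.allclose(y_torch_layout, y_paddle_layout))
```

Passing the linear weight names explicitly keeps other 2-D tensors, such as embedding tables, from being transposed by mistake.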
