Paddle (paper reproduction): model parameters converted to float16 cannot be loaded, but PyTorch loads them fine

Posted by f45qwnt8 on 2022-10-25 in Other


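Environment: paddlepaddle 2.21, RTX 3090
When converting a PyTorch model to a Paddle model, I saved the parameters as float16. Loading that model then fails with:
AssertionError: Variable dtype not match, Variable [ embedding_0.w_0 ] need tensor with dtype float32 but load tensor with dtype float16
If I save the parameters as float32 instead, Paddle restores the PyTorch float16 parameters to float32, and the values carry more digits after the decimal point. The Paddle and PyTorch weights then have different precision, so their values differ as well.
For example, the weight of one linear layer in PyTorch is: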
tensor([[[ 0.9105, 2.0717, 0.4330, ..., 2.0011, 0.5822, 0.3467],
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.9105, 2.0717, 0.4330, ..., 2.0011, 0.5822, 0.3467],
...,
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.1379, -0.3963, -0.3805, ..., 0.2929, 0.8504, -0.5717],
[ 0.6593, 0.5001, 0.2291, ..., 0.0708, -0.4476, -0.5000]],

[[ 1.2242, -0.4303,  1.2490,  ..., -0.6407,  0.0060, -0.1868],
     [ 0.6266,  0.2603,  1.0437,  ...,  1.4394,  0.4805, -0.9743],
     [ 0.6593,  0.5001,  0.2291,  ...,  0.0708, -0.4476, -0.5000],
     ...,
     [ 0.1379, -0.3963, -0.3805,  ...,  0.2929,  0.8504, -0.5717],
     [ 0.1379, -0.3963, -0.3805,  ...,  0.2929,  0.8504, -0.5717],
     [ 0.6266,  0.2603,  1.0437,  ...,  1.4394,  0.4805, -0.9743]]],
   grad_fn=<AddBackward0>)

The corresponding weight in Paddle is:
Tensor(shape=[2, 10, 1536], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
[[[ 0.91043848, 2.07156706, 0.43298256, ..., 2.00105762,
0.58228028, 0.34676933],
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.91043848, 2.07156706, 0.43298256, ..., 2.00105762,
0.58228028, 0.34676933],
...,
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.13766873, -0.39638609, -0.38053572, ..., 0.29296547,
0.85042048, -0.57174611],
[ 0.65930241, 0.50013089, 0.22916237, ..., 0.07050779,
-0.44762430, -0.50017548]],

[[ 1.22439563, -0.43040252,  1.24902594, ..., -0.64091873,
       0.00600347, -0.18667769],
     [ 0.62675941,  0.26044115,  1.04366195, ...,  1.43960536,
       0.48053461, -0.97446364],
     [ 0.65930241,  0.50013089,  0.22916237, ...,  0.07050779,
      -0.44762430, -0.50017548],
     ...,
     [ 0.13766873, -0.39638609, -0.38053572, ...,  0.29296547,
       0.85042048, -0.57174611],
     [ 0.13766873, -0.39638609, -0.38053572, ...,  0.29296547,
       0.85042048, -0.57174611],
     [ 0.62675941,  0.26044115,  1.04366195, ...,  1.43960536,
       0.48053461, -0.97446364]]])

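The error on the inputs is originally only about 1e-8, but after computing with these two weights of different precision it grows to 0.0001. It keeps growing in later layers, so the outputs cannot be aligned.
What should I do about this?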
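As a rough illustration of the precision question raised above, the sketch below uses only Python's standard `struct` module to emulate float16 and float32 rounding. The helper names (`to_f16`, `to_f32`, `accumulate`) and the input values are made up for this example and are not part of Paddle or PyTorch. It suggests two things: widening a float16 value to float32 does not change it, and a sum accumulated in float16 can drift far from the same sum accumulated in float32.

```python
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (float16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (float32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, rnd):
    """Sum values, rounding the running total with rnd after every addition."""
    acc = 0.0
    for v in values:
        acc = rnd(acc + v)
    return acc


# A few weights from the PyTorch printout in the question, rounded to float16.
weights_f16 = [to_f16(w) for w in (0.9105, 2.0717, 0.4330, 2.0011, 0.5822, 0.3467)]

# Every float16 value is exactly representable in float32,
# so widening float16 -> float32 leaves each weight unchanged.
widening_is_exact = all(to_f32(w) == w for w in weights_f16)

# Illustrative (made-up) input: 1.0 followed by 1000 terms of 2**-12.
# In float16 each small term is below half the spacing at 1.0 and is rounded away.
values = [1.0] + [2.0 ** -12] * 1000
sum_f16 = accumulate(values, to_f16)
sum_f32 = accumulate(values, to_f32)

print(widening_is_exact)  # True
print(sum_f16, sum_f32)   # 1.0 1.244140625
```

If this picture applies to the model in question, the mismatch would come less from the stored weights than from one framework computing in float16 and the other in float32. That is an assumption to check, for example by running both models in float32.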

Paddle

Source: https://github.com/PaddlePaddle/Paddle/issues/38717

6 answers


41ik7eoe1#

Hi! We have received your issue. Please be patient while we arrange for a technician to respond as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version information, and the error message. You can also look for answers in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

2022-10-25

d5vmydt92#

Paddle has limited float16 support. For the list of ops that support float16, see https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/amp/Overview_cn.html. You can save the PyTorch model parameters as float32 and then load them with Paddle. Paddle also supports automatic mixed precision to speed up training:
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/01_paddle2.0_introduction/basic_concept/amp_cn.html
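The float32 route suggested here could look roughly like the sketch below. It is a hand-written conversion under several assumptions, not x2paddle and not an official recipe. The function name `convert_torch_checkpoint`, the file paths, and the `linear_weight_names` argument are hypothetical. It also assumes the Paddle model uses the same parameter names as the PyTorch one. The transpose reflects the fact that `torch.nn.Linear` stores its weight as `[out_features, in_features]` while `paddle.nn.Linear` stores it as `[in_features, out_features]`.

```python
def convert_torch_checkpoint(torch_path, paddle_path, linear_weight_names=()):
    """Save a PyTorch state dict as a float32 Paddle state dict (sketch)."""
    # Imported lazily so this file can be imported without both frameworks installed.
    import paddle
    import torch

    torch_state = torch.load(torch_path, map_location="cpu")
    paddle_state = {}
    for name, tensor in torch_state.items():
        tensor = tensor.detach()
        # Widen float16 (or other floating-point) tensors to float32;
        # leave integer buffers untouched.
        if tensor.is_floating_point():
            tensor = tensor.float()
        array = tensor.numpy()
        # Hypothetical argument: names of Linear weights that need transposing.
        if name in linear_weight_names:
            array = array.T.copy()
        paddle_state[name] = paddle.to_tensor(array)

    paddle.save(paddle_state, paddle_path)
    return paddle_state
```

The result could then be loaded with `model.set_state_dict(paddle.load(paddle_path))`. Embedding weights such as `embedding_0.w_0` use the same layout in both frameworks and should not need a transpose.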

2022-10-25

h7wcgrx33#

But when I align the outputs, the error is already on the order of 1e-4.
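To put the 1e-4 figure in context, the sketch below compares a few value pairs copied from the two printouts in the question against the float16 unit roundoff, 2^-11 (about 4.9e-4). The PyTorch printout shows only four decimal places, so these differences are approximate. The names `pairs` and `F16_UNIT_ROUNDOFF` are introduced here for illustration only.

```python
# (PyTorch printout, Paddle printout) pairs copied from the first row in the question.
pairs = [
    (0.9105, 0.91043848),
    (2.0717, 2.07156706),
    (0.4330, 0.43298256),
    (2.0011, 2.00105762),
    (0.5822, 0.58228028),
    (0.3467, 0.34676933),
]

# Largest relative error introduced by rounding one value to float16 (11-bit significand).
F16_UNIT_ROUNDOFF = 2.0 ** -11

max_abs = max(abs(a - b) for a, b in pairs)
max_rel = max(abs(a - b) / abs(b) for a, b in pairs)

print(f"max abs diff: {max_abs:.2e}")
print(f"max rel diff: {max_rel:.2e}")
print(f"float16 unit roundoff: {F16_UNIT_ROUNDOFF:.2e}")
print("within float16 rounding:", max_rel < F16_UNIT_ROUNDOFF)
```

For these sampled values the relative difference stays below what a single float16 rounding step can introduce. That would be consistent with one side computing in float16, but it does not by itself establish the cause.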

2022-10-25

jqjz2hbq4#

Save and load the model in float32. float16 can be used to speed up training.

2022-10-25

2wnc66cl5#

Did you use x2paddle, or a conversion method you wrote yourself?