Describe the Bug
This is our model training code. Its outputs under CUDA 11.6 and CUDA 11.8 differ by more than 0.3. We further compared against PyTorch: the PyTorch output is almost identical to the CUDA 11.6 result.
import torch
import torch.nn as nn

class Model_zpge5M_FqQx86Dj1nV7Njal5bquCfBcN(nn.Module):
    def __init__(self):
        super(Model_zpge5M_FqQx86Dj1nV7Njal5bquCfBcN, self).__init__()
        self.conv1_mutated = torch.nn.ConvTranspose2d(in_channels=1, out_channels=6, kernel_size=[5, 5], stride=[1, 1], padding=[0, 0], output_padding=[0, 0], dilation=[1, 1], groups=1, bias=True)
        self.relu1 = torch.nn.ReLU()
        self.pool1_mutated = torch.nn.MaxPool2d(kernel_size=[3, 1], stride=[2, 2], padding=[0, 0], dilation=1, ceil_mode=False)
        self.conv2_mutated = torch.nn.Conv2d(in_channels=6, out_channels=16, kernel_size=[5, 5], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True)
        self.relu2_mutated = torch.ceil
        self.pool2 = torch.nn.MaxPool2d(kernel_size=[2, 2], stride=[2, 2], padding=[0, 0], dilation=1, ceil_mode=False)
        self.flatten = torch.nn.Flatten()
        self.linear1_mutated = torch.nn.Linear(in_features=672, out_features=120)
        self.relu3_mutated = torch.round
        self.linear2_mutated = torch.nn.Linear(in_features=120, out_features=84)
        self.tail_flatten = torch.nn.Flatten()
        self.tail_fc = torch.nn.Linear(in_features=84, out_features=10)

    def forward(self, input):
        conv1_output = self.conv1_mutated(input)
        relu1_output = self.relu1(conv1_output)
        maxpool1_output = self.pool1_mutated(relu1_output)
        conv2_output = self.conv2_mutated(maxpool1_output)
        relu2_output = self.relu2_mutated(conv2_output)
        maxpool2_output = self.pool2(relu2_output)
        flatten_output = self.flatten(maxpool2_output)
        fc1_output = self.linear1_mutated(flatten_output)
        relu3_output = self.relu3_mutated(fc1_output)
        fc2_output = self.linear2_mutated(relu3_output)
        tail_flatten_output = self.tail_flatten(fc2_output)
        tail_fc_output = self.tail_fc(tail_flatten_output)
        return tail_fc_output
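The per-layer numbers in this issue come from dumping each intermediate output to an `.npz` file and diffing the two runs. A minimal sketch of that comparison (file and array names here are illustrative, not the ones from the repo):

```python
import numpy as np

def max_abs_diff(path_a, path_b):
    """Load two .npz dumps of the same layer output and return the
    largest element-wise absolute difference between them."""
    a = np.load(path_a)
    b = np.load(path_b)
    return max(
        float(np.abs(a[k].astype(np.float64) - b[k].astype(np.float64)).max())
        for k in a.files
    )

# Demo with synthetic data standing in for two runs of one layer.
rng = np.random.default_rng(0)
x = rng.random((4, 120), dtype=np.float32)
np.savez("run_a.npz", out=x)
np.savez("run_b.npz", out=x + np.float32(1e-6))  # tiny kernel-level perturbation
print(max_abs_diff("run_a.npz", "run_b.npz"))  # on the order of 1e-6
```

Casting to float64 before subtracting avoids adding extra float32 rounding to the measurement itself.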
Reproduction code
https://github.com/PhyllisJi/MoCoDiff_Bug/tree/paddle-issue%2364537
Detailed reproduction steps are provided there.
We ran it ten times, and every run showed a large difference.
Output differences
# cuda 11.6
# W0522 14:31:24.082360 3337 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.6, Runtime API Version: 11.8
# W0522 14:31:24.083253 3337 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
# W0522 14:31:24.083271 3337 gpu_resources.cc:196] WARNING: device: 0. The installed Paddle is compiled with CUDA 11.8, but CUDA runtime version in your machine is 11.6, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
relu2_output.npz 0.0
maxpool2_output.npz 0.0
fc2_output.npz 1.6689300537109375e-06
conv2_output.npz 0.0
relu3_output.npz 0.0
flatten_output.npz 0.0
output.npz 1.430511474609375e-06
fc1_output.npz 1.1920928955078125e-06
# cuda 11.8
# W0522 14:19:26.351598 496 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.2, Runtime API Version: 11.8
# W0522 14:19:26.352345 496 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9.
conv2_output.npz 0.0
relu2_output.npz 0.0
maxpool2_output.npz 0.0
flatten_output.npz 0.0
fc1_output.npz 0.0008640289306640625
relu3_output.npz 1.0
fc2_output.npz 0.17127180099487305
output.npz 0.3886955976486206
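One pattern worth noting in the CUDA 11.8 numbers (an observation, not a confirmed diagnosis): the model applies `torch.round` after fc1, and rounding can amplify a sub-1e-3 difference into a difference of exactly 1.0 whenever a value lands near a `.5` boundary, which matches fc1_output diff 0.00086 → relu3_output diff 1.0 above. A small numpy illustration:

```python
import numpy as np

# Two runs of the same layer that differ by less than 1e-3,
# with one value straddling a rounding boundary (x.5).
run_a = np.array([0.4996, 1.2000, 2.7000], dtype=np.float32)
run_b = run_a + np.float32(8.6e-4)  # ~0.50046, 1.20086, 2.70086

pre_round_diff = np.abs(run_a - run_b).max()                     # < 1e-3
post_round_diff = np.abs(np.round(run_a) - np.round(run_b)).max()
print(pre_round_diff, post_round_diff)  # 0.4996 rounds to 0, 0.50046 rounds to 1
```

So a large diff after the round does not by itself prove the Linear kernel is badly wrong; the interesting quantity is the 0.00086 diff going into it.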
Additional Supplementary Information
Paddle version: 2.6.1
4 Answers
dfty9e19 · #1
Hello. From the results reported in this issue, relu2_output.npz matches, while fc1_output.npz shows a diff. The computation in between goes through pool2 and flatten; could you also provide the comparison for those two steps, so we can pin down which operator's computation diverges?
4c8rllxm · #2
> Hello. From the results reported in this issue, relu2_output.npz matches, while fc1_output.npz shows a diff. The computation in between goes through pool2 and flatten; could you also provide the comparison for those two steps, so we can pin down which operator's computation diverges?

Updated:
https://github.com/PhyllisJi/MoCoDiff_Bug/tree/paddle-issue%2364537
bvk5enib · #3
Looking at the logs, the diff shows up at fc1_output. That is just a simple Linear; the logs show its input is the flatten output, which is all zeros, and the Linear bias is empty, so in theory the output should also be 0, right?
brqmpdu1 · #4
> Looking at the logs, the diff shows up at fc1_output. That is just a simple Linear; the logs show its input is the flatten output, which is all zeros, and the Linear bias is empty, so in theory the output should also be 0, right?

The numbers we report are the per-layer differences against a PyTorch implementation of exactly the same code: 0 means the two outputs are identical or extremely close, not that the tensors themselves are zero. The current situation is that on CUDA 11.6 our outputs match PyTorch almost exactly, while on CUDA 11.8 they differ substantially from PyTorch. Even leaving PyTorch aside, directly comparing the outputs across the two CUDA versions still shows a huge difference.
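Following up on the Linear discussion above, one way to separate an operator bug from ordinary float32 accumulation noise is to recompute fc1 against a float64 reference. A minimal sketch, with synthetic stand-ins for flatten_output and the fc1 parameters (all names and shapes here are illustrative, not taken from the repo):

```python
import numpy as np

def linear_ref(x, weight, bias):
    """Float64 reference for a fully connected layer: y = x @ W^T + b,
    using the PyTorch weight layout (out_features, in_features).
    Note Paddle stores the Linear weight transposed relative to PyTorch."""
    return x.astype(np.float64) @ weight.astype(np.float64).T + bias.astype(np.float64)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 672)).astype(np.float32)   # stand-in for flatten_output
w = rng.standard_normal((120, 672)).astype(np.float32)  # fc1 weight
b = rng.standard_normal(120).astype(np.float32)         # fc1 bias

ref = linear_ref(x, w, b)
f32 = (x @ w.T + b).astype(np.float64)  # same math accumulated in float32
print(np.abs(ref - f32).max())  # ordinary float32 accumulation error, far below 0.00086-per-element scale differences needing explanation
```

If the saved CUDA 11.8 fc1_output differs from such a reference by much more than the CUDA 11.6 one does, that points at the matmul kernel (or a reduced-precision math mode) rather than at the inputs.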