When running examples/quantization/basic_usage_gpt_xl.py, an error occurs during model packing:
2023-05-22 04:08:34 INFO [auto_gptq.quantization.gptq] duration: 0.16880011558532715
2023-05-22 04:08:34 INFO [auto_gptq.quantization.gptq] avg loss: 333.67144775390625
2023-05-22 04:08:34 INFO [auto_gptq.modeling._base] Quantizing attn.c_proj in layer 48/48...
2023-05-22 04:08:34 INFO [auto_gptq.quantization.gptq] duration: 0.16850876808166504
2023-05-22 04:08:34 INFO [auto_gptq.quantization.gptq] avg loss: 18.650110244750977
2023-05-22 04:08:35 INFO [auto_gptq.modeling._base] Quantizing mlp.c_fc in layer 48/48...
2023-05-22 04:08:35 INFO [auto_gptq.quantization.gptq] duration: 0.16927051544189453
2023-05-22 04:08:35 INFO [auto_gptq.quantization.gptq] avg loss: 624.6328125
2023-05-22 04:08:35 INFO [auto_gptq.modeling._base] Quantizing mlp.c_proj in layer 48/48...
2023-05-22 04:08:36 INFO [auto_gptq.quantization.gptq] duration: 0.6983904838562012
2023-05-22 04:08:36 INFO [auto_gptq.quantization.gptq] avg loss: 1028.072509765625
2023-05-22 04:08:36 INFO [auto_gptq.modeling._utils] Packing model...
2023-05-22 04:08:36 INFO [auto_gptq.modeling._utils] transformer.h.0.attn.c_attn
Traceback (most recent call last):
  File "/AutoGPTQ/examples/quantization/basic_usage_gpt_xl.py", line 93, in <module>
    main()
  File "/AutoGPTQ/examples/quantization/basic_usage_gpt_xl.py", line 67, in main
    model.quantize(traindataset, use_triton=False)
  File "/home/user/miniconda3/envs/pytorch2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/user/miniconda3/envs/pytorch2/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 350, in quantize
    pack_model(
  File "/home/user/miniconda3/envs/pytorch2/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 110, in pack_model
    qlayers[name].pack(layers[name], scale, zero, g_idx)
  File "/home/user/miniconda3/envs/pytorch2/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear_old.py", line 96, in pack
    (linear.weight.data[:, idx] + scale_zeros[g_idx]) / self.scales[g_idx]
RuntimeError: The size of tensor a (1600) must match the size of tensor b (4800) at non-singleton dimension 0
The error does not occur when desc_act=False is changed to desc_act=True. Note that the error still occurs with desc_act=False even when group_size is set to -1. The AutoGPTQ version is the latest, d4011d2.
Environment: Ubuntu 22.04, PyTorch 2.0.0, CUDA 11.8, transformers 4.29.2, GPU: RTX 4090
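For reference, the workaround described above amounts to enabling desc_act in the quantize config. Below is a minimal sketch modeled on the AutoGPTQ README-style API rather than the exact basic_usage_gpt_xl.py script; the calibration text, model name, and output directory are placeholders:

# Sketch: quantize GPT2-XL with desc_act=True, which avoids the packing error reported above.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "gpt2-xl"
quantized_model_dir = "gpt2-xl-4bit-128g"

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
# A single short calibration example just to make the sketch self-contained;
# the real example script builds a proper calibration dataset.
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library.", return_tensors="pt")
]

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=True,  # with desc_act=False the pack step fails as shown in the traceback
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples, use_triton=False)
model.save_quantized(quantized_model_dir)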
1 answer
I ran into the same problem when trying to quantize a model with AutoGPTQ. Is there a solution for this yet?