PaddleOCR KIE training fails with CUDA error: an illegal memory access was encountered.

Posted by zpgglvta on 2022-10-21 in Other

Please provide the following information to quickly locate the problem

Training fails with both the wildreceipt dataset and our own dataset. The error persists even after reducing batch_size to 1.

  • System Environment: Windows 11, GPU: RTX 3060
  • Version: Paddle: 2.3.0.post112, PaddleOCR: 2.5.0.3
  • Related components:
  • Command Code: python ./train.py -c ../configs/kie/kie_unet_sdmgr.yml -o Global.pretrained_model=../pretrained_model/kie_vgg16/best_accuracy.pdparams
  • Complete Error Message:
W0625 09:48:01.554232 15140 gpu_context.cc:306] device: 0, cuDNN Version: 8.2.
[2022/06/25 09:48:04] ppocr INFO: load pretrain successful from ../pretrained_model/kie_vgg16/best_accuracy
[2022/06/25 09:48:04] ppocr INFO: train dataloader has 1267 iters
[2022/06/25 09:48:04] ppocr INFO: valid dataloader has 472 iters
[2022/06/25 09:48:04] ppocr INFO: During the training process, after the 0th iteration, an evaluation is run every 80 iterations
Traceback (most recent call last):
  File "E:\project-space\ocr-tools\PaddleOCR\tools\train.py", line 191, in <module>
    main(config, device, logger, vdl_writer)
  File "E:\project-space\ocr-tools\PaddleOCR\tools\train.py", line 164, in main
    program.train(config, train_dataloader, valid_dataloader, device, model,
  File "E:\project-space\ocr-tools\PaddleOCR\tools\program.py", line 264, in train
    preds = model(batch)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs,**kwargs)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs,**kwargs)
  File "E:\project-space\ocr-tools\PaddleOCR\ppocr\modeling\architectures\base_model.py", line 85, in forward
    x = self.head(x, targets=data)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs,**kwargs)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs,**kwargs)
  File "E:\project-space\ocr-tools\PaddleOCR\ppocr\modeling\heads\kie_sdmgr_head.py", line 90, in forward
    nodes = self.fusion([x, nodes])
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 930, in __call__
    return self._dygraph_call_func(*inputs,**kwargs)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\layers.py", line 915, in _dygraph_call_func
    outputs = self.forward(*inputs,**kwargs)
  File "E:\project-space\ocr-tools\PaddleOCR\ppocr\modeling\heads\kie_sdmgr_head.py", line 189, in forward
    z = F.normalize(z)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\nn\functional\norm.py", line 88, in normalize
    eps = fluid.dygraph.base.to_variable([epsilon], dtype=x.dtype)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args),**kw)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args,**kwargs)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args,**kwargs)
  File "D:\anaconda3\envs\paddle_env\lib\site-packages\paddle\fluid\dygraph\base.py", line 763, in to_variable
    py_var = core.VarBase(
OSError: (External) CUDA error(700), an illegal memory access was encountered. 
  [Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at ..\paddle\phi\backends\gpu\cuda\cuda_info.cc:258)

Process finished with exit code 1
Answer #1 (igsr9ssn):

I ran into this problem on Windows too, and it went away after switching to WSL. It looks like an incompatibility between the Windows driver and paddlepaddle-gpu.

#6533

Answer #2 (alen0pnh):

Training on the CPU works fine for me, but on the GPU it fails every time, and I can train normally with the wildreceipt dataset. I'll give WSL a try.
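The reply above reports that CPU training works while GPU training fails. One way to confirm that on your own setup, without editing the YAML file, is to add a `Global.use_gpu` override to the original command. The sketch below assumes the `kie_unet_sdmgr.yml` config has a `Global.use_gpu` key (as PaddleOCR 2.5 configs generally do) and that `-o` accepts several `key=value` pairs, which the existing `Global.pretrained_model=...` override suggests. `build_train_cmd` is a hypothetical helper, not part of PaddleOCR.

```python
import sys
from typing import List


def build_train_cmd(use_gpu: bool) -> List[str]:
    """Hypothetical helper: rebuild the training command from the question and
    append a Global.use_gpu override so the same run can be forced onto the CPU.

    Paths are taken from the question and are relative to PaddleOCR's `tools` dir.
    """
    return [
        sys.executable,
        "./train.py",
        "-c",
        "../configs/kie/kie_unet_sdmgr.yml",
        "-o",
        "Global.pretrained_model=../pretrained_model/kie_vgg16/best_accuracy.pdparams",
        # Assumption: the config defines Global.use_gpu; "False"/"True" are parsed as booleans.
        f"Global.use_gpu={use_gpu}",
    ]
```

Run it from the `tools` directory with `subprocess.run(build_train_cmd(use_gpu=False), check=True)`. The helper only assembles the argument list, so it never touches CUDA itself; typing the same `Global.use_gpu=False` pair at the end of the original command line has the same effect.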

Answer #3 (gmol1639):

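Exactly the same for me, and the error is raised at the same place. CPU training also works for me, and after switching to WSL everything runs normally.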

Answer #4 (qyzbxkaa):

Frustrating. The official maintainers haven't addressed this issue either.
