To help resolve your issue quickly, please search for similar issues before opening a new one: none found
- Version and environment information:
Paddle version: 1.8.3
Paddle With CUDA: True
OS: debian stretch/sid
Python version: 3.7.7
CUDA version: 10.1.243
cuDNN version: None.None.None # Note: libcudnn.so.7.6.5 is installed on the system and its path has been added to $LD_LIBRARY_PATH
Nvidia driver version: 418.74
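Since the reported cuDNN version is None.None.None even though libcudnn.so.7.6.5 is on $LD_LIBRARY_PATH, it may help to list which libcudnn files are visible on that path and in what order. The following is a minimal sketch using only the Python standard library; the function name `find_cudnn_libs` is hypothetical, and it only inspects $LD_LIBRARY_PATH, not the other locations the dynamic loader may search.

```python
import os
from pathlib import Path


def find_cudnn_libs(ld_library_path=None):
    """Return libcudnn shared objects found on an LD_LIBRARY_PATH-style string.

    Results follow the directory order of the path string. This helper is
    illustrative only and is not part of Paddle or cuDNN.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        directory = Path(entry)
        if not directory.is_dir():
            continue  # ignore entries that do not exist on this machine
        for lib in sorted(directory.glob("libcudnn.so*")):
            hits.append(str(lib))
    return hits


if __name__ == "__main__":
    for path in find_cudnn_libs():
        print(path)
```

If more than one cuDNN build shows up, or none at all, that would be worth mentioning in the report alongside the version fields above.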
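- Problem description:
Running inference with tools/eval.py from PaddleDetection on a single machine, with one or multiple GPUs (Tesla V100 16G or Titan RTX).
The model is Cascade RCNN (backbone R101vd or R200vd) with multi-scale test.
The following error occurs roughly 30% of the time (the same script does not reproduce it reliably, and I do not know what triggers it):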
2020-08-05 12:50:51,695-INFO: start loading proposals
2020-08-05 12:50:52,457-INFO: loading roidb 2012_test
100%|████████████████████████████████████████| 970/970 [00:01<00:00, 601.75it/s]
2020-08-05 12:50:54,377-INFO: finish loading roidb from scope 2012_test
2020-08-05 12:50:54,378-INFO: finish loading roidbs, total num = 970
2020-08-05 12:50:54,379-INFO: set max batches to 0
2020-08-05 12:50:54,380-INFO: places would be ommited when DataLoader is not iterable
W0805 12:50:54.530522 4141844 device_context.cc:252] Please NOTE: device: 5, CUDA Capability: 75, Driver API Version: 10.1, Runtime API Version: 10.0
W0805 12:50:55.613425 4141844 device_context.cc:260] device: 5, cuDNN Version: 7.6.
W0805 12:51:24.223932 4141881 init.cc:216] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0805 12:51:24.223980 4141881 init.cc:218] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0805 12:51:24.223989 4141881 init.cc:221] The detail failure signal is:
W0805 12:51:24.224001 4141881 init.cc:224]Aborted at 1596603084 (unix time) try "date -d @1596603084" if you are using GNU date
W0805 12:51:24.228863 4141881 init.cc:224] PC: @ 0x0 (unknown)
W0805 12:51:24.346484 4141881 init.cc:224]***SIGSEGV (@0x8) received by PID 4141844 (TID 0x7f012db3d700) from PID 8; stack trace:***
W0805 12:51:24.351244 4141881 init.cc:224] @ 0x7f01e3671390 (unknown)
W0805 12:51:24.353901 4141881 init.cc:224] @ 0x7f012eda2747 (unknown)
W0805 12:51:24.356168 4141881 init.cc:224] @ 0x7f012ec98d4c (unknown)
W0805 12:51:24.358356 4141881 init.cc:224] @ 0x7f012e41b5fc (unknown)
W0805 12:51:24.360416 4141881 init.cc:224] @ 0x7f012e42b938 (unknown)
W0805 12:51:24.362363 4141881 init.cc:224] @ 0x7f012e41859a cudnnGetConvolutionForwardAlgorithm_v7
W0805 12:51:24.447378 4141881 init.cc:224] @ 0x7f019853ff45 paddle::operators::SearchAlgorithm<>::Find<>()
W0805 12:51:24.469980 4141881 init.cc:224] @ 0x7f01985e1889 paddle::operators::CUDNNConvOpKernel<>::Compute()
W0805 12:51:24.481895 4141881 init.cc:224] @ 0x7f01985e2b33 ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform9CUDAPlaceELb0ELm0EJNS0_9operators17CUDNNConvOpKernelIfEENSA_IdEENSA_INS7_7float16EEEEEclEPKcSH_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4
W0805 12:51:24.504448 4141881 init.cc:224] @ 0x7f019a561ac0 paddle::framework::OperatorWithKernel::RunImpl()
W0805 12:51:24.565385 4141881 init.cc:224] @ 0x7f019a5622b1 paddle::framework::OperatorWithKernel::RunImpl()
W0805 12:51:24.604465 4141881 init.cc:224] @ 0x7f019a55b261 paddle::framework::OperatorBase::Run()
W0805 12:51:24.635419 4141881 init.cc:224] @ 0x7f019a268f16 paddle::framework::details::ComputationOpHandle::RunImpl()
W0805 12:51:24.657658 4141881 init.cc:224] @ 0x7f019a210551 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
W0805 12:51:24.673673 4141881 init.cc:224] @ 0x7f019a20e04f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
W0805 12:51:24.687579 4141881 init.cc:224] @ 0x7f019a20e314 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
W0805 12:51:24.724630 4141881 init.cc:224] @ 0x7f0197001fb3 std::_Function_handler<>::_M_invoke()
W0805 12:51:24.769093 4141881 init.cc:224] @ 0x7f0196dfd647 std::__future_base::_State_base::_M_do_set()
W0805 12:51:24.773929 4141881 init.cc:224] @ 0x7f01e366ea99 __pthread_once_slow
W0805 12:51:24.780242 4141881 init.cc:224] @ 0x7f019a20a4e2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
W0805 12:51:24.817785 4141881 init.cc:224] @ 0x7f0196dffaa4 _ZZN10ThreadPoolC1EmENKUlvE_clEv
W0805 12:51:24.850741 4141881 init.cc:224] @ 0x7f01d4120421 execute_native_thread_routine_compat
W0805 12:51:24.857818 4141881 init.cc:224] @ 0x7f01e36676ba start_thread
W0805 12:51:24.862519 4141881 init.cc:224] @ 0x7f01e339d41d clone
W0805 12:51:24.870891 4141881 init.cc:224] @ 0x0 (unknown)
Segmentation fault (core dumped)
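Because the crash is intermittent, one way to put a number on the failure rate is to run the same command repeatedly and count how many runs end with SIGSEGV. The following is a rough sketch, assuming a POSIX system where a process killed by a signal reports a negative return code; the helper name `count_crashes` is hypothetical and not part of PaddleDetection.

```python
import signal
import subprocess
import sys


def count_crashes(cmd, runs=10):
    """Run `cmd` `runs` times and return (segfaults, other_failures).

    On POSIX, subprocess reports a process killed by signal N with
    returncode -N, so a segfault shows up as -signal.SIGSEGV.
    """
    segfaults = 0
    other_failures = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if proc.returncode == -signal.SIGSEGV:
            segfaults += 1
        elif proc.returncode != 0:
            other_failures += 1
    return segfaults, other_failures


if __name__ == "__main__":
    # Pass the command to repeat after the script name; the eval command
    # line itself is whatever was used to produce the log above.
    print(count_crashes(sys.argv[1:], runs=10))
```

Running it once per GPU model or per backbone could help narrow down whether the roughly 30% rate depends on the hardware or the model.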
21 answers
s8vozzvw16#
What command did you use to install Paddle?
It was installed via pip. The environment is as follows (output of conda env export):
env.txt
isr3a4wc17#
What command did you use to install Paddle?
mlnl4t2r18#
Is this based on the officially provided code?
Yes. It is modified from PaddleDetection 0.3, and the reference config file is:
https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/configs/rcnn_enhance/cascade_rcnn_dcn_r101_vd_fpn_3x_server_side.yml
The multiscale_test configuration is based on:
https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.4/configs/cascade_rcnn_cls_aware_r101_vd_fpn_ms_test.yml
Four scales are used during eval: 3072, 2048, 1024, and 576.
hrirmatl19#
Is this based on the officially provided code?
yftpprvb20#
Do you have the relevant code?
Sorry, company policy does not allow me to share the modified code.
huwehgph21#
Do you have the relevant code?