- Title: OCR end-to-end model dumps core during prediction
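- Version and environment information:
1) PaddlePaddle version: 1.2.0.post85
2) CPU: if prediction runs on CPU, please provide the CPU model and the math library in use (MKL/OpenBLAS/MKLDNN, etc.)
3) System environment: CentOS 6.3, Python 2.7.14
- Prediction information:
1) Python prediction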
- Reproduction information:
The prediction code is essentially the same as attention_infer in https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/ocr_recognition/attention_model.py, with the following modification:
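# Inside attention_infer; init_state and images are defined earlier in that function.
# Start token id 0 for every sample in the batch
init_ids = fluid.layers.fill_constant_batch_size_like(
    input=init_state, shape=[-1, 1], value=0, dtype='int64')
init_ids = fluid.layers.lod_reset(x=init_ids, y=images)
# Initial score 1 for every sample in the batch
init_scores = fluid.layers.fill_constant_batch_size_like(
    input=init_state, shape=[-1, 1], value=1, dtype='float32')
init_scores = fluid.layers.lod_reset(x=init_scores, y=images)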
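- Problem description:
At the second time step, the shape and LoD of pre_ids and pre_score look wrong. This in turn gives decoder_state_proj the wrong shape, and the process finally dumps core at decoder_state_expand = fluid.layers.sequence_expand.
Relevant log: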
1547634835 pre_ids Tensor[array_read_0.tmp_0]
shape: [12,1,]
dtype: l
LoD: [[ 0,12, ]]
data: 0,0,0,0,0,0,0,0,0,0,
1547634835 pre_state Tensor[array_read_1.tmp_0]
shape: [12,128,]
dtype: f
data: 0.00607281,0.0318905,0.0380818,0,0.0172632,0,0,0.0151736,0,0.0335344,
1547634835 pre_score Tensor[array_read_2.tmp_0]
shape: [12,1,]
dtype: f
LoD: [[ 0,12, ]]
data: 1,1,1,1,1,1,1,1,1,1,
1547634835 decoder_state_proj Tensor[fc_36.tmp_0]
shape: [12,128,]
dtype: f
data: -0.000996387,-0.0245469,0.0171594,0.00709899,0.00126243,-0.0303947,0.00325922,0.00518241,0.0150218,0.0053458,
1547634835 encoder_proj Tensor[fc_34.tmp_0]
shape: [768,128,]
dtype: f
LoD: [[ 0,64,128,192,256,320,384,448,512,576,640,704,768, ]]
data: 0.00266408,-0.048337,-0.0366806,0.00265633,0.0432429,0.0254773,-0.00408232,-0.0223291,-0.0024811,-0.0433478,
1547634835 context Tensor[sequence_pool_1.tmp_0]
shape: [12,256,]
dtype: f
data: 0.00655919,0.000162716,0,0.00365045,0.025687,0,0.0030535,0.0152447,0,0.00135731,
1547634835 pre_ids Tensor[array_read_0.tmp_0]
shape: [1,1,]
dtype: l
LoD: [[ 0,12, ][ 0,0,0,0,0,0,0,0,0,0,1,1,1, ]]
data: 92,
1547634835 pre_state Tensor[array_read_1.tmp_0]
shape: [12,128,]
dtype: f
data: 0.00450501,-0.0319699,-0.045378,0.00643824,-0.0349801,-0.0209818,-0.0276175,0.0185534,0.00094414,-0.0382715,
1547634835 pre_score Tensor[array_read_2.tmp_0]
shape: [1,1,]
dtype: f
LoD: [[ 0,12, ][ 0,0,0,0,0,0,0,0,0,0,1,1,1, ]]
data: -3.49071,
1547634835 decoder_state_proj Tensor[fc_36.tmp_0]
shape: [12,128,]
dtype: f
data: -0.00517603,0.000552245,0.0362263,-0.049663,0.0338876,-0.0053585,0.0601997,-0.0220834,2.80392e-05,0.0360245,
1547634835 encoder_proj Tensor[fc_34.tmp_0]
shape: [768,128,]
dtype: f
LoD: [[ 0,64,128,192,256,320,384,448,512,576,640,704,768, ]]
data: 0.00266408,-0.048337,-0.0366806,0.00265633,0.0432429,0.0254773,-0.00408232,-0.0223291,-0.0024811,-0.0433478,
1547634835 context Tensor[sequence_pool_1.tmp_0]
shape: [12,256,]
dtype: f
data: 0.00655934,0.000162699,0,0.00364994,0.0256868,0,0.00305304,0.0152452,0,0.00135741,
1547634835 pre_ids Tensor[array_read_0.tmp_0]
shape: [1,1,]
dtype: l
LoD: [[ 0,1, ][ 0,1, ]]
data: 87,
1547634835 pre_state Tensor[array_read_1.tmp_0]
shape: [1,128,]
dtype: f
data: -0.0141358,0.015022,-0.0510009,0.0237546,-0.0339731,0.00935627,-0.0871612,0.0309292,0.0546338,-0.0213654,
1547634835 pre_score Tensor[array_read_2.tmp_0]
shape: [1,1,]
dtype: f
LoD: [[ 0,1, ][ 0,1, ]]
data: -7.96483,
1547634835 decoder_state_proj Tensor[fc_36.tmp_0]
shape: [1,128,]
dtype: f
data: -0.0427243,0.0284236,0.00746911,-0.0142829,0.0307404,-0.0298453,0.0334686,-0.0290551,0.0832574,0.031448,
1547634835 encoder_proj Tensor[fc_34.tmp_0]
shape: [768,128,]
dtype: f
LoD: [[ 0,64,128,192,256,320,384,448,512,576,640,704,768, ]]
data: 0.00266408,-0.048337,-0.0366806,0.00265633,0.0432429,0.0254773,-0.00408232,-0.0223291,-0.0024811,-0.0433478,
***Aborted at 1547634845 (unix time) try "date -d @1547634845" if you are using GNU date***
PC: @ 0x0 (unknown)
***SIGSEGV (@0x7f164fd73000) received by PID 20438 (TID 0x7f16a4964700) from PID 1339502592; stack trace:***
@ 0x7f16a411b160 (unknown)
@ 0x7f165b4b1637 paddle::operators::SequenceExpandFunctor<>::operator()()
@ 0x7f165b4b6068 paddle::operators::SequenceExpandKernel<>::Compute()
@ 0x7f165b4b6433 _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm0EINS0_9operators20SequenceExpandKernelINS7_16CPUDeviceContextEfEENSA_ISB_dEENSA_ISB_iEENSA_ISB_lEEEEclEPKcSI_EUlS4_E_E9_M_invokeERKSt9_Any_dataS4_
@ 0x7f165c1a7c8c paddle::framework::OperatorWithKernel::RunImpl()
@ 0x7f165c1a3dcf paddle::framework::OperatorBase::Run()
@ 0x7f165a9924f3 paddle::framework::Executor::RunPreparedContext()
@ 0x7f165be54b93 paddle::operators::WhileOp::RunImpl()
@ 0x7f165c1a3dcf paddle::framework::OperatorBase::Run()
@ 0x7f165a9924f3 paddle::framework::Executor::RunPreparedContext()
@ 0x7f165a992f20 paddle::framework::Executor::Run()
@ 0x7f165a8a76db _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL18pybind11_init_coreERNS_6moduleEEUlRNS2_9framework8ExecutorERKNS6_11ProgramDescEPNS6_5ScopeEibbE64_vIS8_SB_SD_ibbEINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESV_
@ 0x7f165a8e378e pybind11::cpp_function::dispatcher()
@ 0x7f16a443655f PyEval_EvalFrameEx
@ 0x7f16a443886d PyEval_EvalCodeEx
@ 0x7f16a44359fc PyEval_EvalFrameEx
@ 0x7f16a443886d PyEval_EvalCodeEx
@ 0x7f16a44359fc PyEval_EvalFrameEx
@ 0x7f16a443886d PyEval_EvalCodeEx
@ 0x7f16a44389a2 PyEval_EvalCode
@ 0x7f16a4461782 PyRun_FileExFlags
@ 0x7f16a4462af9 PyRun_SimpleFileExFlags
@ 0x7f16a447882d Py_Main
@ 0x7f16a3675bd5 __libc_start_main
@ 0x4007a1 (unknown)
@ 0x0 (unknown)
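The mismatch can be illustrated with a simplified pure-Python model of sequence_expand for the case in the log. Here x (decoder_state_proj) has no LoD, and y (encoder_proj) has one LoD level describing 12 sequences of 64 rows each. The sketch is an illustration of the documented semantics under that assumption, not Paddle's kernel: each row of x is repeated once per element of the matching sequence in y. The helper name sequence_expand_rows is hypothetical. At the third step in the log, x has 1 row while y still describes 12 sequences. The sketch raises an explicit error for that mismatch, and the same mismatch is consistent with the out-of-bounds crash inside SequenceExpandFunctor in the stack trace.

```python
def sequence_expand_rows(x_rows, y_lod_level):
    """Simplified model of sequence_expand when x has no LoD (hypothetical).

    x_rows: list of rows, one per sequence described by y.
    y_lod_level: offset-based LoD of y, e.g. [0, 64, 128, ..., 768];
                 sequence i spans y_lod_level[i]:y_lod_level[i + 1].
    """
    num_seqs = len(y_lod_level) - 1
    if len(x_rows) != num_seqs:
        # The sketch checks explicitly; reading past the end of x is what
        # would otherwise happen in a kernel that trusts the LoD.
        raise ValueError("x has %d rows but y's LoD describes %d sequences"
                         % (len(x_rows), num_seqs))
    out = []
    for i in range(num_seqs):
        repeat = y_lod_level[i + 1] - y_lod_level[i]
        out.extend(list(x_rows[i]) for _ in range(repeat))
    return out
```

In the first two steps of the log, decoder_state_proj has 12 rows. Each row therefore expands to 64 rows, giving 768 in total and matching encoder_proj.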
Reply 1:
According to the beam search documentation, pre_ids at the first time step (i.e. init_ids) should have a LoD level of 2. What is the appropriate way to change this?
Reply 2:
Because beam search is used during decoding, init_ids needs a level-2 LoD that is set according to the dynamic batch size. For a batch size of k, the LoD must be [[i for i in range(k+1)], [i for i in range(k+1)]]. There are two problems here:
Solution: set the correct LoD for init_ids.
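A small sketch of the LoD described in this reply, using the offset-based notation that appears in the log (e.g. [[ 0,12, ]]). beam_search_init_lod and offsets_to_lengths are hypothetical helper names. The second helper converts offsets into per-sequence lengths. In Paddle 1.x, fluid.create_lod_tensor takes LoD in that length-based form as its recursive_seq_lens argument.

```python
def beam_search_init_lod(k):
    """Offset-based 2-level LoD for the first beam-search step:
    k source sentences (images), each with exactly one candidate."""
    level = list(range(k + 1))  # [0, 1, ..., k]
    return [level, list(level)]


def offsets_to_lengths(lod):
    """Convert offset-based LoD into length-based LoD, level by level."""
    return [[end - start for start, end in zip(level, level[1:])]
            for level in lod]
```

For comparison, the first-step LoD in the log, [[ 0,12, ]], converts to [[12]]. That is a single level describing one sequence of 12 entries. Twelve images with one start token each would instead need two levels, [[1]*12, [1]*12].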
Reply 3:
[Latest progress]
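After some experimenting, setting the LoD of init_ids and init_scores as follows lets the prediction pipeline run through:
In addition, the following two statements must be commented out, because the Print op does not support the TensorArray type.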
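The code from this reply was not preserved. One possible approach is sketched below as an assumption, not as the replier's fix. It declares init_ids and init_scores as data layers with lod_level=2 and feeds them from the host, with one start token per image. make_init_feed is a hypothetical helper. It only builds the NumPy arrays and length-based LoD that would then be passed to fluid.create_lod_tensor.

```python
import numpy as np


def make_init_feed(batch_size):
    """Host-side initial values for beam search (hypothetical helper).

    Returns int64 ids of shape [batch_size, 1] (start token 0), float32
    scores of shape [batch_size, 1] (initial score 1), and a 2-level,
    length-based LoD: one source sentence and one candidate per image.
    """
    ids = np.zeros((batch_size, 1), dtype="int64")
    scores = np.ones((batch_size, 1), dtype="float32")
    # Two independent lists so the two LoD levels are not aliased.
    seq_lens = [[1] * batch_size for _ in range(2)]
    return ids, scores, seq_lens


# With Paddle 1.x installed, the feed values could then be built as
#   fluid.create_lod_tensor(ids, seq_lens, place)
#   fluid.create_lod_tensor(scores, seq_lens, place)
```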
Reply 4:
When rois has shape [0,] and LoD [[ 0,0, ]], roi-perspective-transform raises an error:
Reply 5:
Bug when the ROI prediction is empty: handled with an IF-ELSE check for now; debugging to follow.
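The reply above handles empty ROIs with an IF-ELSE check. A simpler host-side alternative is to inspect the detector output before running the recognition branch at all. This is a different technique from an in-graph IF-ELSE and is shown only as a sketch. It assumes detection and recognition can be run as separate steps. rois_are_empty and recognize are hypothetical names, and detect / transform_and_recognize stand in for whatever callables run the two parts of the model.

```python
def rois_are_empty(rois_shape, rois_lod):
    """True when the rois tensor holds no boxes, e.g. shape [0] with
    offset-based LoD [[0, 0]] as in the report above."""
    if rois_shape and rois_shape[0] == 0:
        return True
    # With offset-based LoD, the last offset of the last level equals the
    # number of rows, so 0 also means "no boxes".
    return bool(rois_lod) and bool(rois_lod[-1]) and rois_lod[-1][-1] == 0


def recognize(image, detect, transform_and_recognize):
    """Hypothetical two-stage pipeline: skip the ROI transform and the
    recognizer entirely when detection returns no boxes."""
    rois_shape, rois_lod, rois = detect(image)
    if rois_are_empty(rois_shape, rois_lod):
        return []
    return transform_and_recognize(image, rois)
```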
Reply 6:
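Fix for the error raised when predicting with empty ROIs:
Remaining issues:
Issue 1: error
Testing an image without ROIs first and then an image with ROIs raises an error.
Opening the saved __model__ shows that it does not contain the range op.
Reply 7: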
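Issue 1 diagnosis:
Using fluid.layers.beam_search triggers the error. Removing it makes the error go away, so it has been removed for now, pending further debugging.
Issue 2 diagnosis:
A Paddle version problem.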