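While training an RNN+CRF model for a sequence labeling task, I hit this error: Enforce failed. Expected begin_idx < end_idx, but received begin_idx:19 >= end_idx:19. How should I fix it? Thanks in advance.
Paddle version: 1.3.0
Single-machine CPU training
Layers related to the problem: fluid.layers.crf_decoding, fluid.layers.linear_chain_crf, fluid.layers.dynamic_gru
Log output and error messages from the run: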
2019-09-12 11:04:25,332 - diaml.py - 261 - INFO - ner_embs: (-1L, 33L, 128L)
2019-09-12 11:04:25,332 - INFO - ner_embs: (-1L, 33L, 128L)
2019-09-12 11:04:25,334 - diaml.py - 274 - INFO - word_emb: (-1L, 128L)
2019-09-12 11:04:25,334 - INFO - word_emb: (-1L, 128L)
2019-09-12 11:04:25,335 - diaml.py - 276 - INFO - word_emb_concat: (-1L, 1L, 128L)
2019-09-12 11:04:25,335 - INFO - word_emb_concat: (-1L, 1L, 128L)
2019-09-12 11:04:25,336 - diaml.py - 278 - INFO - ner_embs_concat: (-1L, 34L, 128L)
2019-09-12 11:04:25,336 - INFO - ner_embs_concat: (-1L, 34L, 128L)
2019-09-12 11:04:25,337 - diaml.py - 289 - INFO - tar_emb: (-1L, 128L)
2019-09-12 11:04:25,337 - INFO - tar_emb: (-1L, 128L)
2019-09-12 11:04:25,338 - diaml.py - 291 - INFO - tar_emb_concat: (-1L, 1L, 128L)
2019-09-12 11:04:25,338 - INFO - tar_emb_concat: (-1L, 1L, 128L)
2019-09-12 11:04:25,339 - diaml.py - 297 - INFO - xy: (-1L, 1L, 34L)
2019-09-12 11:04:25,339 - INFO - xy: (-1L, 1L, 34L)
2019-09-12 11:04:25,341 - diaml.py - 299 - INFO - weights: (-1L, 1L, 34L)
2019-09-12 11:04:25,341 - INFO - weights: (-1L, 1L, 34L)
2019-09-12 11:04:25,342 - diaml.py - 301 - INFO - ners_t: (-1L, 1L, 128L)
2019-09-12 11:04:25,342 - INFO - ners_t: (-1L, 1L, 128L)
2019-09-12 11:04:25,343 - diaml.py - 303 - INFO - ners_in: (-1L, 128L)
2019-09-12 11:04:25,343 - INFO - ners_in: (-1L, 128L)
2019-09-12 11:04:25,344 - diaml.py - 314 - INFO - val_emb: (-1L, 16L)
2019-09-12 11:04:25,344 - INFO - val_emb: (-1L, 16L)
2019-09-12 11:04:25,345 - diaml.py - 316 - INFO - input_feature: (-1L, 144L)
2019-09-12 11:04:25,345 - INFO - input_feature: (-1L, 144L)
2019-09-12 11:04:28,937 - diaml.py - 480 - INFO - epoch 0, batch 3, loss 11.788962
2019-09-12 11:04:28,937 - INFO - epoch 0, batch 3, loss 11.788962
2019-09-12 11:04:29,504 - diaml.py - 480 - INFO - epoch 0, batch 43, loss 3.989214
2019-09-12 11:04:29,504 - INFO - epoch 0, batch 43, loss 3.989214
Traceback (most recent call last):
File "diaml.py", line 726, in <module>
ml.process()
File "diaml.py", line 698, in process
self.train()
File "diaml.py", line 471, in train
fetch_list=[avg_cost.name, intent_acc.name, num_infer_chunks.name, num_label_chunks.name, num_correct_chunks.name])
File "/home/work/.jumbo/lib/python2.7/site-packages/paddle/fluid/parallel_executor.py", line 303, in run
self.executor.run(fetch_list, fetch_var_name)
paddle.fluid.core.EnforceNotMet: Invoke operator crf_decoding error.
Python Callstacks:
File "/home/work/.jumbo/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1317, in append_op
attrs=kwargs.get("attrs", None))
File "/home/work/.jumbo/lib/python2.7/site-packages/paddle/fluid/layer_helper.py", line 56, in append_op
return self.main_program.current_block().append_op(*args,**kwargs)
File "/home/work/.jumbo/lib/python2.7/site-packages/paddle/fluid/layers/nn.py", line 1222, in crf_decoding
outputs={"ViterbiPath": [viterbi_path]})
File "diaml.py", line 346, in _net_conf
input=emission, param_attr=fluid.ParamAttr(name='crfw'))
File "diaml.py", line 387, in net_work
avg_cost, crf_decode, intent_prediction, emission = _net_conf(word, ners, act_val, act_tar, slot_label, intent_label)
File "diaml.py", line 434, in train
avg_cost, [crf_decode, intent_pred, emission], [word, ner, act_val, act_tar, slot_label, intent_label] = self.net_work()
File "diaml.py", line 698, in process
self.train()
File "diaml.py", line 726, in <module>
ml.process()
C++ Callstacks:
Enforce failed. Expected begin_idx < end_idx, but received begin_idx:19 >= end_idx:19.
The start row index must be lesser than the end row index. at [/paddle/paddle/fluid/framework/tensor.cc:80]
PaddlePaddle Call Stacks:
0 0x7f00c3750e2dp void paddle::platform::EnforceNotMet::Init<std::string>(std::string, char const*, int) + 365
1 0x7f00c3751177p paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int) + 87
2 0x7f00c51c090bp paddle::framework::Tensor::Slice(int, int) const + 3595
3 0x7f00c3b613e4p paddle::operators::CRFDecodingOpKernel<paddle::platform::CPUDeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const + 1028
4 0x7f00c3b61be3p _ZNSt17_Function_handlerIFvRKN6paddle9framework16ExecutionContextEEZNKS1_24OpKernelRegistrarFunctorINS0_8platform8CPUPlaceELb0ELm0EJNS0_9operators19CRFDecodingOpKernelINS7_16CPUDeviceContextEfEENSA_ISB_dEEEEclEPKcSG_iEUlS4_E_E9_M_invokeERKSt9_Any_dataS4_ + 35
5 0x7f00c5161bb3p paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 659
6 0x7f00c515f425p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7 0x7f00c4fd5a89p
8 0x7f00c4fcdf31p paddle::framework::details::OpHandleBase::RunAndRecordEvent(std::function<void ()()> const&) + 769
9 0x7f00c4fd571cp paddle::framework::details::ComputationOpHandle::RunImpl() + 124
10 0x7f00c4fcee76p paddle::framework::details::OpHandleBase::Run(bool) + 118
11 0x7f00c4f6708dp
12 0x7f00c4327283p std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()(), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&) + 35
13 0x7f00c42ea5a7p std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()()>&, bool&) + 39
14 0x318b20cb23p pthread_once + 83
15 0x7f00c4f65d72p
16 0x7f00c42eb9d4p _ZZN10ThreadPoolC1EmENKUlvE_clEv + 404
17 0x318eab6470p
18 0x318b207851p
19 0x318aee767dp clone + 109
2 answers
vsikbqxv #1
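Judging from the log, several loss values were already produced, so the forward and backward passes worked for the first few batches.
This could be a data problem: the data in a later batch may be malformed. Check which batch the failure occurs on and what data it contains, and whether a dimension problem in that data is what makes the slice fail.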
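A check along these lines could help locate the offending batch. It is only a sketch: the batch layout (a list of (word_ids, label_ids) pairs) and the function name find_suspect_samples are hypothetical, not taken from diaml.py, and it uses no Paddle calls. It flags empty sequences and input/label length mismatches, two data conditions that could plausibly upset a sequence op.

```python
def find_suspect_samples(batch):
    """Return (sample_index, reason) pairs for samples that look malformed.

    `batch` is ASSUMED to be a list of (word_ids, label_ids) pairs, each a
    list of integer ids. This layout is hypothetical, adapt it to your reader.
    """
    problems = []
    for i, (word_ids, label_ids) in enumerate(batch):
        # A zero-length sequence yields an empty row range downstream.
        if len(word_ids) == 0:
            problems.append((i, "empty input sequence"))
        # Inputs and labels must align token by token for CRF training.
        if len(word_ids) != len(label_ids):
            problems.append(
                (i, "length mismatch: %d inputs vs %d labels"
                 % (len(word_ids), len(label_ids))))
    return problems


# Made-up data for illustration only.
example_batch = [
    ([3, 7, 9], [0, 1, 1]),  # fine
    ([], []),                # empty sequence
    ([4, 5], [0]),           # labels shorter than inputs
]
suspects = find_suspect_samples(example_batch)
for idx, reason in suspects:
    print("sample %d: %s" % (idx, reason))
```

Running such a check on every batch before feeding it, and logging the batch index, narrows the failure to specific samples instead of relying on eyeballing printed data.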
esyap4oy #2
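I printed the batch data and found nothing wrong. The vector dimensions should be defined in the model, and a batch only supplies indices, so if the earlier batches run, the later ones should too. Sequence lengths are variable, and the input and output sequences in each batch have the same length, so there should be no alignment problem. What else could cause this error?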
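One possibility worth ruling out: the message reports begin_idx:19 and end_idx:19, which is an empty row range passed to Tensor::Slice. Assuming the op slices each sequence by cumulative offsets over the batch (an assumption, not confirmed in this thread), a zero-length sequence would produce exactly such a pair even when inputs and labels have equal lengths. The sketch below uses made-up lengths and hypothetical helper names to show the arithmetic.

```python
def lengths_to_offsets(lengths):
    """Turn per-sequence lengths into cumulative start/end offsets."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def empty_slices(offsets):
    """Return (sequence_index, begin) for every range with begin >= end."""
    return [(i, offsets[i])
            for i in range(len(offsets) - 1)
            if offsets[i] >= offsets[i + 1]]


# Made-up lengths: the third sequence is empty.
lengths = [10, 9, 0, 5]
offsets = lengths_to_offsets(lengths)
bad = empty_slices(offsets)
print(offsets)  # [0, 10, 19, 19, 24]
print(bad)      # [(2, 19)]
```

Here the empty third sequence starts and ends at row 19, matching the shape of the reported error. Printing the per-sequence lengths of the failing batch, rather than only the ids, would confirm or rule this out.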