[operator < read > error] when training a model with Paddle

nwwlzxa7  posted on 2021-11-29  in  Java
  • Version and environment information:

   1) PaddlePaddle version: PaddlePaddle 1.8.4
   2) System environment: CentOS 7

2020-09-10 16:26:35,460-INFO: {'Global': {'debug': False, 'algorithm': 'CRNN', 'use_gpu': False, 'epoch_num': 1000, 'log_smooth_window': 20, 'print_batch_step': 10, 'save_model_dir': './output/rec_CRNN', 'save_epoch_step': 300, 'eval_batch_step': 500, 'train_batch_size_per_card': 256, 'test_batch_size_per_card': 256, 'image_shape': [3, 32, 100], 'max_text_length': 25, 'character_type': 'ch', 'use_space_char': True, 'loss_type': 'ctc', 'distort': True, 'character_dict_path': './ppocr/utils/ic15_dict.txt', 'reader_yml': './configs/rec/rec_icdar15_reader.yml', 'pretrain_weights': './pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy', 'checkpoints': None, 'save_inference_dir': None, 'infer_img': None}, 'Architecture': {'function': 'ppocr.modeling.architectures.rec_model,RecModel'}, 'Backbone': {'function': 'ppocr.modeling.backbones.rec_mobilenet_v3,MobileNetV3', 'scale': 0.5, 'model_name': 'large'}, 'Head': {'function': 'ppocr.modeling.heads.rec_ctc_head,CTCPredict', 'encoder_type': 'rnn', 'SeqRNN': {'hidden_size': 96}}, 'Loss': {'function': 'ppocr.modeling.losses.rec_ctc_loss,CTCLoss'}, 'Optimizer': {'function': 'ppocr.optimizer,AdamDecay', 'base_lr': 0.0005, 'beta1': 0.9, 'beta2': 0.999, 'decay': {'function': 'cosine_decay', 'step_each_epoch': 20, 'total_epoch': 1000}}, 'TrainReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader', 'num_workers': 1, 'img_set_dir': './train_data/zhengTest', 'label_file_path': './train_data/zhengTest/rec_gt_train.txt'}, 'EvalReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader', 'img_set_dir': './train_data/zhengTest', 'label_file_path': './train_data/zhengTest/rec_gt_test.txt'}, 'TestReader': {'reader_function': 'ppocr.data.rec.dataset_traversal,SimpleReader'}}
2020-09-10 16:26:35,980-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000000] in Optimizer will not take effect, and it will only be applied to other Parameters!
2020-09-10 16:26:37,497-INFO: Distort operation can only support in GPU.Distort will be set to False.
2020-09-10 16:26:37,498-INFO: places would be ommited when DataLoader is not iterable
2020-09-10 16:26:37,498-INFO: Distort operation can only support in GPU.Distort will be set to False.
2020-09-10 16:26:37,728-INFO: Loading parameters from ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy...
2020-09-10 16:26:37,782-WARNING: variable ctc_fc_b_attr not used
2020-09-10 16:26:37,782-WARNING: variable ctc_fc_w_attr not used
2020-09-10 16:26:37,818-INFO: Finish initing model from ./pretrain_models/rec_mv3_none_bilstm_ctc/best_accuracy
!!! The CPU_NUM is not specified, you should set CPU_NUM in the environment variable list.
CPU_NUM indicates that how many CPUPlace are used in the current task.
And if this parameter are set as N (equal to the number of physical CPU core) the program may be faster.

export CPU_NUM=8 # for example, set CPU_NUM as number of physical CPU core which is 8.

!!! The default number of CPU_NUM=1.
W0910 16:26:37.854447 23655 build_strategy.cc:170] fusion_group is not enabled for Windows/MacOS now, and only effective when running with CUDA GPU.
Process Process-1:
2020-09-10 16:26:38,041-WARNING: Your reader has raised an exception!
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/python3/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/paddle/reader/decorator.py", line 556, in _read_into_queue
six.reraise(*sys.exc_info())
File "/usr/local/python3/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/python3/lib/python3.6/site-packages/paddle/reader/decorator.py", line 549, in _read_into_queue
for sample in reader():
File "/root/PaddleORC/PaddleOCR/ppocr/data/rec/dataset_traversal.py", line 324, in batch_iter_reader
for outs in sample_iter_reader():
File "/root/PaddleORC/PaddleOCR/ppocr/data/rec/dataset_traversal.py", line 286, in sample_iter_reader
self.num_workers))
Exception: The number of the whole data (8) is smaller than the batch_size * devices_num * num_workers (256)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/local/python3/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/python3/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1145, in __thread_main__
six.reraise(*sys.exc_info())
File "/usr/local/python3/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1125, in __thread_main__
for tensors in self._tensor_reader():
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1195, in __tensor_reader_impl__
for slots in paddle_reader():
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/data_feeder.py", line 506, in __reader_creator__
for item in reader():
File "/usr/local/python3/lib/python3.6/site-packages/paddle/reader/decorator.py", line 572, in queue_reader
raise ValueError("multiprocess reader raises an exception")
ValueError: multiprocess reader raises an exception

/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/train.py", line 123, in <module>
main()
File "tools/train.py", line 100, in main
program.train_eval_rec_run(config, exe, train_info_dict, eval_info_dict)
File "/root/PaddleORC/PaddleOCR/tools/program.py", line 345, in train_eval_rec_run
return_numpy=False)
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1071, in run
six.reraise(*sys.exc_info())
File "/usr/local/python3/lib/python3.6/site-packages/six.py", line 703, in reraise
raise value
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1066, in run
return_merged=return_merged)
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
return_merged=return_merged)
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Python Call Stacks (More useful to users):

File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2610, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 1080, in _init_non_iterable
attrs={'drop_last': self._drop_last})
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 978, in __init__
self._init_non_iterable()
File "/usr/local/python3/lib/python3.6/site-packages/paddle/fluid/reader.py", line 620, in from_generator
iterable, return_list, drop_last)
File "/root/PaddleORC/PaddleOCR/ppocr/modeling/architectures/rec_model.py", line 135, in create_feed
iterable=False)
File "/root/PaddleORC/PaddleOCR/ppocr/modeling/architectures/rec_model.py", line 185, in __call__
image, labels, loader = self.create_feed(mode)
File "/root/PaddleORC/PaddleOCR/tools/program.py", line 170, in build
dataloader, outputs = model(mode=mode)
File "tools/train.py", line 50, in main
config, train_program, startup_program, mode='train')
File "tools/train.py", line 123, in <module>
main()

Error Message Summary:

Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
[operator < read > error]

I'm not sure what is causing this.

aemubtdh1#

This happens when I train on my own data; training with the official ICDAR2015 data works without any problem.

svmlkihl3#

It looks like the generated input data is in the wrong format. As a first step, check that the generated data is correct.
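
The reader exception in the log above says that the whole dataset (8 samples) is smaller than batch_size * devices_num * num_workers (256), so one concrete thing to check is the number of label lines against that product. A minimal sketch, assuming one sample per non-empty line in the label file; `count_samples`, `required_samples` and `check_dataset_size` are hypothetical helper names, not PaddleOCR APIs:

```python
import os
import tempfile


def count_samples(label_file_path):
    """Count non-empty lines in a label file (assumption: one sample per line)."""
    with open(label_file_path, "r", encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def required_samples(batch_size, devices_num=1, num_workers=1):
    """Minimum dataset size implied by the reader's exception message."""
    return batch_size * devices_num * num_workers


def check_dataset_size(label_file_path, batch_size, devices_num=1, num_workers=1):
    """Return (ok, have, need) so the caller can decide how to react."""
    have = count_samples(label_file_path)
    need = required_samples(batch_size, devices_num, num_workers)
    return have >= need, have, need


if __name__ == "__main__":
    # Demo with a temporary 8-line label file, mirroring the numbers in the log.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        for i in range(8):
            tmp.write("img_{}.jpg\tlabel{}\n".format(i, i))
    ok, have, need = check_dataset_size(tmp.name, batch_size=256)
    print(ok, have, need)
    os.remove(tmp.name)
```

With 8 samples and `train_batch_size_per_card: 256` as in the log, the check does not pass. If that is the cause here, lowering `train_batch_size_per_card` (and probably `test_batch_size_per_card`) to no more than the number of samples, or adding more training data, would be the things to try.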

mm9b1k5b4#

These are the contents of the train and test gt.txt files, respectively.

They match what the documentation shows.
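
Comparing the files with the documentation by eye can miss details such as a space used where a tab is expected. A rough sketch of a per-line check, assuming each line has the form `image_path<TAB>label` with the image path relative to `img_set_dir`; `find_bad_label_lines` is a hypothetical helper, not part of PaddleOCR:

```python
import os


def find_bad_label_lines(label_file_path, img_set_dir):
    """Return a list of (line_number, reason) for lines that look malformed."""
    problems = []
    with open(label_file_path, "r", encoding="utf-8") as f:
        for line_number, raw in enumerate(f, start=1):
            line = raw.rstrip("\r\n")
            if not line.strip():
                problems.append((line_number, "empty line"))
                continue
            # Assumed format: exactly one tab between image path and label.
            parts = line.split("\t")
            if len(parts) != 2:
                problems.append((line_number, "expected exactly one tab"))
                continue
            img_name, label = parts
            if not label:
                problems.append((line_number, "empty label"))
            if not os.path.isfile(os.path.join(img_set_dir, img_name)):
                problems.append((line_number, "image not found"))
    return problems
```

Calling it with the `label_file_path` and `img_set_dir` values from the config (for example `./train_data/zhengTest/rec_gt_train.txt` and `./train_data/zhengTest`) should return an empty list if every line parses and every image exists.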

pgky5nke5#

This is the directory structure.

js5cn81o6#

This is the content of the config file.

rwqw0loc7#

Which example code were you running when the error occurred? Could you share code that reproduces it?
