Why does PaddleOCR PGNet training fail? Could someone take a look? All the details are in the post.

jmo0nnb3, posted on 2022-10-21 in Other

[2022/05/23 17:01:47] ppocr INFO: save best model is to ./output/pgnet_r50_vd_totaltext/best_accuracy
[2022/05/23 17:01:47] ppocr INFO: best metric, f_score_e2e: 0, total_num_gt: 2798, total_num_det: 0, global_accumulative_recall: 0, hit_str_count: 0, recall: 0.0, precision: 0, f_score: 0, seqerr: 1, recall_e2e: 0.0, precision_e2e: 0, fps: 8.604433279775579, best_epoch: 21
[2022/05/23 17:01:50] ppocr ERROR: When parsing line 576, error happened with msg: list index out of range
[2022/05/23 17:01:57] ppocr INFO: epoch: [21/200], global_step: 2010, lr: 0.001000, loss: 1.278212, score_loss: 0.999998, border_loss: 0.140085, direction_loss: 0.111812, ctc_loss: 0.000000, avg_reader_cost: 0.53334 s, avg_batch_cost: 0.98982 s, avg_samples: 16.0, ips: 16.16456 samples/s, eta: 8:47:38

During evaluation, every metric is 0.

eval model:: 0%| | 0/246 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 161, in main
    program.train(config, train_dataloader, valid_dataloader, device, model,
  File "/data/ocr/PaddleOCR-release-2.5/tools/program.py", line 339, in train
    cur_metric = eval(
  File "/data/ocr/PaddleOCR-release-2.5/tools/program.py", line 465, in eval
    post_result = post_process_class(preds, batch_numpy[1])
  File "/data/ocr/PaddleOCR-release-2.5/ppocr/postprocess/pg_postprocess.py", line 49, in __call__
    data = post.pg_postprocess_fast()
  File "/data/ocr/PaddleOCR-release-2.5/ppocr/utils/e2e_utils/pgnet_pp_utils.py", line 56, in pg_postprocess_fast
    instance_yxs_list, seq_strs = generate_pivot_list_fast(
  File "/data/ocr/PaddleOCR-release-2.5/ppocr/utils/e2e_utils/extract_textpoint_fast.py", line 381, in generate_pivot_list_fast
    pos_list_sorted = sort_and_expand_with_direction_v2(
  File "/data/ocr/PaddleOCR-release-2.5/ppocr/utils/e2e_utils/extract_textpoint_fast.py", line 241, in sort_and_expand_with_direction_v2
    int((left_average_len + right_average_len) / 2.0 * 0.15), 1)
OverflowError: cannot convert float infinity to integer
eval model:: 0%| | 0/246 [00:01<?, ?it/s]
terminate called without an active exception


--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.

----------------------
Error Message Summary:
----------------------
FatalError: Process abort signal is detected by the operating system.
  [TimeInfo: *** Aborted at 1653299690 (unix time) try "date -d @1653299690" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0xb806) received by PID 47110 (TID 0x7fb2e1008700) from PID 47110 ***]

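**The second evaluation also crashed with the error above, and ctc_loss is always 0.**

**I annotated the dataset myself. A sample label line looks like this:**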

SXB_train/img_001.jpg [{"transcription": "料:100%聚酯纤维", "points": [[157, 618], [498, 583], [502, 623], [161, 658]], "difficult": false}, {"transcription": "填充料:100%聚酯纤维", "points": [[165, 665], [499, 633], [503, 674], [169, 705]], "difficult": false}, {"transcription": "填充量:1000g", "points": [[171, 715], [398, 698], [401, 735], [174, 753]], "difficult": false}, {"transcription": "执行标准:GB/T22796-2009", "points": [[175, 769], [549, 725], [554, 762], [179, 805]], "difficult": false}, {"transcription": "安全类别:GB18401-2010C类", "points": [[177, 820], [579, 769], [585, 809], [182, 860]], "difficult": false}, {"transcription": "品名:被芯", "points": [[153, 506], [366, 491], [371, 535], [155, 555]], "difficult": false}, {"transcription": "规格:150x200cm", "points": [[154, 567], [466, 538], [471, 576], [159, 607]], "difficult": false}]
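The `When parsing line 576 ... list index out of range` message in the log suggests that at least one label line does not have the shape shown above. The following is a minimal sketch of a label-file check, not PaddleOCR's own parser. It assumes that the image path and the JSON annotation are separated by a tab (the separator appears as a space in this post, which may just be a rendering artifact), and the function names `check_label_line` and `check_label_file` are hypothetical.

```python
import json


def check_label_line(line):
    """Return a list of problems found in one label line (empty if none)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        # Assumption: the format is '<image path>\t<JSON list>'.
        return ["expected 2 tab-separated fields, found %d" % len(parts)]
    try:
        entries = json.loads(parts[1])
    except json.JSONDecodeError as exc:
        return ["annotation is not valid JSON: %s" % exc]
    if not isinstance(entries, list):
        return ["annotation is not a JSON list"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append("entry %d is not a JSON object" % i)
            continue
        text = entry.get("transcription")
        points = entry.get("points")
        if not isinstance(text, str) or not text:
            problems.append("entry %d has no transcription" % i)
        if not isinstance(points, list) or len(points) < 4:
            problems.append("entry %d has fewer than 4 points" % i)
        elif any(not isinstance(p, (list, tuple)) or len(p) != 2 for p in points):
            problems.append("entry %d has a point that is not [x, y]" % i)
    return problems


def check_label_file(path):
    """Print every problem in a label file together with its line number."""
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            for problem in check_label_line(line):
                print("line %d: %s" % (number, problem))
```

Running `check_label_file("./train_data/train.txt")` and looking at what it reports around line 576 would be one way to narrow the error down.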

The config file is as follows:
Global:
  use_gpu: True
  epoch_num: 200
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/pgnet_r50_vd_totaltext/
  save_epoch_step: 500
  # evaluation is run every 0 iterations after the 1000th iteration
  eval_batch_step: [ 0, 2000 ]
  cal_metric_during_train: False
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  valid_set: partvgg # two mode: totaltext valid curved words, partvgg valid non-curved words
  save_res_path: ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt
  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
  character_type: CH
  max_text_length: 50 # the max length in seq
  max_text_nums: 30 # the max seq nums in a pic
  tcl_len: 64

Architecture:
  model_type: e2e
  algorithm: PGNet
  Transform:
  Backbone:
    name: ResNet
    layers: 50
  Neck:
    name: PGFPN
  Head:
    name: PGHead

Loss:
  name: PGLoss
  tcl_bs: 64
  max_text_length: 50 # the same as Global: max_text_length
  max_text_nums: 30 # the same as Global:max_text_nums
  pad_num: 36 # the length of dict for pad

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0

Train:
  dataset:
    name: PGDataSet
    data_dir: ./train_data/
    label_file_list: [./train_data/train.txt]
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTrain:
      - PGProcessTrain:
          batch_size: 14 # same as loader: batch_size_per_card
          min_crop_size: 24
          min_text_size: 4
          max_text_size: 512
      - KeepKeys:
          keep_keys: [ 'images', 'tcl_maps', 'tcl_label_maps', 'border_maps','direction_maps', 'training_masks', 'label_list', 'pos_list', 'pos_mask' ] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: True
    batch_size_per_card: 16
    num_workers: 0

Eval:
  dataset:
    name: PGDataSet
    data_dir: ./train_data/
    label_file_list: [./train_data/test.txt]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTest:
      - E2EResizeForTest:
          max_side_len: 768
      - NormalizeImage:
          scale: 1./255.
          mean: [ 0.485, 0.456, 0.406 ]
          std: [ 0.229, 0.224, 0.225 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'shape', 'polys', 'texts', 'ignore_tags', 'img_id']
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 0
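The comments in the config itself state several constraints: the `Loss` text limits should match `Global`, `pad_num` is "the length of dict", and `PGProcessTrain.batch_size` should be the "same as loader: batch_size_per_card". The sketch below checks those stated constraints on a config that has already been loaded into a plain Python dict. The helper names `count_dict_entries` and `check_config` are hypothetical, and whether `pad_num` must equal the dictionary size exactly is an assumption taken from the config comment, not something confirmed in this thread.

```python
def count_dict_entries(path):
    """Count the entries (one per line) of a character dictionary file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip("\n"))


def check_config(config, dict_size=None):
    """Return warnings for values that contradict the config's own comments."""
    warnings = []
    global_cfg, loss_cfg = config["Global"], config["Loss"]

    # "the same as Global: ..." comments on the Loss section.
    for key in ("max_text_length", "max_text_nums"):
        if global_cfg[key] != loss_cfg[key]:
            warnings.append("Loss.%s != Global.%s" % (key, key))

    # "the length of dict for pad" (assumed to mean the dictionary size).
    if dict_size is not None and loss_cfg["pad_num"] != dict_size:
        warnings.append(
            "Loss.pad_num is %d but the dictionary has %d entries"
            % (loss_cfg["pad_num"], dict_size)
        )

    # "same as loader: batch_size_per_card" on PGProcessTrain.
    loader_bs = config["Train"]["loader"]["batch_size_per_card"]
    for transform in config["Train"]["dataset"]["transforms"]:
        params = transform.get("PGProcessTrain")
        if params and params.get("batch_size") != loader_bs:
            warnings.append(
                "PGProcessTrain.batch_size is %s but batch_size_per_card is %s"
                % (params.get("batch_size"), loader_bs)
            )
    return warnings
```

With the values posted above, the batch sizes (14 and 16) would be flagged. Passing `dict_size=count_dict_entries("ppocr/utils/ppocr_keys_v1.txt")` would additionally compare `pad_num: 36` against the dictionary that `character_dict_path` points to.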

osh3o9ms  #1

int((left_average_len + right_average_len) / 2.0 * 0.15), 1)
OverflowError: cannot convert float infinity to integer

Check the data that reaches this point:
PaddleOCR/ppocr/utils/e2e_utils/extract_textpoint_fast.py

Line 240 in d8a8ca8

append_num=max(

Is an infinity value showing up here?
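To illustrate the question above: Python raises exactly this `OverflowError` when `int()` receives an infinite float, so either `left_average_len` or `right_average_len` was presumably `inf` at that point. The sketch below reproduces the failure and shows one possible guard. `safe_append_num` is a hypothetical helper, not part of PaddleOCR, and falling back to the minimum of 1 is an assumption; it only avoids the crash and does not explain where the infinity comes from.

```python
import math


def safe_append_num(left_average_len, right_average_len, ratio=0.15):
    """Same expression as in the traceback, guarded against inf and nan."""
    mean_len = (left_average_len + right_average_len) / 2.0
    if not math.isfinite(mean_len):
        # Assumption: fall back to the minimum the original max(..., 1) allows.
        return 1
    return max(int(mean_len * ratio), 1)


# The unguarded conversion fails in the same way as the traceback.
try:
    int(float("inf") / 2.0 * 0.15)
except OverflowError as exc:
    print("reproduced:", exc)

print(safe_append_num(float("inf"), 12.0))
print(safe_append_num(100.0, 100.0))
```

Logging the two lengths just before the original line would show which input is infinite and for which sample.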

gfttwv5a  #2

[2022/05/25 15:50:39] ppocr INFO: best metric, f_score_e2e: 0, total_num_gt: 2798, total_num_det: 2244, global_accumulative_recall: 1277.3999999999976, hit_str_count: 0, recall: 0.45654038598999197, precision: 0.5870766488413544, f_score: 0.5136447392525687, seqerr: 1.0, recall_e2e: 0.0, precision_e2e: 0.0, fps: 8.88808927959238, best_epoch: 84
[2022/05/25 15:50:43] ppocr INFO: epoch: [84/200], global_step: 2010, lr: 0.001000, loss: 0.183176, score_loss: 0.125891, border_loss: 0.032234, direction_loss: 0.028453, ctc_loss: 0.000000, avg_reader_cost: 0.00045 s, avg_batch_cost: 0.40427 s, avg_samples: 4.0, ips: 9.89444 samples/s, eta: 0:22:12
Why are all the e2e metrics 0? ctc_loss is also 0 during training.
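The numbers in this log line can be cross-checked with a little arithmetic. The relations below are inferred from the logged values rather than taken from the PaddleOCR source, so treat them as assumptions; `e2e_summary` is a hypothetical helper. They reproduce the logged `recall` and `f_score`, and they show that the e2e values follow directly from `hit_str_count: 0`: detection matches ground truth, but no recognized string does.

```python
def e2e_summary(total_num_gt, global_accumulative_recall, hit_str_count,
                precision, recall):
    """Recompute summary metrics from the counts printed in the log."""
    denom = precision + recall
    return {
        # Harmonic mean of the logged detection precision and recall.
        "f_score": 2 * precision * recall / denom if denom > 0 else 0.0,
        # Detection recall recomputed from the accumulated counts.
        "recall": global_accumulative_recall / total_num_gt,
        # Share of matched detections whose text was wrong (assumed form).
        "seqerr": (1 - hit_str_count / global_accumulative_recall
                   if global_accumulative_recall > 0 else 1.0),
        # End-to-end recall: correctly recognized strings over all GT.
        "recall_e2e": hit_str_count / total_num_gt,
    }


summary = e2e_summary(
    total_num_gt=2798,
    global_accumulative_recall=1277.3999999999976,
    hit_str_count=0,
    precision=0.5870766488413544,
    recall=0.45654038598999197,
)
for name, value in summary.items():
    print(name, value)
```

Since `ctc_loss` is also 0.000000 throughout training, the recognition branch is the part to investigate; the detection branch is clearly learning.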
