PaddleOCR PGNet training - Slow E2E Metric calculation when number of texts per image > 100

hc8w905p  posted 2022-11-05 in Other

My test images are 1200 x 1696, and the number of text instances per image can reach 500 (similar to a page of a book).
I trained PGNet; the training epochs run fine, but evaluation is really slow.
I investigated the code and the issue comes from the get_socre_A function in E2EMetric. When e2e_info_list and gt_info_list contain more than about 100 entries, the process takes a very long time to complete (about 13 s per page with 120 annotated text blocks, and 31 s for 199 text blocks).
Maybe this function runs on the CPU?
Do you have a solution for this? Thank you.
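For context on why the slowdown grows so quickly: the metric compares every detection against every ground-truth polygon to fill its match tables, so the work grows roughly quadratically with the number of text instances. A back-of-the-envelope sketch (the `metric_cost` function here is a hypothetical illustration of the pair count, not the actual `get_socre_A` code):

```python
def metric_cost(n_boxes):
    # The sigma and tau tables each need one polygon-overlap computation
    # per (ground-truth, detection) pair, so with roughly n detections
    # matching n ground truths the work is O(n^2).
    return n_boxes * n_boxes

# The reported timings are consistent with quadratic growth:
slow_120 = metric_cost(120)   # ~13 s observed
slow_199 = metric_cost(199)   # ~31 s observed
print(slow_199 / slow_120)    # ~2.75, versus an observed 31/13 ~ 2.4
```

The remaining gap between 2.75 and 2.4 would come from per-image fixed costs, but the quadratic term dominates once pages carry 100+ text blocks.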


8iwquhpp1#

My config file:

Global:
  use_gpu: True
  epoch_num: 600
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/pgnet_r50_vd_totaltext/
  save_epoch_step: 10
  eval_batch_step: [ 0, 1000 ]
  cal_metric_during_train: False
  pretrained_model:
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  valid_set: partvgg # two modes: totaltext evaluates curved words, partvgg evaluates non-curved words
  save_res_path: ./output/pgnet_r50_vd_totaltext/predicts_pgnet.txt
  character_dict_path: ppocr/utils/ko_2463.txt
  character_type: korean
  max_text_length: 25 # the max length in seq
  max_text_nums: 1000 # the max seq nums in a pic
  tcl_len: 64

Architecture:
  model_type: e2e
  algorithm: PGNet
  Transform:
  Backbone:
    name: ResNet
    layers: 50
  Neck:
    name: PGFPN
  Head:
    name: PGHead
    out_channels: 2464 # Loss.pad_num + 1

Loss:
  name: PGLoss
  tcl_bs: 64
  max_text_length: 25 # the same as Global: max_text_length
  max_text_nums: 1000 # the same as Global: max_text_nums
  pad_num: 2463 # the length of dict for pad

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0001
  regularizer:
    name: 'L2'
    factor: 0.

PostProcess:
  name: PGPostProcess
  score_thresh: 0.5
  mode: fast # two modes: fast or slow

Metric:
  name: E2EMetric
  mode: A # two ways to eval; A: label from txt, B: label from gt_mat
  gt_mat_dir: ./Synthetic_ko_total_text/gt # the dir of gt_mat
  character_dict_path: ppocr/utils/ko_2463.txt
  main_indicator: f_score_e2e

Train:
  dataset:
    name: PGDataSet
    data_dir: /home/gridone/TextRecognitionDataGenerator/out/document_v6/train
    label_file_list: [/home/gridone/TextRecognitionDataGenerator/out/document_v6/train/train.txt]
    ratio_list: [1.0]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTrain:
      - IaaAugment:
          augmenter_args:
            - { 'type': Affine, 'args': { 'rotate': [-90, 0, 90, 180], 'fit_output': True, 'cval': 255 } }
      - PGProcessTrain:
          batch_size: 4 # same as loader: batch_size_per_card
          min_crop_size: 24
          min_text_size: 4
          max_text_size: 512
      - KeepKeys:
          keep_keys: [ 'images', 'tcl_maps', 'tcl_label_maps', 'border_maps', 'direction_maps', 'training_masks', 'label_list', 'pos_list', 'pos_mask' ] # dataloader will return list in this order
  loader:
    shuffle: True
    drop_last: True
    batch_size_per_card: 4
    num_workers: 16

Eval:
  dataset:
    name: PGDataSet
    data_dir: /home/gridone/TextRecognitionDataGenerator/out/document_v6/test
    label_file_list: [/home/gridone/TextRecognitionDataGenerator/out/document_v6/test/test.txt]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - E2ELabelEncodeTest:
      - E2EResizeForTest:
          max_side_len: 768
      - NormalizeImage:
          scale: 1./255.
          mean: [ 0.485, 0.456, 0.406 ]
          std: [ 0.229, 0.224, 0.225 ]
          order: 'hwc'
      - ToCHWImage:
      - KeepKeys:
          keep_keys: [ 'image', 'shape', 'polys', 'texts', 'ignore_tags', 'img_id' ]
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 1 # must be 1
    num_workers: 16

OS: Ubuntu 18.04
GPU: NVIDIA Titan X


xxe27gdn2#

I think I solved it by using joblib.Parallel in the sigma and tau calculation steps :)
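A minimal sketch of that approach, assuming a simplified overlap computation: `overlap_ratio`, `sigma_row`, and `build_sigma` are hypothetical stand-ins for the real polygon-intersection code in ppocr/utils/e2e_metric (which works on arbitrary polygons, not the axis-aligned boxes used here). The key point is that each row of the sigma/tau table is independent, so the rows can be distributed across CPU cores:

```python
from joblib import Parallel, delayed

def overlap_ratio(det_box, gt_box):
    # Stand-in for the real polygon intersection-over-area computation;
    # here boxes are simplified to (x1, y1, x2, y2) axis-aligned rectangles.
    ix1, iy1 = max(det_box[0], gt_box[0]), max(det_box[1], gt_box[1])
    ix2, iy2 = min(det_box[2], gt_box[2]), min(det_box[3], gt_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    gt_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    return inter / gt_area if gt_area else 0.0

def sigma_row(gt_box, det_boxes):
    # One row of the sigma table: overlap of every detection against a
    # single ground-truth box.
    return [overlap_ratio(d, gt_box) for d in det_boxes]

def build_sigma(gt_boxes, det_boxes, n_jobs=-1):
    # Parallelise over ground-truth rows; the O(N_gt * N_det) table is
    # then filled across all CPU cores instead of one.
    return Parallel(n_jobs=n_jobs)(
        delayed(sigma_row)(g, det_boxes) for g in gt_boxes
    )
```

The tau table parallelises the same way with detections as rows. Since each row is pure CPU work with no shared state, joblib's default process-based backend gives near-linear speedup on pages with hundreds of text blocks.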
