PaddleOCR: validation accuracy is too low when training English recognition with en_PPOCRv3_rec.yml

f1tvaqid  posted on 2022-11-13  in  Other

Please provide the following information to quickly locate the problem

  • System Environment: docker
  • Version: Paddle: paddleocr-release-2.5 PaddleOCR: Related components: Paddle version is 2.1.0
  • Command Code:
  • Complete Error Message: no error
  • The yml configuration file is as follows:
    Global:
      debug: false
      use_gpu: true
      epoch_num: 100
      log_smooth_window: 20
      print_batch_step: 10
      save_model_dir: ./output/rec_model/ppocrv3_en/
      save_epoch_step: 1
      eval_batch_step: [13540, 2708]
      cal_metric_during_train: true
      pretrained_model:
      checkpoints:
      save_inference_dir:
      use_visualdl: True
      infer_img: doc/imgs_words/ch/word_1.jpg
      character_dict_path: ./rec_dataset/alphabet.txt
      max_text_length: &max_text_length 150
      infer_mode: false
      use_space_char: true
      distributed: true
      save_res_path: ./output/ppocrv3_en/predicts_ppocrv3_en.txt

    Optimizer:
      name: Adam
      beta1: 0.9
      beta2: 0.999
      lr:
        name: Cosine
        learning_rate: 0.001
        warmup_epoch: 5
      regularizer:
        name: L2
        factor: 3.0e-05

    Architecture:
      model_type: rec
      algorithm: SVTR
      Transform:
      Backbone:
        name: MobileNetV1Enhance
        scale: 0.5
        last_conv_stride: [1, 2]
        last_pool_type: avg
      Head:
        name: MultiHead
        head_list:
          - CTCHead:
              Neck:
                name: svtr
                dims: 64
                depth: 2
                hidden_dims: 120
                use_guide: True
              Head:
                fc_decay: 0.00001
          - SARHead:
              enc_dim: 512
              max_text_length: *max_text_length

    Loss:
      name: MultiLoss
      loss_config_list:
        - CTCLoss:
        - SARLoss:

    PostProcess:
      name: CTCLabelDecode

    Metric:
      name: RecMetric
      main_indicator: acc
      ignore_space: False

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: ./rec_dataset/
        ext_op_transform_idx: 1
        label_file_list:
          - ./rec_dataset/en_train.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - RecConAug:
              prob: 0.5
              ext_data_num: 2
              image_shape: [32, 640, 3] #[48, 320, 3]
          - RecAug:
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 32, 640] #[3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: true
        batch_size_per_card: 64
        drop_last: true
        num_workers: 4
        use_shared_memory: False

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: ./rec_dataset/
        label_file_list:
          - ./rec_dataset/en_val.txt
        transforms:
          - DecodeImage:
              img_mode: BGR
              channel_first: false
          - MultiLabelEncode:
          - RecResizeImg:
              image_shape: [3, 32, 640] #[3, 48, 320]
          - KeepKeys:
              keep_keys:
                - image
                - label_ctc
                - label_sar
                - length
                - valid_ratio
      loader:
        shuffle: false
        drop_last: false
        batch_size_per_card: 64
        num_workers: 4
        use_shared_memory: False
  • Training accuracy after 9 epochs:
    [2022/07/18 09:11:28] ppocr INFO: cur metric, acc: 0.086679999982664, norm_edit_dis: 0.7975049071977204, fps: 1035.472484016947
    [2022/07/18 09:11:30] ppocr INFO: save best model is to ./output/rec_model/ppocrv3_en_0718/best_accuracy
    [2022/07/18 09:11:30] ppocr INFO: best metric, acc: 0.086679999982664, norm_edit_dis: 0.7975049071977204, fps: 1035.472484016947, best_epoch: 8
    [2022/07/18 09:11:33] ppocr INFO: save model in ./output/rec_model/ppocrv3_en_0718/latest
    [2022/07/18 09:11:35] ppocr INFO: save model in ./output/rec_model/ppocrv3_en_0718/iter_epoch_8
    [2022/07/18 09:11:46] ppocr INFO: epoch: [9/100], global_step: 21670, lr: 0.000998, acc: 0.031250, norm_edit_dis: 0.727142, CTCLoss: 61.790436, SARLoss: 0.804682, loss: 62.581017, avg_reader_cost: 1.03217 s, avg_batch_cost: 1.58362 s, avg_samples: 38.4, ips: 24.24820 samples/s, eta: 2 days, 10:54:21
    [2022/07/18 09:11:54] ppocr INFO: epoch: [9/100], global_step: 21680, lr: 0.000998, acc: 0.062500, norm_edit_dis: 0.731419, CTCLoss: 63.198486, SARLoss: 0.815200, loss: 64.027908, avg_reader_cost: 0.00131 s, avg_batch_cost: 0.73033 s, avg_samples: 64.0, ips: 87.63198 samples/s, eta: 2 days, 10:53:59
    [2022/07/18 09:12:01] ppocr INFO: epoch: [9/100], global_step: 21690, lr: 0.000998, acc: 0.062500, norm_edit_dis: 0.727574, CTCLoss: 62.317360, SARLoss: 0.813799, loss: 63.136574, avg_reader_cost: 0.00041 s, avg_batch_cost: 0.73126 s, avg_samples: 64.0, ips: 87.51965 samples/s, eta: 2 days, 10:53:36
    [2022/07/18 09:12:08] ppocr INFO: epoch: [9/100], global_step: 21700, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.725965, CTCLoss: 61.712959, SARLoss: 0.790435, loss: 62.543045, avg_reader_cost: 0.00055 s, avg_batch_cost: 0.73522 s, avg_samples: 64.0, ips: 87.04859 samples/s, eta: 2 days, 10:53:14
    [2022/07/18 09:12:16] ppocr INFO: epoch: [9/100], global_step: 21710, lr: 0.000998, acc: 0.039062, norm_edit_dis: 0.730681, CTCLoss: 60.901382, SARLoss: 0.780803, loss: 61.697411, avg_reader_cost: 0.00055 s, avg_batch_cost: 0.72823 s, avg_samples: 64.0, ips: 87.88444 samples/s, eta: 2 days, 10:52:52
    [2022/07/18 09:12:23] ppocr INFO: epoch: [9/100], global_step: 21720, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.735047, CTCLoss: 58.004299, SARLoss: 0.789746, loss: 58.785194, avg_reader_cost: 0.00139 s, avg_batch_cost: 0.73808 s, avg_samples: 64.0, ips: 86.71122 samples/s, eta: 2 days, 10:52:30
    [2022/07/18 09:12:30] ppocr INFO: epoch: [9/100], global_step: 21730, lr: 0.000998, acc: 0.046875, norm_edit_dis: 0.736219, CTCLoss: 58.464737, SARLoss: 0.791309, loss: 59.240974, avg_reader_cost: 0.00053 s, avg_batch_cost: 0.74419 s, avg_samples: 64.0, ips: 85.99914 samples/s, eta: 2 days, 10:52:10
    [2022/07/18 09:12:38] ppocr INFO: epoch: [9/100], global_step: 21740, lr: 0.000998, acc: 0.031250, norm_edit_dis: 0.732627, CTCLoss: 62.068687, SARLoss: 0.780804, loss: 62.813091, avg_reader_cost: 0.00057 s, avg_batch_cost: 0.77975 s, avg_samples: 64.0, ips: 82.07759 samples/s, eta: 2 days, 10:51:53

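After 9 epochs of training, accuracy is still below 0.1 even though norm_edit_dis has already reached about 0.7, so something seems wrong. Compared with CRNN training, this run is far worse. Is the configuration file set up incorrectly, or does something else need to be changed?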

qojgxg4l  #1

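The CTC loss fluctuates quite a lot. A possible cause is the current shape=[3, 32, 640]: in the GTC strategy the CTC branch is optimized on its own and its gradients are not propagated back, so the strategy may not suit scenarios with very long text. I suggest removing the GTC strategy and training with LCNet_SVTR alone. You can refer to this issue when modifying the configuration file: #6355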
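
The concern about very long text can be checked with a quick feasibility test. CTC can only emit a label if the feature sequence has at least as many time steps as the label has characters, plus one extra step for every pair of identical adjacent characters. The sketch below is illustrative only: `min_ctc_steps` and `ctc_can_align` are hypothetical helper names, and the `width_stride` of 8 is an assumption about how much the network shrinks the image width, not a value confirmed in this thread.

```python
def min_ctc_steps(label: str) -> int:
    """Minimum number of time steps CTC needs to emit `label`.

    Each character needs one step, and each pair of identical adjacent
    characters needs one extra blank step between them.
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def ctc_can_align(label: str, image_width: int, width_stride: int) -> bool:
    """True if image_width // width_stride time steps are enough for `label`."""
    time_steps = image_width // width_stride
    return min_ctc_steps(label) <= time_steps


if __name__ == "__main__":
    # width_stride=8 is an assumed value for illustration; verify it by
    # printing the shape of the tensor that reaches the CTC head.
    print(min_ctc_steps("hello"))               # 5 characters + 1 repeated pair
    print(ctc_can_align("ab" * 75, 640, 8))     # a 150-character label
    print(ctc_can_align("short text", 640, 8))  # a 10-character label
```

If the real stride leaves fewer time steps than max_text_length, labels near the maximum length cannot be aligned by the CTC branch at all, whatever the rest of the configuration looks like.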

pgpifvop  #2

I changed part of the en_PP-OCRv3_rec.yml configuration file to:
Neck:
  name: SequenceEncoder
  encoder_type: svtr
  dims: 64
  depth: 2
  hidden_dims: 120
  use_guide: False
Head:
  name: CTCHead
  fc_decay: 0.00001

Loss:
  name: CTCLoss
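
The same edit can be applied programmatically once the config has been loaded into a Python dict (for example with a YAML parser). A minimal sketch: `to_ctc_only` is a hypothetical helper, not part of PaddleOCR. It reproduces only the three sections shown above and places Neck and Head under Architecture, as in the original config; that placement is an assumption, since the snippet shows them without a parent key. The thread does not show whether the label-encoding transforms or keep_keys were changed as well, so the sketch leaves them untouched.

```python
import copy


def to_ctc_only(config: dict) -> dict:
    """Return a copy of `config` with the Neck/Head/Loss edits shown above."""
    cfg = copy.deepcopy(config)  # never mutate the caller's config
    arch = cfg.setdefault("Architecture", {})
    arch["Neck"] = {
        "name": "SequenceEncoder",
        "encoder_type": "svtr",
        "dims": 64,
        "depth": 2,
        "hidden_dims": 120,
        "use_guide": False,
    }
    # Replace the MultiHead (CTC + SAR) with a single CTC head.
    arch["Head"] = {"name": "CTCHead", "fc_decay": 0.00001}
    # Replace MultiLoss with plain CTC loss.
    cfg["Loss"] = {"name": "CTCLoss"}
    return cfg


if __name__ == "__main__":
    original = {
        "Architecture": {
            "algorithm": "SVTR",
            "Head": {"name": "MultiHead", "head_list": []},
        },
        "Loss": {"name": "MultiLoss", "loss_config_list": []},
    }
    updated = to_ctc_only(original)
    print(updated["Architecture"]["Head"]["name"])
    print(original["Architecture"]["Head"]["name"])
```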

After 75 epochs of training, the accuracy is as follows:
[2022/07/21 06:28:15] ppocr INFO: cur metric, acc: 0.572419999885516, norm_edit_dis: 0.9788205213278756, fps: 952.368972133215
[2022/07/21 06:28:15] ppocr INFO: best metric, acc: 0.578679999884264, norm_edit_dis: 0.978377395575744, fps: 1000.5441859767002, best_epoch: 69
[2022/07/21 06:28:17] ppocr INFO: save model in ./output/rec_model_2022/ppocrv3_en_0720/latest
[2022/07/21 06:28:17] ppocr INFO: save model in ./output/rec_model_2022/ppocrv3_en_0720/iter_epoch_74
[2022/07/21 06:28:45] ppocr INFO: epoch: [75/500], global_step: 100200, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.939163, loss: 12.896205, avg_reader_cost: 2.62400 s, avg_batch_cost: 3.03456 s, avg_samples: 51.2, ips: 16.87232 samples/s, eta: 5 days, 11:19:27
[2022/07/21 06:28:52] ppocr INFO: epoch: [75/500], global_step: 100210, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.937355, loss: 12.740088, avg_reader_cost: 0.00044 s, avg_batch_cost: 0.67201 s, avg_samples: 128.0, ips: 190.47299 samples/s, eta: 5 days, 11:19:10
[2022/07/21 06:28:58] ppocr INFO: epoch: [75/500], global_step: 100220, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.937217, loss: 13.312311, avg_reader_cost: 0.00068 s, avg_batch_cost: 0.55925 s, avg_samples: 128.0, ips: 228.87754 samples/s, eta: 5 days, 11:18:47
[2022/07/21 06:29:04] ppocr INFO: epoch: [75/500], global_step: 100230, lr: 0.000954, acc: 0.343750, norm_edit_dis: 0.939586, loss: 13.312311, avg_reader_cost: 0.00061 s, avg_batch_cost: 0.59037 s, avg_samples: 128.0, ips: 216.81468 samples/s, eta: 5 days, 11:18:25
[2022/07/21 06:29:10] ppocr INFO: epoch: [75/500], global_step: 100240, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.939396, loss: 13.581049, avg_reader_cost: 0.07561 s, avg_batch_cost: 0.65824 s, avg_samples: 128.0, ips: 194.45798 samples/s, eta: 5 days, 11:18:08
[2022/07/21 06:29:19] ppocr INFO: epoch: [75/500], global_step: 100250, lr: 0.000954, acc: 0.355469, norm_edit_dis: 0.938832, loss: 13.796519, avg_reader_cost: 0.27817 s, avg_batch_cost: 0.86297 s, avg_samples: 128.0, ips: 148.32440 samples/s, eta: 5 days, 11:18:02
[2022/07/21 06:29:25] ppocr INFO: epoch: [75/500], global_step: 100260, lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.940922, loss: 12.944593, avg_reader_cost: 0.00222 s, avg_batch_cost: 0.56770 s, avg_samples: 128.0, ips: 225.47009 samples/s, eta: 5 days, 11:17:40
[2022/07/21 06:29:37] ppocr INFO: epoch: [75/500], global_step: 100270, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.936109, loss: 13.633232, avg_reader_cost: 0.63327 s, avg_batch_cost: 1.27472 s, avg_samples: 128.0, ips: 100.41385 samples/s, eta: 5 days, 11:17:58
[2022/07/21 06:29:48] ppocr INFO: epoch: [75/500], global_step: 100280, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.933838, loss: 14.711888, avg_reader_cost: 0.51032 s, avg_batch_cost: 1.09601 s, avg_samples: 128.0, ips: 116.78701 samples/s, eta: 5 days, 11:18:05
[2022/07/21 06:29:54] ppocr INFO: epoch: [75/500], global_step: 100290, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.933997, loss: 14.596161, avg_reader_cost: 0.00083 s, avg_batch_cost: 0.59570 s, avg_samples: 128.0, ips: 214.87346 samples/s, eta: 5 days, 11:17:44
[2022/07/21 06:30:06] ppocr INFO: epoch: [75/500], global_step: 100300, lr: 0.000954, acc: 0.339844, norm_edit_dis: 0.937798, loss: 13.345118, avg_reader_cost: 0.53199 s, avg_batch_cost: 1.18358 s, avg_samples: 128.0, ips: 108.14639 samples/s, eta: 5 days, 11:17:57
[2022/07/21 06:30:12] ppocr INFO: epoch: [75/500], global_step: 100310, lr: 0.000954, acc: 0.343750, norm_edit_dis: 0.941131, loss: 12.990923, avg_reader_cost: 0.00068 s, avg_batch_cost: 0.57355 s, avg_samples: 128.0, ips: 223.17010 samples/s, eta: 5 days, 11:17:35
[2022/07/21 06:30:22] ppocr INFO: epoch: [75/500], global_step: 100320, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.934477, loss: 13.830566, avg_reader_cost: 0.46184 s, avg_batch_cost: 1.04353 s, avg_samples: 128.0, ips: 122.66095 samples/s, eta: 5 days, 11:17:39
[2022/07/21 06:30:32] ppocr INFO: epoch: [75/500], global_step: 100330, lr: 0.000954, acc: 0.332031, norm_edit_dis: 0.937619, loss: 13.486568, avg_reader_cost: 0.35058 s, avg_batch_cost: 0.92052 s, avg_samples: 128.0, ips: 139.05131 samples/s, eta: 5 days, 11:17:37
[2022/07/21 06:30:37] ppocr INFO: epoch: [75/500], global_step: 100340, lr: 0.000954, acc: 0.375000, norm_edit_dis: 0.939735, loss: 11.924157, avg_reader_cost: 0.00058 s, avg_batch_cost: 0.58131 s, avg_samples: 128.0, ips: 220.19261 samples/s, eta: 5 days, 11:17:15
[2022/07/21 06:30:47] ppocr INFO: epoch: [75/500], global_step: 100350, lr: 0.000954, acc: 0.367187, norm_edit_dis: 0.937912, loss: 12.060099, avg_reader_cost: 0.32633 s, avg_batch_cost: 0.91873 s, avg_samples: 128.0, ips: 139.32223 samples/s, eta: 5 days, 11:17:12
[2022/07/21 06:30:58] ppocr INFO: epoch: [75/500], global_step: 100360, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.936123, loss: 13.900740, avg_reader_cost: 0.38886 s, avg_batch_cost: 1.11041 s, avg_samples: 128.0, ips: 115.27266 samples/s, eta: 5 days, 11:17:21
[2022/07/21 06:31:04] ppocr INFO: epoch: [75/500], global_step: 100370, lr: 0.000954, acc: 0.347656, norm_edit_dis: 0.940024, loss: 12.774862, avg_reader_cost: 0.00087 s, avg_batch_cost: 0.58649 s, avg_samples: 128.0, ips: 218.24852 samples/s, eta: 5 days, 11:16:59
.......
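At this point the validation metrics are acc: 0.572419999885516, norm_edit_dis: 0.9788205213278756, while the training accuracy is about 0.3 and the training norm_edit_dis about 0.93.
Question 1: Is it normal for validation accuracy to be higher than the accuracy reported during training, or is training still far from convergence?
Question 2: With RecResizeImg image_shape: [3, 32, 640], is an image width of 640 not allowed, i.e. can it only go up to 320?
Question 3: With max_text_length: 150, does such a long text length hurt recognition performance?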
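
One way to put the training and validation numbers side by side is to parse them out of the log instead of reading them by eye. A minimal sketch, assuming the log format shown above; `parse_metrics` and `summarize` are hypothetical helper names, not part of PaddleOCR.

```python
import re
from statistics import mean

# Matches "acc: 0.35" and "norm_edit_dis: 0.93" pairs in a log line.
METRIC_RE = re.compile(r"\b(acc|norm_edit_dis): ([0-9]*\.?[0-9]+)")


def parse_metrics(line: str) -> dict:
    """Return the acc / norm_edit_dis values found in one log line."""
    return {key: float(value) for key, value in METRIC_RE.findall(line)}


def summarize(log_lines):
    """Return (mean training acc, list of evaluation metric dicts)."""
    train_acc, evals = [], []
    for line in log_lines:
        metrics = parse_metrics(line)
        if "acc" not in metrics:
            continue
        if "cur metric" in line:      # evaluation result line
            evals.append(metrics)
        elif "global_step" in line:   # per-step training line
            train_acc.append(metrics["acc"])
    return (mean(train_acc) if train_acc else None), evals


if __name__ == "__main__":
    sample = [
        "[2022/07/21 06:28:15] ppocr INFO: cur metric, acc: 0.572419999885516, "
        "norm_edit_dis: 0.9788205213278756, fps: 952.368972133215",
        "[2022/07/21 06:28:45] ppocr INFO: epoch: [75/500], global_step: 100200, "
        "lr: 0.000954, acc: 0.351562, norm_edit_dis: 0.939163, loss: 12.896205",
    ]
    print(summarize(sample))
```

Lines starting with "best metric" are deliberately skipped so that the best score is not counted as a fresh evaluation.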

b09cbbtk  #3

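I suggest reducing max_text_length; accuracy should improve. The exact value has to be found by trial and error, much like alchemy. In my experiments a length of 12 worked best for Chinese; I have not tried English.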
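
Before experimenting with different values, it may help to check how long the labels in the training set actually are. A minimal sketch, assuming each line of the label file has the form image path, a tab, then the text; `label_lengths` and `coverage` are hypothetical helper names, not part of PaddleOCR.

```python
def label_lengths(label_lines):
    """Yield the text length of each 'image_path<TAB>text' line."""
    for line in label_lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip empty or malformed lines
        _, text = line.split("\t", 1)
        yield len(text)


def coverage(lengths, max_text_length):
    """Fraction of labels that are no longer than max_text_length."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if n <= max_text_length)
    return kept / len(lengths)


if __name__ == "__main__":
    # In practice the lines would come from a file such as
    # ./rec_dataset/en_train.txt; a tiny inline sample is used here.
    sample = ["a.jpg\thello\n", "b.jpg\thello world\n", "c.jpg\tPaddleOCR\n"]
    lengths = list(label_lengths(sample))
    for candidate in (8, 10, 25):
        print(candidate, coverage(lengths, candidate))
```

A candidate max_text_length that already covers nearly all labels is a reasonable starting point for the trial and error described above.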
