Merge remote-tracking branch 'origin/dygraph' into dy1

Evezerest · Oct 22, 2022 · a834978
2 parents 8deb872 + a50c731

Showing 145 changed files with 10,066 additions and 714 deletions.
diff --git a/README.md b/README.md
@@ -27,11 +27,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools
 
 ## Recent updates
 - **🔥2022.8.24 Release PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- - Release [PP-Structurev2](./ppstructure/)，with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
+ - Release [PP-StructureV2](./ppstructure/)，with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
  - [Layout Analysis](./ppstructure/layout) optimization: model storage reduced by 95%, while speed increased by 11 times, and the average CPU time-cost is only 41ms;
  - [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
  - [Key Information Extraction](./ppstructure/kie) optimization：a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.
- 
+
 - **🔥2022.7 Release [OCR scene application collection](./applications/README_en.md)**
  - Release **9 vertical models** such as digital tube, LCD screen, license plate, handwriting recognition model, high-precision SVTR model, etc, covering the main OCR vertical applications in general, manufacturing, finance, and transportation industries.
 
@@ -129,7 +129,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
  - [Text recognition](./doc/doc_en/algorithm_overview_en.md)
  - [End-to-end OCR](./doc/doc_en/algorithm_overview_en.md)
  - [Table Recognition](./doc/doc_en/algorithm_overview_en.md)
- - [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md)  
+ - [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md) 
  - [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
 - Data Annotation and Synthesis
  - [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
@@ -181,7 +181,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 </details>
 
 <details open>
-<summary>PP-Structurev2</summary>
+<summary>PP-StructureV2</summary>
 
 - layout analysis + table recognition 
 <div align="center">
@@ -192,7 +192,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 <div align="center">
  <img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
 </div>
- 
+
 <div align="center">
  <img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
 </div>
@@ -204,7 +204,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
 - RE (Relation Extraction)
 <div align="center">
  <img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
-</div>  
+</div> 
 
 <div align="center">
  <img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">

diff --git a/README_ch.md b/README_ch.md
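@@ -28,14 +28,14 @@ PaddleOCR aims to build a rich, leading, and practical OCR tool library, helping
 ## Recent updates
 
 - **🔥2022.8.24 Release PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- - Release [PP-Structurev2](./ppstructure/), with system functionality and performance fully upgraded, adapted to Chinese scenes, with new support for [Layout Recovery](./ppstructure/recovery) and **one-line command PDF-to-Word conversion**;
+ - Release [PP-StructureV2](./ppstructure/), with system functionality and performance fully upgraded, adapted to Chinese scenes, with new support for [Layout Recovery](./ppstructure/recovery) and **one-line command PDF-to-Word conversion**;
  - [Layout Analysis](./ppstructure/layout) model optimization: model storage reduced by 95%, speed increased 11-fold, and average CPU time of only 41 ms;
  - [Table Recognition](./ppstructure/table) model optimization: 3 optimization strategies designed; model accuracy improved by 6% with unchanged prediction time;
  - [Key Information Extraction](./ppstructure/kie) model optimization: a vision-independent model structure designed; semantic entity recognition accuracy improved by 2.8% and relation extraction accuracy by 9.1%.
- 
+
 - **🔥2022.8 Release [OCR scene application collection](./applications)**
  - Includes **9 vertical models** such as digital tube, LCD screen, license plate, high-precision SVTR model, and handwriting recognition, covering the main OCR vertical applications in general, manufacturing, finance, and transportation industries.
- 
+
 - **2022.5.9 Release PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
  - Release [PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3): at comparable speed, the Chinese-scene result improves by a further 5% over PP-OCRv2, the English scene improves by 11%, and the average recognition accuracy of the 80-language multilingual models improves by more than 5%;
  - Release the semi-automatic annotation tool [PPOCRLabelv2](./PPOCRLabel): adds annotation for table text images, image key information extraction tasks, and irregular text images;
@@ -220,11 +220,11 @@ PaddleOCR aims to build a rich, leading, and practical OCR tool library, helping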
 <div align="center">
  <img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
 </div>
- 
+
 <div align="center">
  <img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
 </div>
- 
+
 - RE (Relation Extraction)
 <div align="center">
  <img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
@@ -237,7 +237,7 @@ PaddleOCR aims to build a rich, leading, and practical OCR tool library, helping
 <div align="center">
  <img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
 </div>
- 
+
 </details>
 
 <a name="许可证书"></a>

diff --git a/configs/det/det_mv3_db.yml b/configs/det/det_mv3_db.yml
@@ -1,6 +1,7 @@
 Global:
   use_gpu: true
   use_xpu: false
+  use_mlu: false
   epoch_num: 1200
   log_smooth_window: 20
   print_batch_step: 10

diff --git a/configs/det/det_r50_db++_icdar15.yml b/configs/det/det_r50_db++_icdar15.yml
@@ -54,6 +54,7 @@ PostProcess:
   box_thresh: 0.6
   max_candidates: 1000
   unclip_ratio: 1.5
+  det_box_type: 'quad' # 'quad' or 'poly'
 Metric:
   name: DetMetric
   main_indicator: hmean

diff --git a/configs/det/det_r50_db++_td_tr.yml b/configs/det/det_r50_db++_td_tr.yml
@@ -54,6 +54,7 @@ PostProcess:
   box_thresh: 0.5
   max_candidates: 1000
   unclip_ratio: 1.5
+  det_box_type: 'quad' # 'quad' or 'poly'
 Metric:
   name: DetMetric
   main_indicator: hmean
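
The new `det_box_type` option in the two DB++ configs selects between quadrilateral and polygon output. A minimal sketch of how a post-processing step might branch on it follows. The function `box_from_contour` is hypothetical and not taken from this commit, and the axis-aligned rectangle used for `'quad'` is a simplification chosen for illustration; a real detector would more likely fit a rotated rectangle.

```python
def box_from_contour(points, det_box_type="quad"):
    """Turn a contour (list of (x, y) points) into a detection box.

    Hypothetical helper for illustration only.
    'poly' keeps every contour point; 'quad' collapses the contour to
    four corners (here an axis-aligned rectangle, as a simplification).
    """
    if det_box_type == "poly":
        return [tuple(p) for p in points]
    if det_box_type == "quad":
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # Corners in clockwise order starting from the top-left.
        return [
            (min(xs), min(ys)),
            (max(xs), min(ys)),
            (max(xs), max(ys)),
            (min(xs), max(ys)),
        ]
    raise ValueError("det_box_type must be 'quad' or 'poly'")


contour = [(2, 1), (6, 0), (9, 3), (7, 8), (3, 7)]
quad = box_from_contour(contour, "quad")
poly = box_from_contour(contour, "poly")
```

With `'quad'` every text instance is reported as four points, which suits straight text; `'poly'` preserves the outline and is the more natural choice for curved text.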

diff --git a/configs/det/det_r50_drrg_ctw.yml b/configs/det/det_r50_drrg_ctw.yml
@@ -0,0 +1,133 @@
+Global:
+  use_gpu: true
+  epoch_num: 1200
+  log_smooth_window: 20
+  print_batch_step: 5
+  save_model_dir: ./output/det_r50_drrg_ctw/
+  save_epoch_step: 100
+  # evaluation is run every 1260 iterations
+  eval_batch_step: [37800, 1260]
+  cal_metric_during_train: False
+  pretrained_model: ./pretrain_models/ResNet50_vd_ssld_pretrained.pdparams
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/imgs_en/img_10.jpg
+  save_res_path: ./output/det_drrg/predicts_drrg.txt
+
+
+Architecture:
+  model_type: det
+  algorithm: DRRG
+  Transform:
+  Backbone:
+    name: ResNet_vd
+    layers: 50
+  Neck:
+    name: FPN_UNet
+    in_channels: [256, 512, 1024, 2048]
+    out_channels: 32
+  Head:
+    name: DRRGHead
+    in_channels: 32
+    text_region_thr: 0.3
+    center_region_thr: 0.4
+Loss:
+  name: DRRGLoss
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: DecayLearningRate
+    learning_rate: 0.028
+    epochs: 1200
+    factor: 0.9
+    end_lr: 0.0000001
+  weight_decay: 0.0001
+
+PostProcess:
+  name: DRRGPostprocess
+  link_thr: 0.8
+
+Metric:
+  name: DetFCEMetric
+  main_indicator: hmean
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/ctw1500/imgs/
+    label_file_list:
+      - ./train_data/ctw1500/imgs/training.txt
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+          ignore_orientation: True
+      - DetLabelEncode: # Class handling label
+      - ColorJitter:
+          brightness: 0.12549019607843137
+          saturation: 0.5
+      - RandomScaling:
+      - RandomCropFlip:
+          crop_ratio: 0.5
+      - RandomCropPolyInstances:
+          crop_ratio: 0.8
+          min_side_ratio: 0.3
+      - RandomRotatePolyInstances:
+          rotate_ratio: 0.5
+          max_angle: 60
+          pad_with_fixed_color: False
+      - SquareResizePad:
+          target_size: 800
+          pad_ratio: 0.6
+      - IaaAugment:
+          augmenter_args:
+            - { 'type': Fliplr, 'args': { 'p': 0.5 } }
+      - DRRGTargets:
+      - NormalizeImage:
+          scale: 1./255.
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+          order: 'hwc'
+      - ToCHWImage:
+      - KeepKeys:
+          keep_keys: ['image', 'gt_text_mask', 'gt_center_region_mask', 'gt_mask',
+                      'gt_top_height_map', 'gt_bot_height_map', 'gt_sin_map',
+                      'gt_cos_map', 'gt_comp_attribs'] # dataloader will return list in this order
+  loader:
+    shuffle: True
+    drop_last: False
+    batch_size_per_card: 4
+    num_workers: 8
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/ctw1500/imgs/
+    label_file_list:
+      - ./train_data/ctw1500/imgs/test.txt
+    transforms:
+      - DecodeImage: # load image
+          img_mode: BGR
+          channel_first: False
+          ignore_orientation: True
+      - DetLabelEncode: # Class handling label
+      - DetResizeForTest:
+          limit_type: 'min'
+          limit_side_len: 640
+      - NormalizeImage:
+          scale: 1./255.
+          mean: [0.485, 0.456, 0.406]
+          std: [0.229, 0.224, 0.225]
+          order: 'hwc'
+      - Pad:
+      - ToCHWImage:
+      - KeepKeys:
+          keep_keys: ['image', 'shape', 'polys', 'ignore_tags']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 1 # must be 1
+    num_workers: 2
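
The comment in this config says evaluation runs every 1260 iterations, yet `eval_batch_step` holds two numbers. A minimal sketch of one plausible reading follows, assuming the first element is the step after which evaluation begins and the second is the interval between evaluations. The helper `should_evaluate` is hypothetical, and the strict `>` comparison is an assumption rather than something this diff confirms.

```python
def should_evaluate(global_step, eval_batch_step):
    """Decide whether to run evaluation at `global_step`.

    Hypothetical helper. `eval_batch_step` is read as
    [start_step, interval]: nothing is evaluated up to and including
    start_step, then every `interval` steps after it (assumed semantics).
    """
    start_step, interval = eval_batch_step
    return global_step > start_step and (global_step - start_step) % interval == 0


schedule = [37800, 1260]  # value from det_r50_drrg_ctw.yml
eval_steps = [s for s in range(0, 42000) if should_evaluate(s, schedule)]
```

Under this reading the first evaluation happens at step 39060, one interval after the start offset, which keeps the early and least informative part of training free of evaluation cost.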
diff --git a/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml b/configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh_udml.yml
@@ -70,16 +70,14 @@ Loss:
       mode: "l2"
       model_name_pairs:
       - ["Student", "Teacher"]
-      key: hidden_states
-      index: 5
+      key: hidden_states_5
       name: "loss_5"
   - DistillationVQADistanceLoss:
       weight: 0.5
       mode: "l2"
       model_name_pairs:
       - ["Student", "Teacher"]
-      key: hidden_states
-      index: 8
+      key: hidden_states_8
       name: "loss_8"
 
 
@@ -182,4 +180,3 @@ Eval:
     drop_last: False
     batch_size_per_card: 8
     num_workers: 4
-
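
The distillation loss entries in this file now address a hidden state with a single flat key (`hidden_states_5`) instead of a `key` plus `index` pair. A minimal sketch of what that implies on the model-output side follows, assuming the model returns a dict whose `hidden_states` entry is a list of per-layer features. Both `flatten_hidden_states` and the shape of `outputs` are hypothetical and only illustrate the naming scheme.

```python
def flatten_hidden_states(outputs):
    """Expose each element of outputs['hidden_states'] under its own key.

    Hypothetical helper: entry i of the list becomes 'hidden_states_<i>',
    so a loss can pick one layer with a single `key` and no `index`.
    """
    flat = {k: v for k, v in outputs.items() if k != "hidden_states"}
    for i, state in enumerate(outputs.get("hidden_states", [])):
        flat[f"hidden_states_{i}"] = state
    return flat


# Stand-in model outputs; strings take the place of tensors.
student_out = {
    "backbone_out": "student_logits",
    "hidden_states": [f"student_h{i}" for i in range(9)],
}
flat = flatten_hidden_states(student_out)
feature_for_loss_5 = flat["hidden_states_5"]
feature_for_loss_8 = flat["hidden_states_8"]
```

A flat key keeps the loss configuration uniform: every loss selects its input the same way, whether that input is a final output or an intermediate layer.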
diff --git a/configs/rec/rec_d28_can.yml b/configs/rec/rec_d28_can.yml
@@ -0,0 +1,122 @@
+Global:
+  use_gpu: True
+  epoch_num: 240
+  log_smooth_window: 20
+  print_batch_step: 10
+  save_model_dir: ./output/rec/can/
+  save_epoch_step: 1
+  # evaluation is run every 1105 iterations (1 epoch)(batch_size = 8)
+  eval_batch_step: [0, 1105]
+  cal_metric_during_train: True
+  pretrained_model:
+  checkpoints:
+  save_inference_dir:
+  use_visualdl: False
+  infer_img: doc/datasets/crohme_demo/hme_00.jpg
+  # for data or label process
+  character_dict_path: ppocr/utils/dict/latex_symbol_dict.txt
+  max_text_length: 36
+  infer_mode: False
+  use_space_char: False
+  save_res_path: ./output/rec/predicts_can.txt
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  clip_norm_global: 100.0
+  lr:
+    name: TwoStepCosine
+    learning_rate: 0.01
+    warmup_epoch: 1
+  weight_decay: 0.0001
+
+Architecture:
+  model_type: rec
+  algorithm: CAN
+  in_channels: 1
+  Transform:
+  Backbone:
+    name: DenseNet
+    growthRate: 24
+    reduction: 0.5
+    bottleneck: True
+    use_dropout: True
+    input_channel: 1
+  Head:
+    name: CANHead
+    in_channel: 684
+    out_channel: 111
+    max_text_length: 36
+    ratio: 16
+    attdecoder:
+      is_train: True
+      input_size: 256
+      hidden_size: 256
+      encoder_out_channel: 684
+      dropout: True
+      dropout_ratio: 0.5
+      word_num: 111
+      counting_decoder_out_channel: 111
+      attention:
+        attention_dim: 512
+        word_conv_kernel: 1
+
+Loss:
+  name: CANLoss
+
+PostProcess:
+  name: CANLabelDecode
+
+Metric:
+  name: CANMetric
+  main_indicator: exp_rate
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/CROHME/training/images/
+    label_file_list: ["./train_data/CROHME/training/labels.txt"]
+    transforms:
+      - DecodeImage:
+          channel_first: False
+      - NormalizeImage:
+          mean: [0,0,0]
+          std: [1,1,1]
+          order: 'hwc'
+      - GrayImageChannelFormat:
+          inverse: True
+      - CANLabelEncode:
+          lower: False
+      - KeepKeys:
+          keep_keys: ['image', 'label']
+  loader:
+    shuffle: True
+    batch_size_per_card: 8
+    drop_last: False
+    num_workers: 4
+    collate_fn: DyMaskCollator
+
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/CROHME/evaluation/images/
+    label_file_list: ["./train_data/CROHME/evaluation/labels.txt"]
+    transforms:
+      - DecodeImage:
+          channel_first: False
+      - NormalizeImage:
+          mean: [0,0,0]
+          std: [1,1,1]
+          order: 'hwc'
+      - GrayImageChannelFormat:
+          inverse: True
+      - CANLabelEncode:
+          lower: False
+      - KeepKeys:
+          keep_keys: ['image', 'label']
+  loader:
+    shuffle: False
+    drop_last: False
+    batch_size_per_card: 1
+    num_workers: 4
+    collate_fn: DyMaskCollator
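
The comment in this config ties the evaluation interval of 1105 iterations to one epoch at batch size 8. A minimal sketch of that arithmetic follows, assuming single-card training so that `batch_size_per_card` equals the global batch size. The helper `iters_per_epoch` is hypothetical, and the dataset size is not stated in this diff, so the sample counts below are illustrative values that happen to reproduce the figure.

```python
import math


def iters_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of iterations needed to see `num_samples` once.

    Hypothetical helper. With drop_last=False the final, smaller batch
    is still processed, so the count rounds up.
    """
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Any training set of 8833 to 8840 samples gives 1105 iterations
# at batch size 8 with drop_last=False (illustrative sizes, assumed).
low = iters_per_epoch(8833, 8)
high = iters_per_epoch(8840, 8)
```

If the batch size or the number of cards changes, the second element of `eval_batch_step` would need the same recalculation to keep evaluation aligned with epoch boundaries.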