Merge pull request PaddlePaddle#7956 from Lieberk/dygraph

add sr model Text Telescope
Evezerest · Oct 20, 2022 · a50c731
2 parents f91026d + 182c1db

Showing 13 changed files with 858 additions and 15 deletions.
diff --git a/configs/sr/sr_telescope.yml b/configs/sr/sr_telescope.yml
@@ -0,0 +1,84 @@
+Global:
+ use_gpu: true
+ epoch_num: 100
+ log_smooth_window: 20
+ print_batch_step: 10
+ save_model_dir: ./output/sr/sr_telescope/
+ save_epoch_step: 3
+ # evaluation is run every 1000 iterations
+ eval_batch_step: [0, 1000]
+ cal_metric_during_train: False
+ pretrained_model:
+ checkpoints:
+ save_inference_dir: ./output/sr/sr_telescope/infer
+ use_visualdl: False
+ infer_img: doc/imgs_words_en/word_52.png
+ # for data or label process
+ character_dict_path:
+ max_text_length: 100
+ infer_mode: False
+ use_space_char: False
+ save_res_path: ./output/sr/predicts_telescope.txt
+
+Optimizer:
+ name: Adam
+ beta1: 0.5
+ beta2: 0.999
+ clip_norm: 0.25
+ lr:
+ learning_rate: 0.0001
+
+Architecture:
+ model_type: sr
+ algorithm: Telescope
+ Transform:
+ name: TBSRN
+ STN: True
+ infer_mode: False
+
+Loss:
+ name: TelescopeLoss
+ confuse_dict_path: ./ppocr/utils/dict/confuse.pkl
+
+
+PostProcess:
+ name: None
+
+Metric:
+ name: SRMetric
+ main_indicator: all
+
+Train:
+ dataset:
+ name: LMDBDataSetSR
+ data_dir: ./train_data/TextZoom/train
+ transforms:
+ - SRResize:
+ imgH: 32
+ imgW: 128
+ down_sample_scale: 2
+ - KeepKeys:
+ keep_keys: ['img_lr', 'img_hr', 'label'] # dataloader will return list in this order
+ loader:
+ shuffle: False
+ batch_size_per_card: 16
+ drop_last: True
+ num_workers: 4
+
+Eval:
+ dataset:
+ name: LMDBDataSetSR
+ data_dir: ./train_data/TextZoom/test
+ transforms:
+ - SRResize:
+ imgH: 32
+ imgW: 128
+ down_sample_scale: 2
+ - KeepKeys:
+ keep_keys: ['img_lr', 'img_hr', 'label'] # dataloader will return list in this order
+ loader:
+ shuffle: False
+ drop_last: False
+ batch_size_per_card: 16
+ num_workers: 4
+
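Two settings in this config are easy to misread. The sketch below makes them concrete under two assumptions: `eval_batch_step: [start, interval]` is taken to mean that evaluation runs at every global step greater than `start` whose distance from `start` is a multiple of `interval`, and `SRResize` is taken to resize the HR image to `imgH x imgW` and the LR image to that size divided by `down_sample_scale`. The helper names `eval_steps` and `sr_pair_shapes` are hypothetical and not part of PaddleOCR.

```python
def eval_steps(start, interval, total_steps):
    """Global steps at which evaluation would run (assumed semantics)."""
    return [
        step
        for step in range(1, total_steps + 1)
        if step > start and (step - start) % interval == 0
    ]


def sr_pair_shapes(img_h, img_w, down_sample_scale, channels=3):
    """(C, H, W) shapes of the LR input and HR target (assumed semantics)."""
    hr = (channels, img_h, img_w)
    lr = (channels, img_h // down_sample_scale, img_w // down_sample_scale)
    return lr, hr


if __name__ == "__main__":
    # Values taken from sr_telescope.yml: eval_batch_step [0, 1000],
    # imgH 32, imgW 128, down_sample_scale 2.
    print(eval_steps(0, 1000, 3500))
    print(sr_pair_shapes(32, 128, 2))
```

Under these assumptions, evaluation fires at steps 1000, 2000, 3000, and so on, and the model learns to map 16x64 LR crops to 32x128 HR outputs.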
diff --git a/doc/doc_ch/algorithm_sr_telescope.md b/doc/doc_ch/algorithm_sr_telescope.md
@@ -0,0 +1,128 @@
+# Text Telescope
+
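+- [1. Introduction](#1)
+- [2. Environment](#2)
+- [3. Model Training / Evaluation / Prediction](#3)
+ - [3.1 Training](#3-1)
+ - [3.2 Evaluation](#3-2)
+ - [3.3 Prediction](#3-3)
+- [4. Inference and Deployment](#4)
+ - [4.1 Python Inference](#4-1)
+ - [4.2 C++ Inference](#4-2)
+ - [4.3 Serving Deployment](#4-3)
+ - [4.4 More Inference and Deployment Options](#4-4)
+- [5. FAQ](#5)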
+
+<a name="1"></a>
+## 1. Introduction
+
+Paper:
+> [Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.pdf)
+
+> Chen, Jingye, Bin Li, and Xiangyang Xue
+
+> CVPR, 2021
+
+Following the [FudanOCR](https://github.com/FudanVI/FudanOCR/tree/main/scene-text-telescope) data download instructions, the super-resolution results on the TextZoom test set are as follows:
+
+|Model|Backbone|PSNR_Avg|SSIM_Avg|Config|Download link|
+|---|---|---|---|---|---|
+|Text Telescope|tbsrn|21.56|0.7411| [configs/sr/sr_telescope.yml](../../configs/sr/sr_telescope.yml)|[trained model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz)|
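The PSNR_Avg column is built from per-image PSNR values. A minimal sketch of the standard definition, assuming images are flattened lists of floats in [0, 1] that are rescaled to [0, 255] before the mean squared error is taken, so that PSNR = 20 * log10(255 / sqrt(MSE)). The function name `psnr` is illustrative; how PaddleOCR's `SRMetric` batches and averages the per-image values is not shown here.

```python
import math


def psnr(img1, img2, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images.

    img1, img2: non-empty flat sequences of floats in [0, 1], same length.
    Pixels are scaled to [0, max_val] before the MSE is computed.
    """
    if len(img1) != len(img2) or not img1:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a * max_val - b * max_val) ** 2 for a, b in zip(img1, img2)) / len(img1)
    if mse == 0:
        return float("inf")  # identical images
    return 20 * math.log10(max_val / math.sqrt(mse))


# Every pixel off by 0.1 (25.5 on the 0-255 scale): MSE = 650.25,
# so PSNR = 20 * log10(255 / 25.5) = 20 dB.
print(psnr([0.0] * 4, [0.1] * 4))
```

Higher is better; a PSNR_Avg around 21.56 dB corresponds to a root-mean-square error of roughly 21 gray levels per pixel.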
+
+The [TextZoom dataset](https://paddleocr.bj.bcebos.com/dataset/TextZoom.tar) is built from two super-resolution datasets, RealSR and SR-RAW, both of which contain LR-HR pairs. TextZoom has 17367 training pairs and 4373 test pairs.
+
+<a name="2"></a>
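+## 2. Environment
+Please refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone.md) to clone the project code.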
+
+
+<a name="3"></a>
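+## 3. Model Training / Evaluation / Prediction
+
+Please refer to the [Text Recognition Tutorial](./recognition.md). PaddleOCR modularizes its code, so training a different recognition model only requires **changing the configuration file**.
+
+- Training
+
+Once the data is prepared, start training with the following commands: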
+
+```
+# Single-GPU training (long training time, not recommended)
+python3 tools/train.py -c configs/sr/sr_telescope.yml
+
+# Multi-GPU training; specify GPU IDs with the --gpus parameter
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/sr/sr_telescope.yml
+
+```
+
+- Evaluation
+
+```
+# GPU evaluation; Global.pretrained_model is the path to the weights to evaluate
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+```
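The `-o Global.pretrained_model=...` argument overrides a nested key of the YAML config from the command line. The sketch below illustrates the idea with a hypothetical `apply_overrides` helper; it is not PaddleOCR's own implementation, which is assumed to also convert value strings to YAML types, whereas this sketch keeps every value as a string.

```python
def apply_overrides(config, overrides):
    """Apply dotted 'Section.key=value' overrides to a nested dict in place.

    Values are kept as plain strings in this sketch.
    """
    for item in overrides:
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        node = config
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create missing sections
        node[leaf] = value
    return config


config = {"Global": {"use_gpu": True, "pretrained_model": None}}
apply_overrides(config, ["Global.pretrained_model=./output/sr/sr_telescope/best_accuracy"])
print(config["Global"]["pretrained_model"])
```

Only the named leaf changes; sibling keys such as `use_gpu` keep the values loaded from the file.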
+
+- Prediction:
+
+```
+# The configuration file used for prediction must be the same as the one used for training
+python3 tools/infer_sr.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png
+```
+
+![](../imgs_words_en/word_52.png)
+
+After executing the command, the super-resolution result of the image above is as follows:
+
+![](../imgs_results/sr_word_52.png)
+
+<a name="4"></a>
+## 4. Inference and Deployment
+
+<a name="4-1"></a>
+### 4.1 Python Inference
+
+First, convert the model saved during text super-resolution training into an inference model. Taking the Text-Telescope [trained model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz) as an example, use the following command:
+```shell
+python3 tools/export_model.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out
+```
+To run inference with the Text-Telescope super-resolution model, execute:
+```
+python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128
+
+```
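The `--sr_image_shape=3,32,128` flag is read here as (C, H, W). As a sketch under two assumptions, namely that this shape matches the training output size (`imgH: 32`, `imgW: 128`) and that the LR input is that size divided by `down_sample_scale: 2`, the helpers below parse the flag and derive the implied LR size. Both function names are hypothetical; consult `tools/infer/predict_sr.py` for the actual preprocessing.

```python
def parse_sr_image_shape(shape_str):
    """Parse a string such as '3,32,128' into a (C, H, W) tuple of ints."""
    parts = tuple(int(v) for v in shape_str.split(","))
    if len(parts) != 3:
        raise ValueError(f"expected 'C,H,W', got {shape_str!r}")
    return parts


def implied_lr_size(shape, down_sample_scale=2):
    """(H, W) of the low-resolution input, assuming the model upsamples
    by down_sample_scale to reach the given output shape."""
    _, h, w = shape
    return h // down_sample_scale, w // down_sample_scale


shape = parse_sr_image_shape("3,32,128")
print(shape, implied_lr_size(shape))
```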
+
+After executing the command, the super-resolution result of the image is as follows:
+
+![](../imgs_results/sr_word_52.png)
+
+<a name="4-2"></a>
+### 4.2 C++ Inference
+
+Not supported yet
+
+<a name="4-3"></a>
+### 4.3 Serving Deployment
+
+Not supported yet
+
+<a name="4-4"></a>
+### 4.4 More Inference and Deployment Options
+
+Not supported yet
+
+<a name="5"></a>
+## 5. FAQ
+
+
+## Citation
+
+```bibtex
+@INPROCEEDINGS{9578891,
+ author={Chen, Jingye and Li, Bin and Xue, Xiangyang},
+ booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 
+ title={Scene Text Telescope: Text-Focused Scene Image Super-Resolution}, 
+ year={2021},
+ volume={},
+ number={},
+ pages={12021-12030},
+ doi={10.1109/CVPR46437.2021.01185}}
+```
diff --git a/doc/doc_en/algorithm_sr_telescope_en.md b/doc/doc_en/algorithm_sr_telescope_en.md
@@ -0,0 +1,137 @@
+# Text Telescope
+
+- [1. Introduction](#1)
+- [2. Environment](#2)
+- [3. Model Training / Evaluation / Prediction](#3)
+ - [3.1 Training](#3-1)
+ - [3.2 Evaluation](#3-2)
+ - [3.3 Prediction](#3-3)
+- [4. Inference and Deployment](#4)
+ - [4.1 Python Inference](#4-1)
+ - [4.2 C++ Inference](#4-2)
+ - [4.3 Serving](#4-3)
+ - [4.4 More](#4-4)
+- [5. FAQ](#5)
+
+
+<a name="1"></a>
+## 1. Introduction
+
+Paper:
+> [Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.pdf)
+
+> Chen, Jingye, Bin Li, and Xiangyang Xue
+
+> CVPR, 2021
+
+Following the [FudanOCR](https://github.com/FudanVI/FudanOCR/tree/main/scene-text-telescope) data download instructions, the super-resolution results on the TextZoom test set are as follows:
+
+|Model|Backbone|PSNR_Avg|SSIM_Avg|Config|Download link|
+|---|---|---|---|---|---|
+|Text Telescope|tbsrn|21.56|0.7411| [configs/sr/sr_telescope.yml](../../configs/sr/sr_telescope.yml)|[trained model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz)|
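SSIM compares luminance, contrast, and structure between the super-resolved and HR images. The standard metric evaluates the formula below over local (typically Gaussian-weighted) windows and averages the results; this sketch simplifies it to a single window covering the whole image, so it will not reproduce the reported SSIM_Avg. The function name `global_ssim` is hypothetical, and the constants K1 = 0.01 and K2 = 0.03 are the usual defaults from the SSIM literature, assumed rather than taken from PaddleOCR.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized images given as flat
    sequences of floats. At most 1; exactly 1 means identical images."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two images with the same number (>= 2) of pixels")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


hr = [0.0, 0.25, 0.5, 1.0]
print(global_ssim(hr, hr))                   # identical images
print(global_ssim(hr, [1 - v for v in hr]))  # inverted image scores below zero
```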
+
+The [TextZoom dataset](https://paddleocr.bj.bcebos.com/dataset/TextZoom.tar) is built from two super-resolution datasets, RealSR and SR-RAW, both of which contain LR-HR pairs. TextZoom has 17367 training pairs and 4373 test pairs.
+
+<a name="2"></a>
+## 2. Environment
+Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code.
+
+
+<a name="3"></a>
+## 3. Model Training / Evaluation / Prediction
+
+Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different models only requires **changing the configuration file**.
+
+Training:
+
+After data preparation is complete, start training with the following commands:
+
+```
+# Single-GPU training (long training time, not recommended)
+
+python3 tools/train.py -c configs/sr/sr_telescope.yml
+
+# Multi-GPU training; specify GPU IDs with the --gpus parameter
+
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/sr/sr_telescope.yml
+
+```
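With `batch_size_per_card: 16` and `drop_last: True`, the number of iterations per epoch depends on how many GPUs are used. A back-of-the-envelope sketch, assuming the distributed sampler gives each card ceil(N / cards) samples (padding the dataset as needed) and that N is the 17367 TextZoom training pairs: one GPU runs 1085 iterations per epoch and four GPUs run 271, so with `eval_batch_step: [0, 1000]` a single-GPU run evaluates roughly once per epoch. The helper name `steps_per_epoch` is hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size_per_card, num_cards=1, drop_last=True):
    """Iterations per epoch on each card.

    Assumes every card receives ceil(num_samples / num_cards) samples, as a
    padding distributed sampler would, and then batches them locally.
    """
    per_card = math.ceil(num_samples / num_cards)
    if drop_last:
        return per_card // batch_size_per_card  # incomplete last batch dropped
    return math.ceil(per_card / batch_size_per_card)


TEXTZOOM_TRAIN_PAIRS = 17367
for cards in (1, 4):
    steps = steps_per_epoch(TEXTZOOM_TRAIN_PAIRS, 16, cards)
    print(f"{cards} GPU(s): {steps} iterations/epoch, {steps * 100} over 100 epochs")
```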
+
+
+Evaluation:
+
+```
+# GPU evaluation; Global.pretrained_model is the path to the weights to evaluate
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+```
+
+Prediction:
+
+```
+# The configuration file used for prediction must be the same as the one used for training
+
+python3 tools/infer_sr.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png
+```
+
+![](../imgs_words_en/word_52.png)
+
+After executing the command, the super-resolution result of the above image is as follows:
+
+![](../imgs_results/sr_word_52.png)
+
+<a name="4"></a>
+## 4. Inference and Deployment
+
+<a name="4-1"></a>
+### 4.1 Python Inference
+
+First, convert the model saved during training into an inference model. Taking the Text-Telescope [trained model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz) as an example, use the following command:
+
+```shell
+python3 tools/export_model.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out
+```
+
+For Text-Telescope super-resolution model inference, the following commands can be executed:
+
+```
+python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128
+
+```
+
+After executing the command, the super-resolution result of the above image is as follows:
+
+![](../imgs_results/sr_word_52.png)
+
+
+<a name="4-2"></a>
+### 4.2 C++ Inference
+
+Not supported
+
+<a name="4-3"></a>
+### 4.3 Serving
+
+Not supported
+
+<a name="4-4"></a>
+### 4.4 More
+
+Not supported
+
+<a name="5"></a>
+## 5. FAQ
+
+
+## Citation
+
+```bibtex
+@INPROCEEDINGS{9578891,
+ author={Chen, Jingye and Li, Bin and Xue, Xiangyang},
+ booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, 
+ title={Scene Text Telescope: Text-Focused Scene Image Super-Resolution}, 
+ year={2021},
+ volume={},
+ number={},
+ pages={12021-12030},
+ doi={10.1109/CVPR46437.2021.01185}}
+```
diff --git a/doc/imgs_results/sr_word_52.png b/doc/imgs_results/sr_word_52.png
diff --git a/ppocr/losses/__init__.py b/ppocr/losses/__init__.py
@@ -63,6 +63,7 @@
 
 # sr loss
 from .stroke_focus_loss import StrokeFocusLoss
+from .text_focus_loss import TelescopeLoss
 
 
 def build_loss(config):
@@ -72,7 +73,7 @@ def build_loss(config):
  'CELoss', 'TableAttentionLoss', 'SARLoss', 'AsterLoss', 'SDMGRLoss',
  'VQASerTokenLayoutLMLoss', 'LossFromOutput', 'PRENLoss', 'MultiLoss',
  'TableMasterLoss', 'SPINAttentionLoss', 'VLLoss', 'StrokeFocusLoss',
- 'SLALoss', 'CTLoss', 'RFLLoss', 'DRRGLoss', 'CANLoss'
+ 'SLALoss', 'CTLoss', 'RFLLoss', 'DRRGLoss', 'CANLoss', 'TelescopeLoss'
  ]
  config = copy.deepcopy(config)
  module_name = config.pop('name')
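The diff registers `TelescopeLoss` in two places: the import and the `support_dict` whitelist that `build_loss` checks before instantiating the class named by the config's `name` key. A self-contained sketch of that name-based dispatch follows. The stub classes are hypothetical stand-ins, and the sketch looks classes up in a dict instead of calling `eval(module_name)`, which is assumed (it is not shown in this diff) to be what the real function does next.

```python
import copy


class StrokeFocusLoss:
    """Hypothetical stand-in for ppocr.losses.StrokeFocusLoss."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class TelescopeLoss:
    """Hypothetical stand-in for ppocr.losses.TelescopeLoss."""

    def __init__(self, confuse_dict_path=None, **kwargs):
        self.confuse_dict_path = confuse_dict_path


# Registering a new loss means adding it here (cf. support_dict above).
LOSS_REGISTRY = {cls.__name__: cls for cls in (StrokeFocusLoss, TelescopeLoss)}


def build_loss(config):
    config = copy.deepcopy(config)  # leave the caller's config untouched
    module_name = config.pop("name")
    if module_name not in LOSS_REGISTRY:
        raise ValueError(f"loss only supports {sorted(LOSS_REGISTRY)}")
    return LOSS_REGISTRY[module_name](**config)  # remaining keys become kwargs


loss_cfg = {"name": "TelescopeLoss", "confuse_dict_path": "./ppocr/utils/dict/confuse.pkl"}
loss = build_loss(loss_cfg)
print(type(loss).__name__, loss.confuse_dict_path)
```

The deep copy matters because `pop("name")` would otherwise strip the key from the caller's config, breaking any later rebuild from the same dict.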