forked from PaddlePaddle/PaddleOCR
Merge pull request PaddlePaddle#7956 from Lieberk/dygraph
add sr model Text Telescope
Showing 13 changed files with 858 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
Global: | ||
use_gpu: true | ||
epoch_num: 100 | ||
log_smooth_window: 20 | ||
print_batch_step: 10 | ||
save_model_dir: ./output/sr/sr_telescope/ | ||
save_epoch_step: 3 | ||
# evaluation is run every 1000 iterations | ||
eval_batch_step: [0, 1000] | ||
cal_metric_during_train: False | ||
pretrained_model: | ||
checkpoints: | ||
save_inference_dir: ./output/sr/sr_telescope/infer | ||
use_visualdl: False | ||
infer_img: doc/imgs_words_en/word_52.png | ||
# for data or label process | ||
character_dict_path: | ||
max_text_length: 100 | ||
infer_mode: False | ||
use_space_char: False | ||
save_res_path: ./output/sr/predicts_telescope.txt | ||
|
||
Optimizer: | ||
name: Adam | ||
beta1: 0.5 | ||
beta2: 0.999 | ||
clip_norm: 0.25 | ||
lr: | ||
learning_rate: 0.0001 | ||
|
||
Architecture: | ||
model_type: sr | ||
algorithm: Telescope | ||
Transform: | ||
name: TBSRN | ||
STN: True | ||
infer_mode: False | ||
|
||
Loss: | ||
name: TelescopeLoss | ||
confuse_dict_path: ./ppocr/utils/dict/confuse.pkl | ||
|
||
|
||
PostProcess: | ||
name: None | ||
|
||
Metric: | ||
name: SRMetric | ||
main_indicator: all | ||
|
||
Train: | ||
dataset: | ||
name: LMDBDataSetSR | ||
data_dir: ./train_data/TextZoom/train | ||
transforms: | ||
- SRResize: | ||
imgH: 32 | ||
imgW: 128 | ||
down_sample_scale: 2 | ||
- KeepKeys: | ||
keep_keys: ['img_lr', 'img_hr', 'label'] # dataloader will return list in this order | ||
loader: | ||
shuffle: False | ||
batch_size_per_card: 16 | ||
drop_last: True | ||
num_workers: 4 | ||
|
||
Eval: | ||
dataset: | ||
name: LMDBDataSetSR | ||
data_dir: ./train_data/TextZoom/test | ||
transforms: | ||
- SRResize: | ||
imgH: 32 | ||
imgW: 128 | ||
down_sample_scale: 2 | ||
- KeepKeys: | ||
keep_keys: ['img_lr', 'img_hr', 'label'] # dataloader will return list in this order | ||
loader: | ||
shuffle: False | ||
drop_last: False | ||
batch_size_per_card: 16 | ||
num_workers: 4 | ||
|
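As a rough illustration of what the `SRResize` settings in this config imply, the sketch below derives the low-resolution and high-resolution image sizes from `imgH`, `imgW`, and `down_sample_scale`. It assumes the low-resolution input is the high-resolution size divided by `down_sample_scale`, which the config itself does not state; `sr_sizes` is a hypothetical helper, not a PaddleOCR function.

```python
def sr_sizes(img_h, img_w, down_sample_scale):
    """Return ((lr_h, lr_w), (hr_h, hr_w)).

    Assumption (not confirmed by the config): LR size = HR size // scale.
    """
    if down_sample_scale <= 0:
        raise ValueError("down_sample_scale must be positive")
    hr = (img_h, img_w)
    lr = (img_h // down_sample_scale, img_w // down_sample_scale)
    return lr, hr


# Values taken from the Train/Eval SRResize entries in the config.
lr_size, hr_size = sr_sizes(32, 128, 2)
print("LR (H, W):", lr_size)
print("HR (H, W):", hr_size)
```

Under that assumption, the `keep_keys` entries `img_lr` and `img_hr` would carry images of these two sizes.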
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
# Text Telescope | ||
|
||
- [1. Introduction](#1) | ||
- [2. Environment](#2) | ||
- [3. Model Training / Evaluation / Prediction](#3) | ||
- [3.1 Training](#3-1) | ||
- [3.2 Evaluation](#3-2) | ||
- [3.3 Prediction](#3-3) | ||
- [4. Inference and Deployment](#4) | ||
- [4.1 Python Inference](#4-1) | ||
- [4.2 C++ Inference](#4-2) | ||
- [4.3 Serving](#4-3) | ||
- [4.4 More](#4-4) | ||
- [5. FAQ](#5) | ||
|
||
<a name="1"></a> | ||
## 1. Introduction | ||
|
||
Paper: | ||
> [Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.pdf) | ||
> Chen, Jingye, Bin Li, and Xiangyang Xue | ||
> CVPR, 2021 | ||
Following the data download instructions of [FudanOCR](https://github.com/FudanVI/FudanOCR/tree/main/scene-text-telescope), the super-resolution results on the TextZoom test set are as follows: | ||
|
||
|Model|Backbone|PSNR_Avg|SSIM_Avg|Config|Download link| | ||
|---|---|---|---|---|---| | ||
|Text Telescope|tbsrn|21.56|0.7411| [configs/sr/sr_telescope.yml](../../configs/sr/sr_telescope.yml)|[trained model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz)| | ||
|
||
The [TextZoom dataset](https://paddleocr.bj.bcebos.com/dataset/TextZoom.tar) is built from two super-resolution datasets, RealSR and SR-RAW, both of which contain LR-HR pairs. TextZoom has 17367 training pairs and 4373 test pairs. | ||
|
||
<a name="2"></a> | ||
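## 2. Environment | ||
Please first refer to ["Environment Preparation"](./environment.md) to configure the PaddleOCR environment, and to ["Project Clone"](./clone.md) to clone the project code. | ||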
|
||
|
||
<a name="3"></a> | ||
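## 3. Model Training / Evaluation / Prediction | ||
|
||
Please refer to the [text recognition training tutorial](./recognition.md). PaddleOCR modularizes the code, so training a different recognition model only requires **changing the configuration file**. | ||
|
||
- Training | ||
|
||
Once the data is prepared, start training with the following commands: | ||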
|
||
``` | ||
# Single-GPU training (long training time, not recommended) | ||
python3 tools/train.py -c configs/sr/sr_telescope.yml | ||
# Multi-GPU training; specify the GPU IDs with the --gpus parameter | ||
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/sr/sr_telescope.yml | ||
``` | ||
|
||
- Evaluation | ||
|
||
``` | ||
# GPU evaluation; Global.pretrained_model is the weights to evaluate | ||
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy | ||
``` | ||
|
||
- Prediction: | ||
|
||
``` | ||
# The configuration file used for prediction must match the one used for training | ||
python3 tools/infer_sr.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png | ||
``` | ||
|
||
![](../imgs_words_en/word_52.png) | ||
|
||
After running the command, the super-resolution result for the image above is as follows: | ||
|
||
![](../imgs_results/sr_word_52.png) | ||
|
||
<a name="4"></a> | ||
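## 4. Inference and Deployment | ||
|
||
<a name="4-1"></a> | ||
### 4.1 Python Inference | ||
|
||
First, convert the model saved during text super-resolution training into an inference model. Taking the [model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz) trained with Text-Telescope as an example, the conversion command is: | ||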
```shell | ||
python3 tools/export_model.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out | ||
``` | ||
For Text-Telescope text super-resolution inference, run the following command: | ||
``` | ||
python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128 | ||
``` | ||
|
||
After running the command, the super-resolution result of the image is as follows: | ||
|
||
![](../imgs_results/sr_word_52.png) | ||
|
||
<a name="4-2"></a> | ||
### 4.2 C++ Inference | ||
|
||
Not supported yet | ||
|
||
<a name="4-3"></a> | ||
### 4.3 Serving | ||
|
||
Not supported yet | ||
|
||
<a name="4-4"></a> | ||
### 4.4 More | ||
|
||
Not supported yet | ||
|
||
<a name="5"></a> | ||
## 5. FAQ | ||
|
||
|
||
## Citation | ||
|
||
```bibtex | ||
@INPROCEEDINGS{9578891, | ||
author={Chen, Jingye and Li, Bin and Xue, Xiangyang}, | ||
booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, | ||
title={Scene Text Telescope: Text-Focused Scene Image Super-Resolution}, | ||
year={2021}, | ||
volume={}, | ||
number={}, | ||
pages={12021-12030}, | ||
doi={10.1109/CVPR46437.2021.01185}} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
# Text Telescope | ||
|
||
- [1. Introduction](#1) | ||
- [2. Environment](#2) | ||
- [3. Model Training / Evaluation / Prediction](#3) | ||
- [3.1 Training](#3-1) | ||
- [3.2 Evaluation](#3-2) | ||
- [3.3 Prediction](#3-3) | ||
- [4. Inference and Deployment](#4) | ||
- [4.1 Python Inference](#4-1) | ||
- [4.2 C++ Inference](#4-2) | ||
- [4.3 Serving](#4-3) | ||
- [4.4 More](#4-4) | ||
- [5. FAQ](#5) | ||
|
||
|
||
<a name="1"></a> | ||
## 1. Introduction | ||
|
||
Paper: | ||
> [Scene Text Telescope: Text-Focused Scene Image Super-Resolution](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.pdf) | ||
> Chen, Jingye, Bin Li, and Xiangyang Xue | ||
> CVPR, 2021 | ||
Following the [FudanOCR](https://github.com/FudanVI/FudanOCR/tree/main/scene-text-telescope) data download instructions, the super-resolution results on the TextZoom test set are as follows: | ||
|
||
|Model|Backbone|PSNR_Avg|SSIM_Avg|Config|Download link| | ||
|---|---|---|---|---|---| | ||
|Text Telescope|tbsrn|21.56|0.7411| [configs/sr/sr_telescope.yml](../../configs/sr/sr_telescope.yml)|[train model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz)| | ||
|
||
The [TextZoom dataset](https://paddleocr.bj.bcebos.com/dataset/TextZoom.tar) is built from two super-resolution datasets, RealSR and SR-RAW, both of which contain LR-HR pairs. TextZoom has 17367 training pairs and 4373 test pairs. | ||
|
||
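The reported results for this model include an average PSNR of 21.56. For readers unfamiliar with the metric, here is a minimal sketch of the textbook PSNR definition on a pair of tiny grayscale images. It is not PaddleOCR's `SRMetric` implementation, which may normalize or aggregate images differently; the `psnr` helper and the pixel values are made up for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Textbook PSNR in dB between two equally sized images (nested lists)."""
    flat_ref = [p for row in reference for p in row]
    flat_out = [p for row in restored for p in row]
    if len(flat_ref) != len(flat_out) or not flat_ref:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_out)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Illustrative 2x3 "HR" reference and a slightly different "SR" output.
hr = [[52, 55, 61], [59, 79, 61]]
sr = [[50, 55, 60], [59, 80, 63]]
print(round(psnr(hr, sr), 2))
```

Higher values mean the super-resolved image is closer to the high-resolution reference.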
<a name="2"></a> | ||
## 2. Environment | ||
Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and refer to ["Project Clone"](./clone_en.md) to clone the project code. | ||
|
||
|
||
<a name="3"></a> | ||
## 3. Model Training / Evaluation / Prediction | ||
|
||
Please refer to [Text Recognition Tutorial](./recognition_en.md). PaddleOCR modularizes the code, and training different models only requires **changing the configuration file**. | ||
|
||
Training: | ||
|
||
Once the data is prepared, start training with the following commands: | ||
|
||
``` | ||
#Single GPU training (long training period, not recommended) | ||
python3 tools/train.py -c configs/sr/sr_telescope.yml | ||
#Multi GPU training, specify the gpu number through the --gpus parameter | ||
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/sr/sr_telescope.yml | ||
``` | ||
|
||
|
||
Evaluation: | ||
|
||
``` | ||
# GPU evaluation | ||
python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy | ||
``` | ||
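The `-o Global.pretrained_model=...` argument above overrides a value from the YAML config on the command line. The sketch below shows one way such dotted `KEY=VALUE` overrides can be merged into a nested config dictionary. It is an illustration only, not the code PaddleOCR uses: `apply_overrides` is a hypothetical helper, the weights path is an example, and values are kept as plain strings, whereas a real tool may parse them into typed values.

```python
def apply_overrides(config, overrides):
    """Merge 'A.b.c=value' strings into a nested dict, creating levels as needed."""
    for item in overrides:
        dotted_key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        keys = dotted_key.split(".")
        node = config
        # Walk down to the parent of the final key.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value  # kept as a string in this sketch
    return config


# Illustrative config fragment and override (the path is an example).
config = {"Global": {"pretrained_model": None, "use_gpu": True}}
apply_overrides(
    config,
    ["Global.pretrained_model=./output/sr/sr_telescope/best_accuracy"],
)
print(config["Global"]["pretrained_model"])
```

Keys that are not overridden, such as `use_gpu` here, keep the value from the config file.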
|
||
Prediction: | ||
|
||
``` | ||
# The configuration file used for prediction must match the one used for training | ||
python3 tools/infer_sr.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png | ||
``` | ||
|
||
![](../imgs_words_en/word_52.png) | ||
|
||
After executing the command, the super-resolution result of the above image is as follows: | ||
|
||
![](../imgs_results/sr_word_52.png) | ||
|
||
<a name="4"></a> | ||
## 4. Inference and Deployment | ||
|
||
<a name="4-1"></a> | ||
### 4.1 Python Inference | ||
|
||
First, convert the model saved during training into an inference model. Taking the [trained Text-Telescope model](https://paddleocr.bj.bcebos.com/contribution/Telescope_train.tar.gz) as an example, the conversion command is: | ||
|
||
```shell | ||
python3 tools/export_model.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out | ||
``` | ||
|
||
For Text-Telescope super-resolution model inference, the following commands can be executed: | ||
|
||
``` | ||
python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128 | ||
``` | ||
|
||
After executing the command, the super-resolution result of the above image is as follows: | ||
|
||
![](../imgs_results/sr_word_52.png) | ||
|
||
|
||
<a name="4-2"></a> | ||
### 4.2 C++ Inference | ||
|
||
Not supported | ||
|
||
<a name="4-3"></a> | ||
### 4.3 Serving | ||
|
||
Not supported | ||
|
||
<a name="4-4"></a> | ||
### 4.4 More | ||
|
||
Not supported | ||
|
||
<a name="5"></a> | ||
## 5. FAQ | ||
|
||
|
||
## Citation | ||
|
||
```bibtex | ||
@INPROCEEDINGS{9578891, | ||
author={Chen, Jingye and Li, Bin and Xue, Xiangyang}, | ||
booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, | ||
title={Scene Text Telescope: Text-Focused Scene Image Super-Resolution}, | ||
year={2021}, | ||
volume={}, | ||
number={}, | ||
pages={12021-12030}, | ||
doi={10.1109/CVPR46437.2021.01185}} | ||
``` |