Commit

Merge pull request #265 from xxxpsyduck/develop
Update English docs version 24.06.20
MissPenguin committed Jun 26, 2020
2 parents 60fb5a8 + 0c4eb09 commit 9313bdf
Showing 11 changed files with 88 additions and 79 deletions.
55 changes: 28 additions & 27 deletions README_en.md
Original file line number Diff line number Diff line change
@@ -1,18 +1,18 @@
English | [简体中文](README.md)

## Introduction
## INTRODUCTION
PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them in practice.

**Recent updates**
- 2020.6.8 Add [dataset](./doc/doc_en/datasets_en.md) and keep updating
- 2020.6.5 Support exporting `attention` model to `inference_model`
- 2020.6.5 Support separate prediction and recognition, output result score
- 2020.5.30 Provide ultra-lightweight Chinese OCR online experience
- 2020.5.30 Provide lightweight Chinese OCR online experience
- 2020.5.30 Model prediction and training supported on Windows system
- [more](./doc/doc_en/update_en.md)

## Features
- Ultra-lightweight Chinese OCR model, total model size is only 8.6M
## FEATURES
- Lightweight Chinese OCR model, total model size is only 8.6M
- Single model supports Chinese and English numbers combination recognition, vertical text recognition, long text recognition
- Detection model DB (4.1M) + recognition model CRNN (4.5M)
- Various text detection algorithms: EAST, DB
@@ -22,34 +22,34 @@ PaddleOCR aims to create a rich, leading, and practical OCR tools that help user

|Model Name|Description |Detection Model link|Recognition Model link|
|-|-|-|-|
|chinese_db_crnn_mobile|Ultra-lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_mobile|lightweight Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|chinese_db_crnn_server|General Chinese OCR model|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|[inference model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar) & [pre-trained model](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|


To try our Chinese OCR online: https://www.paddlepaddle.org.cn/hub/scene/ocr

**You can also quickly experience the Ultra-lightweight Chinese OCR and General Chinese OCR models as follows:**
**You can also quickly experience the lightweight Chinese OCR and General Chinese OCR models as follows:**

## **Ultra-lightweight Chinese OCR and General Chinese OCR inference**
## **LIGHTWEIGHT CHINESE OCR AND GENERAL CHINESE OCR INFERENCE**

![](doc/imgs_results/11.jpg)

The picture above is the result of our Ultra-lightweight Chinese OCR model. For more testing results, please see the end of the article [Ultra-lightweight Chinese OCR results](#Ultra-lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).
The picture above is the result of our lightweight Chinese OCR model. For more testing results, please see the end of the article [lightweight Chinese OCR results](#lightweight-Chinese-OCR-results) and [General Chinese OCR results](#General-Chinese-OCR-results).

#### 1. Environment configuration
#### 1. ENVIRONMENT CONFIGURATION

Please see [Quick installation](./doc/doc_en/installation_en.md)

#### 2. Download inference models
#### 2. DOWNLOAD INFERENCE MODELS

#### (1) Download Ultra-lightweight Chinese OCR models
#### (1) Download lightweight Chinese OCR models
*If wget is not installed on Windows, you can paste the link into a browser to download the model. After the model is downloaded, extract it and place it in the corresponding directory.*

```
mkdir inference && cd inference
# Download the detection part of the Ultra-lightweight Chinese OCR and decompress it
# Download the detection part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db_infer.tar && tar xf ch_det_mv3_db_infer.tar
# Download the recognition part of the Ultra-lightweight Chinese OCR and decompress it
# Download the recognition part of the lightweight Chinese OCR and decompress it
wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn_infer.tar && tar xf ch_rec_mv3_crnn_infer.tar
cd ..
```
@@ -63,7 +63,7 @@ wget https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn_infer.tar && t
cd ..
```

#### 3. Single image and batch image prediction
#### 3. SINGLE IMAGE AND BATCH PREDICTION

The following code runs text detection and recognition inference in tandem. When performing prediction, specify the path of a single image or an image folder through the parameter `image_dir`; the parameter `det_model_dir` specifies the path to the detection model, and the parameter `rec_model_dir` specifies the path to the recognition model. The visualized prediction results are saved to the `./inference_results` folder by default.

@@ -87,14 +87,14 @@ python3 tools/infer/predict_system.py --image_dir="./doc/imgs/11.jpg" --det_mode

For more text detection and recognition models, please refer to the document [Inference](./doc/doc_en/inference_en.md)

## Documentation
## DOCUMENTATION
- [Quick installation](./doc/doc_en/installation_en.md)
- [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)
- [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)
- [Inference](./doc/doc_en/inference_en.md)
- [Dataset](./doc/doc_en/datasets_en.md)

## Text detection algorithm
## TEXT DETECTION ALGORITHM

PaddleOCR open source text detection algorithms list:
- [x] EAST([paper](https://arxiv.org/abs/1704.03155))
@@ -113,14 +113,14 @@ On the ICDAR2015 dataset, the text detection result is as follows:
Using the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) street view dataset, with a total of 30,000 training images, the related configuration and pre-trained models for the Chinese detection task are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|lightweight Chinese model|MobileNetV3|det_mv3_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_mv3_db.tar)|
|General Chinese OCR model|ResNet50_vd|det_r50_vd_db.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_det_r50_vd_db.tar)|

* Note: For the training and evaluation of the above DB model, post-processing parameters box_thresh=0.6 and unclip_ratio=1.5 need to be set. If using different datasets and different models for training, these two parameters can be adjusted for better result.
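
For intuition about what `unclip_ratio` controls: in the DB paper, the shrunk text region predicted by the network is expanded back by an offset D' = A' × r' / L', where A' is the area of the shrunk polygon, L' is its perimeter, and r' is the unclip ratio. The sketch below only computes that offset for a polygon; the helper names (`polygon_area`, `polygon_perimeter`, `unclip_distance`) are illustrative and are not PaddleOCR functions.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def unclip_distance(points, unclip_ratio=1.5):
    """Offset D' = A' * r' / L' used to expand a shrunk text polygon."""
    return polygon_area(points) * unclip_ratio / polygon_perimeter(points)


# A 100 x 20 axis-aligned box: area 2000, perimeter 240.
box = [(0, 0), (100, 0), (100, 20), (0, 20)]
offset = unclip_distance(box, unclip_ratio=1.5)
print(offset)
```

A larger `unclip_ratio` grows the offset linearly, so the final boxes become looser around the text; a smaller value keeps them tighter.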

For the training guide and use of PaddleOCR text detection algorithms, please refer to the document [Text detection model training/evaluation/prediction](./doc/doc_en/detection_en.md)

## Text recognition algorithm
## TEXT RECOGNITION ALGORITHM

PaddleOCR open-source text recognition algorithms list:
- [x] CRNN([paper](https://arxiv.org/abs/1507.05717))
@@ -145,16 +145,16 @@ Refer to [DTRB](https://arxiv.org/abs/1904.01906), the training and evaluation r
We use the [LSVT](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/doc/doc_en/datasets_en.md#1-icdar2019-lsvt) dataset and crop 300,000 training images from the original photos using the position ground truth, with some calibration where needed. In addition, based on the LSVT corpus, 5 million synthetic images are generated to train the Chinese model. The related configuration and pre-trained models are as follows:
|Model|Backbone|Configuration file|Pre-trained model|
|-|-|-|-|
|Ultra-lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|lightweight Chinese model|MobileNetV3|rec_chinese_lite_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_mv3_crnn.tar)|
|General Chinese OCR model|Resnet34_vd|rec_chinese_common_train.yml|[Download link](https://paddleocr.bj.bcebos.com/ch_models/ch_rec_r34_vd_crnn.tar)|

Please refer to the document for training guide and use of PaddleOCR text recognition algorithms [Text recognition model training/evaluation/prediction](./doc/doc_en/recognition_en.md)

## End-to-end OCR algorithm
## END-TO-END OCR ALGORITHM
- [ ] [End2End-PSL](https://arxiv.org/abs/1909.07808)(Baidu Self-Research, coming soon)

<a name="Ultra-lightweight Chinese OCR results"></a>
## Ultra-lightweight Chinese OCR results
<a name="lightweight Chinese OCR results"></a>
## LIGHTWEIGHT CHINESE OCR RESULTS
![](doc/imgs_results/1.jpg)
![](doc/imgs_results/7.jpg)
![](doc/imgs_results/12.jpg)
@@ -189,11 +189,12 @@ Please refer to the document for training guide and use of PaddleOCR text recogn

[more](./doc/doc_en/FAQ_en.md)

## Welcome to the PaddleOCR technical exchange group
WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the group~
## WELCOME TO THE PaddleOCR TECHNICAL EXCHANGE GROUP
WeChat: paddlehelp, note OCR, our assistant will get you into the group~

<img src="./doc/paddlehelp.jpg" width = "200" height = "200" />

## References
## REFERENCES
```
1. EAST:
@inproceedings{zhou2017east,
@@ -248,10 +249,10 @@ WeChat: paddlehelp . remarks OCR, the assistant will invite you to join the grou
}
```

## License
## LICENSE
This project is released under <a href="https://github.com/PaddlePaddle/PaddleOCR/blob/master/LICENSE">Apache 2.0 license</a>

## Contribution
## CONTRIBUTION
We welcome all contributions to PaddleOCR and greatly appreciate your feedback.

- Many thanks to [Khanh Tran](https://github.com/xxxpsyduck) for contributing the English documentation.
4 changes: 4 additions & 0 deletions doc/doc_en/FAQ_en.md
@@ -47,3 +47,7 @@ At present, the open source model, dataset and magnitude are as follows:
10. **Error in using the model with TPS module for prediction**
Error message: Input(X) dims[3] and Input(Grid) dims[2] should be equal, but received X dimension[3](108) != Grid dimension[2](100)
Solution: TPS does not support variable shapes. Please set --rec_image_shape='3,32,100' and --rec_char_type='en'

11. **A custom dictionary was used during training, but the recognition results contain characters that are not in the dictionary**

The custom dictionary path was not set when making the prediction. The solution is to set the parameter `rec_char_dict_path` to the corresponding dictionary file.
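
One way to confirm this symptom is to check a recognition result against the custom dictionary. The sketch below assumes the dictionary is a UTF-8 text file with one character per line; that layout and the helper names (`load_char_dict`, `chars_missing_from_dict`) are assumptions for illustration, not PaddleOCR APIs.

```python
import os
import tempfile


def load_char_dict(path):
    """Read a dictionary file, assumed to hold one character per line."""
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\r\n") for line in f if line.rstrip("\r\n")}


def chars_missing_from_dict(text, charset):
    """Return the sorted characters of `text` that are not in `charset`."""
    return sorted({ch for ch in text if ch not in charset})


# Demo with a throwaway dictionary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    dict_path = os.path.join(tmp, "my_dict.txt")
    with open(dict_path, "w", encoding="utf-8") as f:
        f.write("a\nb\nc\n")
    charset = load_char_dict(dict_path)
    missing = chars_missing_from_dict("abcd", charset)
    print(missing)
```

If the list is non-empty for results produced by a model trained on that dictionary, the prediction run was most likely using a different dictionary file.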
8 changes: 4 additions & 4 deletions doc/doc_en/config_en.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Optional parameters list
# OPTIONAL PARAMETERS LIST

The following list can be viewed via `--help`

@@ -8,7 +8,7 @@ The following list can be viewed via `--help`
| -o | ALL | set configuration options | None | Configuration using -o has higher priority than the configuration file selected with -c. E.g: `-o Global.use_gpu=false` |
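
The precedence described for `-o` can be pictured as a merge of dotted `key=value` overrides into the nested configuration loaded from the file given with `-c`. The following is a minimal sketch of that idea, not PaddleOCR's actual implementation; `parse_value` and `apply_overrides` are hypothetical helpers.

```python
import copy


def parse_value(raw):
    """Convert a command-line string into bool, int, float, or leave it as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        keys = dotted_key.split(".")
        node = merged
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw)
    return merged


# Values as they might come from the YAML file selected with -c.
file_config = {"Global": {"use_gpu": True, "checkpoints": None}}
merged = apply_overrides(file_config, ["Global.use_gpu=false"])
print(merged)
```

Because the overrides are applied after the file is loaded, a value passed with `-o` replaces the one from the configuration file, while untouched keys keep their file values.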


## Introduction to Global Parameters of Configuration File
## INTRODUCTION TO GLOBAL PARAMETERS OF CONFIGURATION FILE

Take `rec_chinese_lite_train.yml` as an example

@@ -35,7 +35,7 @@ Take `rec_chinese_lite_train.yml` as an example
| checkpoints | Load saved model path | None | Used to load saved parameters to continue training after interruption |
| save_inference_dir | path to save model for inference | None | Use to save inference model |

## Introduction to Reader parameters of Configuration file
## INTRODUCTION TO READER PARAMETERS OF CONFIGURATION FILE

Take `rec_chinese_reader.yml` as an example:

@@ -47,7 +47,7 @@ Take `rec_chinese_reader.yml` as an example:
| label_file_path | Groundtruth file path | ./train_data/rec_gt_train.txt| \ |
| infer_img | Result folder path | ./infer_img | \|

## Introduction to Optimizer parameters of Configuration file
## INTRODUCTION TO OPTIMIZER PARAMETERS OF CONFIGURATION FILE

Take `rec_icdar15_train.yml` as an example:

8 changes: 4 additions & 4 deletions doc/doc_en/customize_en.md
Original file line number Diff line number Diff line change
@@ -1,24 +1,24 @@
# How to make your own ultra-lightweight OCR models?
# HOW TO MAKE YOUR OWN LIGHTWEIGHT OCR MODEL?

The process of making a customized ultra-lightweight OCR model can be divided into three steps: training the text detection model, training the text recognition model, and concatenating the predictions from the previous steps.

## step1: Train text detection model
## STEP1: TRAIN TEXT DETECTION MODEL

PaddleOCR provides two text detection algorithms: EAST and DB. Both support MobileNetV3 and ResNet50_vd backbone networks, select the corresponding configuration file as needed and start training. For example, to train with MobileNetV3 as the backbone network for DB detection model :
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
For more details about data preparation and training tutorials, refer to the documentation [Text detection model training/evaluation/prediction](./detection_en.md)

## step2: Train text recognition model
## STEP2: TRAIN TEXT RECOGNITION MODEL

PaddleOCR provides four text recognition algorithms: CRNN, Rosetta, STAR-Net, and RARE. They all support two backbone networks: MobileNetV3 and ResNet34_vd, select the corresponding configuration files as needed to start training. For example, to train a CRNN recognition model that uses MobileNetV3 as the backbone network:
```
python3 tools/train.py -c configs/rec/rec_chinese_lite_train.yml
```
For more details about data preparation and training tutorials, refer to the documentation [Text recognition model training/evaluation/prediction](./recognition_en.md)

## step3: Concatenate predictions
## STEP3: CONCATENATE PREDICTIONS

PaddleOCR provides a concatenation tool for detection and recognition models, which can connect any trained detection model and any recognition model into a two-stage text recognition system. The input image goes through four main stages: text detection, text rectification, text recognition, and score filtering to output the text position and recognition results, and at the same time, you can choose to visualize the results.
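
The last of the four stages, score filtering, can be sketched as follows. The tuple layout `(box, text, score)`, the function name `filter_by_score`, and the threshold value are assumptions made for illustration and may differ from what the PaddleOCR tool uses.

```python
def filter_by_score(results, min_score=0.5):
    """Keep only recognition results whose confidence reaches `min_score`."""
    return [item for item in results if item[2] >= min_score]


# Hypothetical outputs of the detection and recognition stages.
results = [
    ([[0, 0], [50, 0], [50, 10], [0, 10]], "hello", 0.93),
    ([[0, 20], [50, 20], [50, 30], [0, 30]], "w0r1d", 0.31),
]
kept = filter_by_score(results, min_score=0.5)
print([text for _, text, _ in kept])
```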

5 changes: 4 additions & 1 deletion doc/doc_en/datasets_en.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## Dataset
## DATASET
This is a collection of commonly used Chinese datasets, which is being updated continuously. You are welcome to contribute to this list~
- [ICDAR2019-LSVT](#ICDAR2019-LSVT)
- [ICDAR2017-RCTW-17](#ICDAR2017-RCTW-17)
@@ -13,8 +13,11 @@ In addition to opensource data, users can also use synthesis tools to synthesize
- **Data source**: https://ai.baidu.com/broad/introduction?dataset=lsvt
- **Introduction**: A total of 450,000 Chinese street view images, including 50,000 (20,000 test + 30,000 training) fully labeled images (text coordinates + text content) and 400,000 weakly labeled images (text content only), as shown in the following figure:
![](../datasets/LSVT_1.jpg)

(a) Fully labeled data

![](../datasets/LSVT_2.jpg)

(b) Weakly labeled data
- **Download link**: https://ai.baidu.com/broad/download?dataset=lsvt

14 changes: 7 additions & 7 deletions doc/doc_en/detection_en.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Text detection
# TEXT DETECTION

This section uses the icdar15 dataset as an example to introduce the training, evaluation, and testing of the detection model in PaddleOCR.

## Data preparation
## DATA PREPARATION
The icdar2015 dataset can be obtained from [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads). Registration is required for downloading.

Decompress the downloaded dataset to the working directory, assuming it is decompressed under PaddleOCR/train_data/. In addition, PaddleOCR organizes many scattered annotation files into two separate annotation files for train and test respectively, which can be downloaded by wget:
@@ -27,13 +27,13 @@ The provided annotation file format is as follow:
" Image file name Image annotation information encoded by json.dumps"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
The image annotation information before json.dumps encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.
The image annotation before json.dumps() encoding is a list containing multiple dictionaries. The `points` in the dictionary represent the coordinates (x, y) of the four points of the text box, arranged clockwise from the point at the upper left corner.

`transcription` represents the text of the current text box, and this information is not needed in the text detection task.
If you want to train PaddleOCR on other datasets, you can build the annotation file according to the above format.
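
A minimal sketch of reading one line of such an annotation file is shown here. It assumes the image path and the JSON string are separated by a tab character, which should be checked against your own files; `parse_annotation_line` is a hypothetical helper, and the sample line is built from the `MASA` example with only the two documented keys.

```python
import json


def parse_annotation_line(line, sep="\t"):
    """Split one annotation line into the image path and its text-box polygons."""
    image_path, encoded = line.rstrip("\n").split(sep, 1)
    boxes = json.loads(encoded)  # list of dicts, as before json.dumps encoding
    # Detection only needs the four corner points; `transcription` is ignored.
    polygons = [box["points"] for box in boxes]
    return image_path, polygons


sample_boxes = [
    {
        "transcription": "MASA",
        "points": [[310, 104], [416, 141], [418, 216], [312, 179]],
    }
]
sample_line = "ch4_test_images/img_61.jpg" + "\t" + json.dumps(sample_boxes)
image_path, polygons = parse_annotation_line(sample_line)
print(image_path, polygons)
```

Writing an annotation file for another dataset is the reverse: build the list of dictionaries per image, encode it with `json.dumps`, and join it to the image path with the same separator.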


## Quickstart training
## TRAINING

First download the pretrained model. The detection model of PaddleOCR currently supports two backbones, namely MobileNetV3 and ResNet50_vd. You can use the model in [PaddleClas](https://github.com/PaddlePaddle/PaddleClas/tree/master/ppcls/modeling/architectures) to replace backbone according to your needs.
```
@@ -56,7 +56,7 @@ tar xf ./pretrain_models/MobileNetV3_large_x0_5_pretrained.tar ./pretrain_models
```

**Start training**
**START TRAINING**
```
python3 tools/train.py -c configs/det/det_mv3_db.yml
```
@@ -80,7 +80,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml -o Global.checkpoints=./you
**Note**: The priority of Global.checkpoints is higher than the priority of Global.pretrain_weights; that is, when both parameters are specified, the model specified by Global.checkpoints is loaded first. If the model path specified by Global.checkpoints is wrong, the one specified by Global.pretrain_weights is loaded instead.
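
The priority rule in the note can be written out as a small decision function. This is only an illustration of the described behavior, not PaddleOCR's loading code; `select_weights` and the `exists` callback are hypothetical, and the paths in the demo are made up.

```python
def select_weights(checkpoints, pretrain_weights, exists):
    """Pick which weights to load: checkpoints first, pretrain_weights as fallback."""
    if checkpoints and exists(checkpoints):
        return "checkpoints", checkpoints
    if pretrain_weights and exists(pretrain_weights):
        return "pretrain_weights", pretrain_weights
    return None, None


# Pretend only these paths exist on disk.
available = {"./output/latest", "./pretrain_models/backbone"}
exists = lambda path: path in available

both_valid = select_weights("./output/latest", "./pretrain_models/backbone", exists)
bad_checkpoint = select_weights("./output/missing", "./pretrain_models/backbone", exists)
print(both_valid, bad_checkpoint)
```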


## Evaluation Indicator
## EVALUATION

PaddleOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean.
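
As a quick reference, Hmean is the harmonic mean of Precision and Recall. The sketch below assumes the counts of matched, detected, and ground-truth boxes are already known (how a detection is matched to a ground-truth box is outside its scope); `detection_metrics` is an illustrative name.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Return (precision, recall, hmean) from box counts."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, hmean


# 8 correct detections out of 10 predicted boxes, with 16 ground-truth boxes.
precision, recall, hmean = detection_metrics(8, 10, 16)
print(precision, recall, hmean)
```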

@@ -100,7 +100,7 @@ python3 tools/eval.py -c configs/det/det_mv3_db.yml -o Global.checkpoints="./ou

* Note: box_thresh and unclip_ratio are parameters required for DB post-processing, and do not need to be set when evaluating the EAST model.

## Test detection result
## TEST DETECTION RESULT

Test the detection result on a single image:
```