README_en.md to README.md

PaddlePaddle · LDOUBLEV · Dec 15, 2020 · Dec 11, 2020 · Dec 13, 2020 · Dec 13, 2020
commit e99565c7f35d495c07db3b08a0d06c44facd76e9
diff --git a/README.md b/README.md
@@ -1,14 +1,15 @@
 English | [简体中文](README_ch.md)
 
 ## Introduction
-PaddleOCR aims to create rich, leading, and practical OCR tools that help users train better models and apply them into practice.
+PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and apply them in practice.
 
 **Recent updates**
+- 2020.11.25 Released a new data annotation tool, [PPOCRLabel](./PPOCRLabel/README_en.md), which improves labeling efficiency. Its labeling results can be used directly to train the PP-OCR system.
 - 2020.9.22 Update the PP-OCR technical article, https://arxiv.org/abs/2009.09941
-- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipline](#PP-OCR-Pipline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
+- 2020.9.19 Update the ultra lightweight compressed ppocr_mobile_slim series models, the overall model size is 3.5M (see [PP-OCR Pipeline](#PP-OCR-Pipeline)), suitable for mobile deployment. [Model Downloads](#Supported-Chinese-model-list)
 - 2020.9.17 Update the ultra lightweight ppocr_mobile series and general ppocr_server series Chinese and English ocr models, which are comparable to commercial effects. [Model Downloads](#Supported-Chinese-model-list)
-- 2020.9.17 update [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model), `German`, `French`, `Japanese` and `Korean` have been supported. Models for more languages will continue to be updated.
-- 2020.8.24 Support the use of PaddleOCR through whl package installation，pelease refer [PaddleOCR Package](./doc/doc_en/whl_en.md)
+- 2020.9.17 Update the [English recognition model](./doc/doc_en/models_list_en.md#english-recognition-model) and [Multilingual recognition model](doc/doc_en/models_list_en.md#english-recognition-model); `English`, `Chinese`, `German`, `French`, `Japanese` and `Korean` are now supported. Models for more languages will continue to be added.
+- 2020.8.24 Support the use of PaddleOCR through whl package installation; please refer to [PaddleOCR Package](./doc/doc_en/whl_en.md)
 - 2020.8.21 Update the replay and PPT of the live lesson at Bilibili on August 18, lesson 2, easy to learn and use OCR tool spree. [Get Address](https://aistudio.baidu.com/aistudio/education/group/info/1519)
 - [more](./doc/doc_en/update_en.md)
 
@@ -32,6 +33,15 @@ PaddleOCR aims to create rich, leading, and practical OCR tools that help users
 
 The above pictures are the visualizations of the general ppocr_server model. For more effect pictures, please see [More visualizations](./doc/doc_en/visualization_en.md).
 
+<a name="Community"></a>
+## Community
+- Scan the QR code below with WeChat to join the official technical exchange group. We look forward to your participation.
+
+<div align="center">
+<img src="./doc/joinus.PNG" width = "200" height = "200" />
+</div>
+
+
 ## Quick Experience
 
 You can also quickly experience the ultra-lightweight OCR : [Online Experience](https://www.paddlepaddle.org.cn/hub/scene/ocr)
@@ -48,55 +58,65 @@ Mobile DEMO experience (based on EasyEdge and Paddle-Lite, supports iOS and Andr
 
 <a name="Supported-Chinese-model-list"></a>
 
-## PP-OCR 1.1 series model list（Update on Sep 17）
+## PP-OCR 2.0 series model list (updated on Sep 17)
 
 | Model introduction | Model name | Recommended scene | Detection model | Direction classifier | Recognition model |
 | ------------------------------------------------------------ | ---------------------------- | ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
-| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v1.1_xx | Mobile & server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/det/ch_ppocr_mobile_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/rec/ch_ppocr_mobile_v1.1_rec_pre.tar) | 
-| Chinese and English general OCR model (155.1M) | ch_ppocr_server_v1.1_xx | Server | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/det/ch_ppocr_server_v1.1_det_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_train.tar) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/20-09-22/server/rec/ch_ppocr_server_v1.1_rec_pre.tar) | 
-| Chinese and English ultra-lightweight compressed OCR model (3.5M) | ch_ppocr_mobile_slim_v1.1_xx | Mobile | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/det/ch_ppocr_mobile_v1.1_det_prune_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_det_prune_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/cls/ch_ppocr_mobile_v1.1_cls_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_cls_quant_opt.nb) | [inference model](https://paddleocr.bj.bcebos.com/20-09-22/mobile-slim/rec/ch_ppocr_mobile_v1.1_rec_quant_infer.tar) / [slim model](https://paddleocr.bj.bcebos.com/20-09-22/mobile/lite/ch_ppocr_mobile_v1.1_rec_quant_opt.nb) |
+| Chinese and English ultra-lightweight OCR model (8.1M) | ch_ppocr_mobile_v2.0_xx | Mobile & server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_det_train.tar)|[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar) |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_rec_pre.tar)  |
+| Chinese and English general OCR model (143M) | ch_ppocr_server_v2.0_xx | Server |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_det_train.tar)  |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_train.tar)  |[inference model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_infer.tar) / [pre-trained model](https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_server_v2.0_rec_pre.tar)  |
+
 
-For more model downloads (including multiple languages), please refer to [PP-OCR v1.1 series model downloads](./doc/doc_en/models_list_en.md)
+For more model downloads (including multiple languages), please refer to [PP-OCR v2.0 series model downloads](./doc/doc_en/models_list_en.md).
 
+To request a new language, please refer to [Guideline for new language requests](#language_requests).
 
 ## Tutorials
 - [Installation](./doc/doc_en/installation_en.md)
 - [Quick Start](./doc/doc_en/quickstart_en.md)
 - [Code Structure](./doc/doc_en/tree_en.md)
-- Algorithm introduction
+- Algorithm Introduction
  - [Text Detection Algorithm](./doc/doc_en/algorithm_overview_en.md)
  - [Text Recognition Algorithm](./doc/doc_en/algorithm_overview_en.md)
- - [PP-OCR Pipline](#PP-OCR-Pipline)
-- Model training/evaluation
+ - [PP-OCR Pipeline](#PP-OCR-Pipeline)
+- Model Training/Evaluation
  - [Text Detection](./doc/doc_en/detection_en.md)
  - [Text Recognition](./doc/doc_en/recognition_en.md)
  - [Direction Classification](./doc/doc_en/angle_class_en.md)
  - [Yml Configuration](./doc/doc_en/config_en.md)
 - Inference and Deployment
- - [Quick inference based on pip](./doc/doc_en/whl_en.md)
+ - [Quick Inference Based on PIP](./doc/doc_en/whl_en.md)
  - [Python Inference](./doc/doc_en/inference_en.md)
  - [C++ Inference](./deploy/cpp_infer/readme_en.md)
  - [Serving](./deploy/hubserving/readme_en.md)
- - [Mobile](./deploy/lite/readme_en.md)
- - [Model Quantization](./deploy/slim/quantization/README_en.md)
- - [Model Compression](./deploy/slim/prune/README_en.md)
- - [Benchmark](./doc/doc_en/benchmark_en.md)
+ - [Mobile](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/lite/readme_en.md)
+ - [Model Quantization](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/quantization/README_en.md)
+ - [Model Compression](https://github.com/PaddlePaddle/PaddleOCR/blob/develop/deploy/slim/prune/README_en.md)
+ - [Benchmark](./doc/doc_en/benchmark_en.md) 
+- Data Annotation and Synthesis
+ - [Semi-automatic Annotation Tool](./PPOCRLabel/README_en.md)
+ - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
+ - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
 - Datasets
  - [General OCR Datasets(Chinese/English)](./doc/doc_en/datasets_en.md)
  - [HandWritten_OCR_Datasets(Chinese)](./doc/doc_en/handwritten_datasets_en.md)
  - [Various OCR Datasets(multilingual)](./doc/doc_en/vertical_and_multilingual_datasets_en.md)
- - [Data Annotation Tools](./doc/doc_en/data_annotation_en.md)
- - [Data Synthesis Tools](./doc/doc_en/data_synthesis_en.md)
 - [Visualization](#Visualization)
+- [New language requests](#language_requests)
 - [FAQ](./doc/doc_en/FAQ_en.md)
 - [Community](#Community)
 - [References](./doc/doc_en/reference_en.md)
 - [License](#LICENSE)
 - [Contribution](#CONTRIBUTION)
 
-<a name="PP-OCR-Pipline"></a>
+***Note: The dynamic graph branch is still under development.
+Currently, only dynamic graph training, Python prediction, and C++ prediction are supported.
+If you need mobile deployment examples or a quantization demo,
+please use the static graph branch.***
 
-## PP-OCR Pipline
+
+<a name="PP-OCR-Pipeline"></a>
+
+## PP-OCR Pipeline
 
 <div align="center">
  <img src="./doc/ppocr_framework.png" width="800">
@@ -109,10 +129,10 @@ PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of thr
 ## Visualization [more](./doc/doc_en/visualization_en.md)
 - Chinese OCR model
 <div align="center">
- <img src="./doc/imgs_results/1102.jpg" width="800">
- <img src="./doc/imgs_results/1104.jpg" width="800">
- <img src="./doc/imgs_results/1106.jpg" width="800">
- <img src="./doc/imgs_results/1105.jpg" width="800">
+ <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00056221.jpg" width="800">
+ <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00175408.jpg" width="800">
+ <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/00057937.jpg" width="800">
+ <img src="./doc/imgs_results/ch_ppocr_mobile_v2.0/rotate_00052204.jpg" width="800">
 </div>
 
 - English OCR model
@@ -123,16 +143,27 @@ PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of thr
 - Multilingual OCR model
 <div align="center">
  <img src="./doc/imgs_results/1110.jpg" width="800">
- <img src="./doc/imgs_results/1112.jpg" width="800">
+ <img src="./doc/imgs_results/korean.jpg" width="800">
 </div>
 
-<a name="Community"></a>
-## Community
-Scan the QR code below with your Wechat and completing the questionnaire, you can access to official technical exchange group.
 
-<div align="center">
-<img src="./doc/joinus.PNG" width = "200" height = "200" />
-</div>
+<a name="language_requests"></a>
+## Guideline for new language requests
+
+To request support for a new language, submit a PR with the following two files:
+
+1. In folder [ppocr/utils/dict](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/dict),
+submit a dictionary file named `{language}_dict.txt` that lists all characters of the language. See the other files in that folder for the format.
+
+2. In folder [ppocr/utils/corpus](https://github.com/PaddlePaddle/PaddleOCR/tree/develop/ppocr/utils/corpus),
+submit a corpus file named `{language}_corpus.txt` that contains a list of words in your language.
+Roughly 50,000 words per language are needed at minimum.
+Of course, the more, the better.
+
+If your language has unique elements, please let us know in advance by any means, such as useful links or Wikipedia pages.
+
+For more details, please refer to [Multilingual OCR Development Plan](https://github.com/PaddlePaddle/PaddleOCR/issues/1048).
+
 
 <a name="LICENSE"></a>
 ## License
@@ -149,3 +180,7 @@ We welcome all the contributions to PaddleOCR and appreciate for your feedback v
 - Thanks [authorfu](https://github.com/authorfu) for contributing Android demo and [xiadeye](https://github.com/xiadeye) contributing iOS demo, respectively.
 - Thanks [BeyondYourself](https://github.com/BeyondYourself) for contributing many great suggestions and simplifying part of the code style.
 - Thanks [tangmq](https://gitee.com/tangmq) for contributing Dockerized deployment services to PaddleOCR and supporting the rapid release of callable Restful API services.
+- Thanks [lijinhan](https://github.com/lijinhan) for contributing a new way, i.e., Java SpringBoot, to send requests to the Hubserving deployment.
+- Thanks [Mejans](https://github.com/Mejans) for contributing the Occitan corpus and character set.
+- Thanks [LKKlein](https://github.com/LKKlein) for contributing a new deployment package written in the Go programming language.
+- Thanks [Evezerest](https://github.com/Evezerest), [ninetailskim](https://github.com/ninetailskim), [edencfc](https://github.com/edencfc), [BeyondYourself](https://github.com/BeyondYourself) and [1084667371](https://github.com/1084667371) for contributing a new data annotation tool, i.e., PPOCRLabel.
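
The "Guideline for new language requests" section added in this diff asks contributors for two files, `{language}_dict.txt` and `{language}_corpus.txt`. The following is a minimal Python sketch of how such files might be prepared from a word list. It assumes one word per line in the corpus and one character per line in the dictionary (the guideline itself only says to follow the existing files in `ppocr/utils/dict`). The names `build_char_dict`, `write_language_files` and `MIN_WORDS` are hypothetical helpers for illustration and are not part of PaddleOCR.

```python
import tempfile
from pathlib import Path

# Rough minimum corpus size suggested by the guideline.
MIN_WORDS = 50_000


def build_char_dict(words):
    """Return the sorted unique non-whitespace characters used in the words."""
    return sorted({ch for word in words for ch in word if not ch.isspace()})


def write_language_files(language, words, out_dir="."):
    """Write {language}_corpus.txt and {language}_dict.txt into out_dir.

    Returns (dict_path, corpus_path, enough_words), where enough_words
    tells whether the corpus reaches MIN_WORDS entries.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Drop empty entries and surrounding whitespace.
    words = [w.strip() for w in words if w.strip()]

    corpus_path = out / f"{language}_corpus.txt"
    dict_path = out / f"{language}_dict.txt"

    # Assumed layout: one word per line, one character per line.
    corpus_path.write_text("\n".join(words) + "\n", encoding="utf-8")
    dict_path.write_text("\n".join(build_char_dict(words)) + "\n", encoding="utf-8")

    return dict_path, corpus_path, len(words) >= MIN_WORDS


if __name__ == "__main__":
    # Tiny illustrative word list; a real request needs far more words.
    sample_words = ["adiu", "bona", "jorn"]
    with tempfile.TemporaryDirectory() as tmp:
        dict_path, corpus_path, enough = write_language_files("oc", sample_words, tmp)
        print(dict_path.name, corpus_path.name, "enough words:", enough)
```

Before opening a PR, it is worth comparing the generated dictionary with an existing file in `ppocr/utils/dict`, since the exact format expected by the maintainers is defined there rather than by this sketch.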