Skip to content

Commit

Permalink
add arxiv pps (PaddlePaddle#7893)
Browse files Browse the repository at this point in the history
* support reconaug

* rename ppstructurev2 and add arxiv link

* fix link
  • Loading branch information
littletomatodonkey committed Oct 12, 2022
1 parent 40065d0 commit 55bbc75
Show file tree
Hide file tree
Showing 12 changed files with 55 additions and 56 deletions.
12 changes: 6 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,11 +27,11 @@ PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools

## Recent updates
- **🔥2022.8.24 Release PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- Release [PP-Structurev2](./ppstructure/),with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
- Release [PP-StructureV2](./ppstructure/),with functions and performance fully upgraded, adapted to Chinese scenes, and new support for [Layout Recovery](./ppstructure/recovery) and **one line command to convert PDF to Word**;
- [Layout Analysis](./ppstructure/layout) optimization: model storage reduced by 95%, while speed increased by 11 times, and the average CPU time-cost is only 41ms;
- [Table Recognition](./ppstructure/table) optimization: 3 optimization strategies are designed, and the model accuracy is improved by 6% under comparable time consumption;
- [Key Information Extraction](./ppstructure/kie) optimization:a visual-independent model structure is designed, the accuracy of semantic entity recognition is increased by 2.8%, and the accuracy of relation extraction is increased by 9.1%.

- **🔥2022.7 Release [OCR scene application collection](./applications/README_en.md)**
- Release **9 vertical models** such as digital tube, LCD screen, license plate, handwriting recognition model, high-precision SVTR model, etc, covering the main OCR vertical applications in general, manufacturing, finance, and transportation industries.

Expand Down Expand Up @@ -129,7 +129,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- [Text recognition](./doc/doc_en/algorithm_overview_en.md)
- [End-to-end OCR](./doc/doc_en/algorithm_overview_en.md)
- [Table Recognition](./doc/doc_en/algorithm_overview_en.md)
- [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md)
- [Key Information Extraction](./doc/doc_en/algorithm_overview_en.md)
- [Add New Algorithms to PaddleOCR](./doc/doc_en/add_new_algorithm_en.md)
- Data Annotation and Synthesis
- [Semi-automatic Annotation Tool: PPOCRLabel](./PPOCRLabel/README.md)
Expand Down Expand Up @@ -181,7 +181,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
</details>

<details open>
<summary>PP-Structurev2</summary>
<summary>PP-StructureV2</summary>

- layout analysis + table recognition
<div align="center">
Expand All @@ -192,7 +192,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>

<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808.jpg" width="600">
</div>
Expand All @@ -204,7 +204,7 @@ PaddleOCR support a variety of cutting-edge algorithms related to OCR, and devel
- RE (Relation Extraction)
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>
</div>

<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
Expand Down
12 changes: 6 additions & 6 deletions README_ch.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,14 +28,14 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
## 近期更新

- **🔥2022.8.24 发布 PaddleOCR [release/2.6](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6)**
- 发布[PP-Structurev2](./ppstructure/),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery),支持**一行命令完成PDF转Word**
- 发布[PP-StructureV2](./ppstructure/),系统功能性能全面升级,适配中文场景,新增支持[版面复原](./ppstructure/recovery),支持**一行命令完成PDF转Word**
- [版面分析](./ppstructure/layout)模型优化:模型存储减少95%,速度提升11倍,平均CPU耗时仅需41ms;
- [表格识别](./ppstructure/table)模型优化:设计3大优化策略,预测耗时不变情况下,模型精度提升6%;
- [关键信息抽取](./ppstructure/kie)模型优化:设计视觉无关模型结构,语义实体识别精度提升2.8%,关系抽取精度提升9.1%。

- **🔥2022.8 发布 [OCR场景应用集合](./applications)**
- 包含数码管、液晶屏、车牌、高精度SVTR模型、手写体识别等**9个垂类模型**,覆盖通用,制造、金融、交通行业的主要OCR垂类应用。

- **2022.5.9 发布 PaddleOCR [release/2.5](https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.5)**
- 发布[PP-OCRv3](./doc/doc_ch/ppocr_introduction.md#pp-ocrv3),速度可比情况下,中文场景效果相比于PP-OCRv2再提升5%,英文场景提升11%,80语种多语言模型平均识别准确率提升5%以上;
- 发布半自动标注工具[PPOCRLabelv2](./PPOCRLabel):新增表格文字图像、图像关键信息抽取任务和不规则文字图像的标注功能;
Expand Down Expand Up @@ -220,11 +220,11 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185539517-ccf2372a-f026-4a7c-ad28-c741c770f60a.png" width="600">
</div>

<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
</div>

- RE(关系提取)
<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
Expand All @@ -237,7 +237,7 @@ PaddleOCR旨在打造一套丰富、领先、且实用的OCR工具库,助力
<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094813-3a8e16cc-42e5-4982-b9f4-0134dfb5688d.png" width="600">
</div>

</details>

<a name="许可证书"></a>
Expand Down
17 changes: 8 additions & 9 deletions ppstructure/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,15 +15,15 @@ English | [简体中文](README_ch.md)

PP-Structure is an intelligent document analysis system developed by the PaddleOCR team, which aims to help developers better complete tasks related to document understanding such as layout analysis and table recognition.

The pipeline of PP-Structurev2 system is shown below. The document image first passes through the image direction correction module to identify the direction of the entire image and complete the direction correction. Then, two tasks of layout information analysis and key information extraction can be completed.
The pipeline of PP-StructureV2 system is shown below. The document image first passes through the image direction correction module to identify the direction of the entire image and complete the direction correction. Then, two tasks of layout information analysis and key information extraction can be completed.

- In the layout analysis task, the image first goes through the layout analysis model to divide the image into different areas such as text, table, and figure, and then analyze these areas separately. For example, the table area is sent to the form recognition module for structured recognition, and the text area is sent to the OCR engine for text recognition. Finally, the layout recovery module restores it to a word or pdf file with the same layout as the original image;
- In the key information extraction task, the OCR engine is first used to extract the text content, and then the SER(semantic entity recognition) module obtains the semantic entities in the image, and finally the RE(relationship extraction) module obtains the correspondence between the semantic entities, thereby extracting the required key information.
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>
<img src="https://user-images.githubusercontent.com/14270174/195265734-6f4b5a7f-59b1-4fcc-af6d-89afc9bd51e1.jpg" width="100%"/>

More technical details: 👉 [PP-Structurev2 Technical Report](docs/PP-Structurev2_introduction.md)
More technical details: 👉 [PP-StructureV2 Technical Report](https://arxiv.org/abs/2210.05391)

PP-Structurev2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:
PP-StructureV2 supports independent use or flexible collocation of each module. For example, you can use layout analysis alone or table recognition alone. Click the corresponding link below to get the tutorial for each independent module:

- [Layout Analysis](layout/README.md)
- [Table Recognition](table/README.md)
Expand All @@ -32,7 +32,7 @@ PP-Structurev2 supports independent use or flexible collocation of each module.

## 2. Features

The main features of PP-Structurev2 are as follows:
The main features of PP-StructureV2 are as follows:
- Support layout analysis of documents in the form of images/pdfs, which can be divided into areas such as **text, titles, tables, figures, formulas, etc.**;
- Support common Chinese and English **table detection** tasks;
- Support structured table recognition, and output the final result to **Excel file**;
Expand All @@ -43,7 +43,7 @@ The main features of PP-Structurev2 are as follows:

## 3. Results

PP-Structurev2 supports the independent use or flexible collocation of each module. For example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.
PP-StructureV2 supports the independent use or flexible collocation of each module. For example, layout analysis can be used alone, or table recognition can be used alone. Only the visualization effects of several representative usage methods are shown here.

### 3.1 Layout analysis and table recognition

Expand All @@ -59,7 +59,7 @@ The following figure shows the effect of layout recovery based on the results of

* SER

Different colored boxes in the figure represent different categories.
Different colored boxes in the figure represent different categories.

<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186094456-01a1dd11-1433-4437-9ab2-6480ac94ec0a.png" width="600">
Expand Down Expand Up @@ -91,7 +91,7 @@ In the figure, the red box represents `Question`, the blue box represents `Answe

<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
</div>

<div align="center">
<img src="https://user-images.githubusercontent.com/14270174/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb.jpg" width="600">
Expand All @@ -114,4 +114,3 @@ For structural analysis related model downloads, please refer to:

For OCR related model downloads, please refer to:
- [PP-OCR Model Zoo](../doc/doc_en/models_list_en.md)

16 changes: 8 additions & 8 deletions ppstructure/README_ch.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,14 +16,15 @@

PP-Structure是PaddleOCR团队自研的智能文档分析系统,旨在帮助开发者更好的完成版面分析、表格识别等文档理解相关任务。

PP-Structurev2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。
PP-StructureV2系统流程图如下所示,文档图像首先经过图像矫正模块,判断整图方向并完成转正,随后可以完成版面信息分析与关键信息抽取2类任务。
- 版面分析任务中,图像首先经过版面分析模型,将图像划分为文本、表格、图像等不同区域,随后对这些区域分别进行识别,如,将表格区域送入表格识别模块进行结构化识别,将文本区域送入OCR引擎进行文字识别,最后使用版面恢复模块将其恢复为与原始图像布局一致的word或者pdf格式的文件;
- 关键信息抽取任务中,首先使用OCR引擎提取文本内容,然后由语义实体识别模块获取图像中的语义实体,最后经关系抽取模块获取语义实体之间的对应关系,从而提取需要的关键信息。
<img src="./docs/ppstructurev2_pipeline.png" width="100%"/>

更多技术细节:👉 [PP-Structurev2技术报告](docs/PP-Structurev2_introduction.md)
<img src="https://user-images.githubusercontent.com/14270174/195265734-6f4b5a7f-59b1-4fcc-af6d-89afc9bd51e1.jpg" width="100%"/>

PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程:
更多技术细节:👉 PP-StructureV2技术报告 [中文版](docs/PP-StructureV2_introduction.md)[英文版](https://arxiv.org/abs/2210.05391)

PP-StructureV2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,点击下面相应链接获取各个独立模块的使用教程:

- [版面分析](layout/README_ch.md)
- [表格识别](table/README_ch.md)
Expand All @@ -33,7 +34,7 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独
<a name="2"></a>
## 2. 特性

PP-Structurev2的主要特性如下
PP-StructureV2的主要特性如下
- 支持对图片/pdf形式的文档进行版面分析,可以划分**文字、标题、表格、图片、公式等**区域;
- 支持通用的中英文**表格检测**任务;
- 支持表格区域进行结构化识别,最终结果输出**Excel文件**
Expand All @@ -44,7 +45,7 @@ PP-Structurev2的主要特性如下:

<a name="3"></a>
## 3. 效果展示
PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,这里仅展示几种代表性使用方式的可视化效果。
PP-StructureV2支持各个模块独立使用或灵活搭配,如,可以单独使用版面分析,或单独使用表格识别,这里仅展示几种代表性使用方式的可视化效果。

<a name="31"></a>
### 3.1 版面分析和表格识别
Expand Down Expand Up @@ -102,7 +103,7 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独

<div align="center">
<img src="https://user-images.githubusercontent.com/25809855/186095641-5843b4da-34d7-4c1c-943a-b1036a859fe3.png" width="600">
</div>
</div>

<a name="4"></a>
## 4. 快速体验
Expand All @@ -119,4 +120,3 @@ PP-Structurev2支持各个模块独立使用或灵活搭配,如,可以单独

OCR相关模型下载可以参考:
- [PP-OCR 模型库](../doc/doc_ch/models_list.md)

Loading

0 comments on commit 55bbc75

Please sign in to comment.