Add preprocessing common to OCR tasks #10217

UserUnknownFactor · 2023-06-20T09:20:53Z

Common OCR tasks often include filling transparent areas with an actual color, inverting the image, or binarizing it. This commit adds those as optional parameters for the ocr function.
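As a rough illustration of the three operations named above, here is a minimal NumPy-only sketch. The function names, the `do_invert`/`do_binarize` flags, the fixed threshold of 127, and the order of the steps are all assumptions made for this example; they are not taken from the PR's diff.

```python
import numpy as np


def alpha_to_color(img: np.ndarray, alpha_color=(255, 255, 255)) -> np.ndarray:
    """Blend a 4-channel image onto a solid background; return other images as-is."""
    if img.ndim == 3 and img.shape[2] == 4:
        color = img[:, :, :3].astype(np.float32)
        alpha = img[:, :, 3:4].astype(np.float32) / 255.0
        background = np.array(alpha_color, dtype=np.float32)
        blended = color * alpha + background * (1.0 - alpha)
        return blended.round().astype(np.uint8)
    return img


def invert(img: np.ndarray) -> np.ndarray:
    """Invert a uint8 image."""
    return 255 - img


def binarize(img: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Return a single-channel image containing only 0 and 255."""
    gray = img if img.ndim == 2 else img.mean(axis=2)
    return np.where(gray > threshold, 255, 0).astype(np.uint8)


def preprocess(img, alpha_color=(255, 255, 255), do_invert=False, do_binarize=False):
    """Apply the optional steps in a fixed (assumed) order: fill alpha, invert, binarize."""
    img = alpha_to_color(img, alpha_color)
    if do_invert:
        img = invert(img)
    if do_binarize:
        img = binarize(img)
    return img
```

Each step is optional and off by default, so callers that pass no flags get the image back unchanged apart from the transparency fill.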

paddle-bot · 2023-06-20T09:20:57Z

Thanks for your contribution!

shiyutang

The solution is solid, but I feel it is a bit redundant to expose all the image processing params in PaddleOCR.ocr. Do we have a better solution to

paddleocr.py

UserUnknownFactor · 2023-07-17T06:37:42Z

@shiyutang How about this?

shiyutang

I think your edit is great, but there is one thing I want to add;

Args are passed into PaddleOCR at L662, so the preprocessing args are already in the engine and can be accessed through self.params.binarize. This avoids passing them directly into engine.ocr.

engine = PaddleOCR(**(args.__dict__))
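The pattern described in this comment can be sketched with a stand-in class. `SketchEngine`, the `--binarize`/`--invert` flags, and the list of step names returned by `ocr` are hypothetical and exist only to show how parsed arguments end up on `self.params`; the real PaddleOCR class is not reproduced here.

```python
import argparse


class SketchEngine:
    """Hypothetical stand-in for an engine that stores its keyword arguments."""

    def __init__(self, **kwargs):
        # Keep every constructor argument reachable as self.params.<name>.
        self.params = argparse.Namespace(**kwargs)

    def ocr(self, img):
        # Preprocessing flags are read from the stored params,
        # so ocr() itself needs no extra arguments.
        steps = []
        if getattr(self.params, "binarize", False):
            steps.append("binarize")
        if getattr(self.params, "invert", False):
            steps.append("invert")
        return steps


parser = argparse.ArgumentParser()
parser.add_argument("--binarize", action="store_true")
parser.add_argument("--invert", action="store_true")
args = parser.parse_args(["--binarize"])

# Same shape as the call quoted above: the parsed namespace becomes kwargs.
engine = SketchEngine(**(args.__dict__))
```

With this layout the flags are fixed when the engine is constructed, which is the property the rest of the thread debates.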

UserUnknownFactor · 2023-07-18T12:57:24Z

> Args is passed into PaddleOCR in L662, therefore the preprocess args is already in the engine and can be accessed

But what if we want to use those options through the API and not from the console application parameters? Won't this make things difficult because we'll need to reconfigure engine parameters then?

shiyutang · 2023-07-19T07:45:09Z

With the approach above, if we need to use the image preprocessing options through the API, we can pass the params directly into PaddleOCR.

PaddleOCR(bin=True,..)

shiyutang

On top of the code change, we may also need to update the docs:
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_ch/inference_args.md
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/doc/doc_en/inference_args.md

UserUnknownFactor · 2023-07-19T09:49:39Z

> PaddleOCR(bin=True,..)

It will prevent us from changing the settings on a per-file basis, which is sometimes needed.
So I think the current implementation is the best compromise.

> doc_en/inference_args.md

This doesn't exist. Do you want me to create it? I don't know Chinese, though...
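The per-file concern raised in this comment can be illustrated with a sketch in which engine-level defaults are overridden per call. `PerCallEngine`, its `resolve` helper, and the `binarize`/`invert` keyword names are assumptions for this example and do not claim to match the merged code.

```python
from typing import Optional


class PerCallEngine:
    """Hypothetical engine with defaults that each ocr() call may override."""

    def __init__(self, binarize: bool = False, invert: bool = False):
        self.defaults = {"binarize": binarize, "invert": invert}

    def resolve(self, binarize: Optional[bool] = None, invert: Optional[bool] = None) -> dict:
        # None means "not specified for this call": fall back to the engine default.
        overrides = {"binarize": binarize, "invert": invert}
        return {
            name: self.defaults[name] if value is None else value
            for name, value in overrides.items()
        }

    def ocr(self, path: str, **overrides):
        settings = self.resolve(**overrides)
        # A real engine would load and preprocess the image here;
        # the sketch only reports which settings would apply.
        return path, settings


engine = PerCallEngine(binarize=True)
first = engine.ocr("scan_a.png")
second = engine.ocr("scan_b.png", binarize=False, invert=True)
```

Here one engine instance handles both files, and only the second call deviates from the defaults, so no reconfiguration of the engine is needed between files.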

shiyutang · 2023-07-20T07:51:05Z

OK~ you can add it here: https://github.com/PaddlePaddle/PaddleOCR/blob/dygraph/doc/doc_en/inference_args_en.md

UserUnknownFactor · 2023-07-20T11:02:33Z

@shiyutang: Done.

UserUnknownFactor · 2023-08-16T09:41:45Z

@shiyutang: can you please cherrypick GH-10217 and GH-10216 to PaddlePaddle:dygraph and PaddlePaddle:release/2.7 if possible?

shiyutang · 2023-08-16T09:55:12Z

Do you have any problem doing that? I can review for you~

UserUnknownFactor · 2023-08-16T10:55:29Z

@shiyutang: No problem: GH-10654, GH-10655.
By the way, can you please explain why the 2.7 version is not based off 2.6? Is there some different approach to versioning?

shiyutang · 2023-08-16T11:48:35Z

2.7 is a snapshot of the dygraph branch. Because we added lots of bugfixes and new features on dygraph, it was easier to check out a new branch from it.

paddle-bot bot added contributor status: proposed labels Jun 20, 2023

UserUnknownFactor force-pushed the more_preprocess branch from 2709d00 to c13ef4e on June 21, 2023 11:23

LDOUBLEV approved these changes Jun 28, 2023

shiyutang assigned LDOUBLEV Jun 30, 2023

shiyutang reviewed Jul 14, 2023

shiyutang assigned shiyutang and UserUnknownFactor Jul 14, 2023

UserUnknownFactor force-pushed the more_preprocess branch 3 times, most recently from 9086626 to fd577a7 on July 17, 2023 07:48

UserUnknownFactor requested a review from shiyutang July 17, 2023 08:58

shiyutang reviewed Jul 18, 2023

shiyutang reviewed Jul 19, 2023

UserUnknownFactor force-pushed the more_preprocess branch from fd577a7 to 118364d on July 19, 2023 10:08

Add preprocessing common to OCR tasks

92244fc

Add preprocessing to options

UserUnknownFactor force-pushed the more_preprocess branch from 118364d to 92244fc on July 19, 2023 10:29

UserUnknownFactor mentioned this pull request Jul 20, 2023

OCR preprocessing parameters documentation #10447

Merged

shiyutang merged commit 8967e63 into PaddlePaddle:release/2.6 Jul 20, 2023
1 check passed

shiyutang added the Contributor PR is merged label Jul 20, 2023

paddle-bot bot removed the status: proposed label Jul 20, 2023

shiyutang pushed a commit that referenced this pull request Aug 16, 2023

Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:Release/2.7 (#10655)

b17c2f3

* Don't break overall processing on a bad image * Add preprocessing common to OCR tasks Add preprocessing to options

shiyutang pushed a commit that referenced this pull request Aug 21, 2023

Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:dygraph (#10654)

b3912fc

* Don't break overall processing on a bad image * Add preprocessing common to OCR tasks Add preprocessing to options

shiyutang pushed a commit that referenced this pull request Oct 16, 2023

Add preprocessing common to OCR tasks (#10217)

08e1a0c

Add preprocessing to options

UserUnknownFactor deleted the more_preprocess branch May 8, 2024 09:31
