Add preprocessing common to OCR tasks #10217

Merged

UserUnknownFactor
Contributor

Common OCR preprocessing steps include filling transparent areas with a solid color, inverting the image, and binarizing it. This commit adds those as optional parameters of the ocr function.
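
For readers unfamiliar with these three steps, here is a minimal NumPy sketch of what each one does. The helper names (`fill_transparent`, `invert`, `binarize`), the white default background, and the fixed threshold are assumptions made for illustration; this is not the code added by the PR.

```python
import numpy as np


def fill_transparent(img, color=(255, 255, 255)):
    """Blend a 4-channel image onto a solid background color.

    Hypothetical helper: images without an alpha channel are returned unchanged.
    """
    if img.ndim != 3 or img.shape[2] != 4:
        return img
    alpha = img[:, :, 3:4].astype(np.float32) / 255.0
    background = np.array(color, dtype=np.float32)
    blended = img[:, :, :3].astype(np.float32) * alpha + background * (1.0 - alpha)
    return np.rint(blended).astype(np.uint8)


def invert(img):
    """Flip dark and light pixels of a uint8 image."""
    return 255 - img


def binarize(img, threshold=128):
    """Map each pixel to black (0) or white (255) using a fixed threshold (an assumption)."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)


# One fully transparent pixel and one opaque dark pixel.
rgba = np.array([[[10, 20, 30, 0], [10, 20, 30, 255]]], dtype=np.uint8)
filled = fill_transparent(rgba)
inverted = invert(filled)
binary = binarize(filled)
```

In this sketch the transparent pixel takes the background color, so later steps see ordinary pixel values rather than whatever color data was stored under the zero alpha.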

@paddle-bot

paddle-bot bot commented Jun 20, 2023

Thanks for your contribution!

Collaborator

@shiyutang shiyutang left a comment

The solution is solid, but I feel it is a bit redundant to expose all the image processing params in PaddleOCR.ocr. Do we have a better solution to

@UserUnknownFactor
Contributor Author

@shiyutang How about this?

Collaborator

@shiyutang shiyutang left a comment

I think your edit is great, but there is one thing I want to add:

Args are passed into PaddleOCR at L662, so the preprocessing args are already in the engine and can be accessed through self.params.binarize. This avoids passing them directly into engine.ocr.

engine = PaddleOCR(**(args.__dict__))
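
To illustrate the suggestion, here is a minimal sketch of how unpacking parsed arguments into a constructor makes them available on the instance. `SketchEngine`, its `params` attribute, and the `--binarize`/`--invert` flags are hypothetical stand-ins, not PaddleOCR's actual class or options.

```python
import argparse


class SketchEngine:
    """Hypothetical engine that keeps every constructor keyword on self.params."""

    def __init__(self, **kwargs):
        self.params = argparse.Namespace(**kwargs)

    def ocr(self, img):
        # Read the preprocessing switches from the stored params instead of
        # taking them as arguments of ocr().
        steps = []
        if getattr(self.params, "binarize", False):
            steps.append("binarize")
        if getattr(self.params, "invert", False):
            steps.append("invert")
        return steps  # a real engine would run these steps, then recognize text


parser = argparse.ArgumentParser()
parser.add_argument("--binarize", action="store_true")
parser.add_argument("--invert", action="store_true")
args = parser.parse_args(["--binarize"])  # simulated command line

engine = SketchEngine(**(args.__dict__))
planned_steps = engine.ocr(img=None)
```

With this layout the options are fixed when the engine is constructed, which is convenient for the console entry point but means every call on that engine shares the same settings.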

@UserUnknownFactor
Contributor Author

> Args is passed into PaddleOCR in L662, therefore the preprocess args is already in the engine and can be accessed

But what if we want to use those options through the API and not from the console application parameters? Won't this make things difficult because we'll need to reconfigure engine parameters then?

@shiyutang
Collaborator

With that approach, if we need to use the image preprocessing options through the API, we can pass the params directly into PaddleOCR.

PaddleOCR(bin=True,..)

Collaborator

@shiyutang shiyutang left a comment

@UserUnknownFactor
Contributor Author

> PaddleOCR(bin=True,..)

That would prevent us from changing the settings on a per-file basis, which is sometimes needed.
So I think the current implementation is the best compromise.
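
The trade-off discussed here can be sketched as engine-level defaults that a single call may override. The class name, the option names, and the convention that `None` means "use the engine default" are assumptions for illustration only, not the merged implementation.

```python
class PerCallEngine:
    """Hypothetical engine with constructor defaults and per-call overrides."""

    def __init__(self, binarize=False, invert=False):
        self.defaults = {"binarize": binarize, "invert": invert}

    def ocr(self, path, binarize=None, invert=None):
        # None means "fall back to the engine-level default".
        overrides = {"binarize": binarize, "invert": invert}
        options = {
            name: self.defaults[name] if value is None else value
            for name, value in overrides.items()
        }
        return path, options  # a real engine would preprocess and recognize here


engine = PerCallEngine(binarize=True)
scan = engine.ocr("scan.png")  # uses the engine defaults
screenshot = engine.ocr("dark_ui.png", binarize=False, invert=True)  # per-file override
```

A design like this lets one engine instance process a mixed batch of files without being rebuilt or reconfigured between calls.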

> doc_en/inference_args.md

This doesn't exist. Do you want me to create it? I don't know Chinese, though...

Add preprocessing to options
@shiyutang
Collaborator

@UserUnknownFactor
Contributor Author

@shiyutang: Done.

@shiyutang shiyutang merged commit 8967e63 into PaddlePaddle:release/2.6 Jul 20, 2023
1 check passed
@UserUnknownFactor
Contributor Author

@shiyutang: Can you please cherry-pick GH-10217 and GH-10216 to PaddlePaddle:dygraph and PaddlePaddle:release/2.7 if possible?

@shiyutang
Collaborator

Would you have any problem doing that yourself? I can review it for you~

shiyutang pushed a commit that referenced this pull request Aug 16, 2023
* Don't break overall processing on a bad image

* Add preprocessing common to OCR tasks
Add preprocessing to options
@UserUnknownFactor
Contributor Author

@shiyutang: No problem: GH-10654, GH-10655.
By the way, can you please explain why version 2.7 is not based on 2.6? Is there a different approach to versioning?

@shiyutang
Collaborator

2.7 is a snapshot of the dygraph branch. Because we had added lots of bug fixes and new features on dygraph, it was easier to check out a new branch from it.

shiyutang pushed a commit that referenced this pull request Aug 21, 2023
* Don't break overall processing on a bad image

* Add preprocessing common to OCR tasks
Add preprocessing to options
tink2123 added a commit that referenced this pull request Sep 7, 2023
* Update PP-OCRv4_introduction.md

* Update PP-OCRv4_introduction.md (#10616)

* Update PP-OCRv4_introduction.md

* Update PP-OCRv4_introduction.md

* Update PP-OCRv4_introduction.md

* Update README.md

* Cherrypicking GH-10217 and GH-10216 to PaddlePaddle:Release/2.7 (#10655)

* Don't break overall processing on a bad image

* Add preprocessing common to OCR tasks
Add preprocessing to options

* Update requirements.txt (#10656)

added missing pyyaml library

* [TIPC]update xpu tipc script (#10658)

* fix-typo (#10642)

Co-authored-by: Dennis
Co-authored-by: shiyutang

* Fix DSR error caused by data augmentation (#10662) (#10681)

* Fix DSR error caused by data augmentation

* Roll back erroneous change

* Update algorithm_overview_en.md (#10670)

Fixed simple spelling errors.

* Implement recoginition method ParseQ

* Document update for new recognition method ParseQ

* add prediction for parseq

* Update rec_vit_parseq.yml

* Update rec_r31_sar.yml

* Update rec_r31_sar.yml

* Update rec_r50_fpn_srn.yml

* Update rec_vit_parseq.py

* Update rec_vit_parseq.yml

* Update rec_parseq_head.py

* Update rec_img_aug.py

* Update rec_vit_parseq.yml

* Update __init__.py

* Update predict_rec.py

* Update paddleocr.py

* Update requirements.txt

* Update utility.py

* Update utility.py

---------

Co-authored-by: xiaoting
Co-authored-by: topduke
Co-authored-by: dyning
Co-authored-by: UserUnknownFactor
Co-authored-by: itasli
Co-authored-by: Kai Song
Co-authored-by: dvorst
Co-authored-by: Dennis
Co-authored-by: shiyutang
Co-authored-by: Dec20B
Co-authored-by: ncoffman
shiyutang pushed a commit that referenced this pull request Oct 16, 2023
shiyutang added a commit that referenced this pull request Oct 18, 2023
* Update recognition_en.md (#10059)

ic15_dict.txt only have 36 digits

* Update ocr_rec.h (#9469)

It is enough to include preprocess_op.h, we do not need to include ocr_cls.h.

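* Add explanatory comments for num_classes (#10073)

The value assigned to Architecture.Backbone.num_classes in ser_vi_layoutxlm_xfund_zh.yml is also set on Loss.num_classes.
Because BIO tagging is used, if the dictionary contains n fields (including other), the number of classes is 2n-1; if the dictionary contains n fields (not including other), the number of classes is 2n+1.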

* Update algorithm_overview_en.md (#9747)

Fix links to super-resolution algorithm docs

* Improve docs `deploy/hubserving/readme.md` and `doc/doc_ch/models_list.md` (#9110)

* Update readme.md

* Update readme.md

* Update readme.md

* Update models_list.md

* trim trailling spaces @ `deploy/hubserving/readme_en.md`

* `s/shell/bash/` @ `deploy/hubserving/readme_en.md`

* Update `deploy/hubserving/readme_en.md` to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update deploy/hubserving/readme_en.md to sync with `deploy/hubserving/readme.md`

* Update `doc/doc_en/models_list_en.md` to sync with `doc/doc_ch/models_list_en.md`

* using Grammarly to weak `deploy/hubserving/readme_en.md`

* using Grammarly to tweak `doc/doc_en/models_list_en.md`

* `ocr_system` module will return with values of field `confidence`

* Update README_CN.md

* Fix the incorrect reference path for image-to-Base64 conversion in the test service. (#8334)

* Update application.md

* [Doc] Fix 404 link.  (#10318)

* Update PP-OCRv3_det_train.md

* Update knowledge_distillation.md

* Update config.md

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file (#10181)

* Fix fitz camelCase deprecation and .PDF not being recognized as pdf file

* refactor get_image_file_list function

* Update customize.md (#10325)

* Update FAQ.md (#10345)

* Update FAQ.md (#10349)

* Don't break overall processing on a bad image (#10216)

* Add preprocessing common to OCR tasks (#10217)

Add preprocessing to options

* [MLU] add mlu device for infer (#10249)

* Create newfeature.md

* Update newfeature.md

* remove unused imported module, so can avoid PyInstaller packaged binary's start-time not found module error. (#10502)

* CV suite development special event - text recognition returns per-character coordinates (#10515)

* modification of return word box

* update_implements

* Update rec_postprocess.py

* Update utility.py

* Update README_ch.md

* revert README_ch.md update

* Fixed Layout recovery README file (#10493)

Co-authored-by: Shubham Chambhare

* update_doc

* bugfix

---------

Co-authored-by: ChuongLoc
Co-authored-by: Wang Xin
Co-authored-by: tanjh
Co-authored-by: Louis Maddox
Co-authored-by: n0099
Co-authored-by: zhenliang li
Co-authored-by: itasli
Co-authored-by: UserUnknownFactor
Co-authored-by: PeiyuLau
Co-authored-by: kerneltravel
Co-authored-by: ToddBear
Co-authored-by: Ligoml
Co-authored-by: Shubham Chambhare
Co-authored-by: Shubham Chambhare
Co-authored-by: andyj
jzhang533 pushed a commit to jzhang533/PaddleOCR that referenced this pull request Mar 28, 2024
…Paddle:Release/2.7 (PaddlePaddle#10655)

* Don't break overall processing on a bad image

* Add preprocessing common to OCR tasks
Add preprocessing to options
@UserUnknownFactor UserUnknownFactor deleted the more_preprocess branch May 8, 2024 09:31