KIE suggestion: implement a sliding window #10404

Closed
lamaeldo opened this issue Jul 16, 2023 · 1 comment
Labels
Code PR is needed This issue could inspire a code PR status/close

Comments

@lamaeldo

Hello, I have a suggestion regarding your implementation of the KIE algorithms (for both the SER and RE tasks). The maximum input length is 512 tokens (because of BERT), and there are numerous cases in which people would like to train and run inference on documents longer than 512 tokens. You could implement a sliding-window strategy whereby the input image (for training or inference) is sliced into chunks of fewer than 512 tokens, with some overlap between the chunks, before being fed to the network. For inference, once the chunks have been fed to the network, the per-chunk results are merged to provide the user with results on the full image. I haven't seen any implementation of this in PaddleOCR, but PaddleSeg seems to have this feature.
I really think it could go a long way towards increasing the number of use cases for KIE with PaddleOCR.
Otherwise, do you know of any other way to process documents longer than 512 tokens?
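
To make the idea concrete, here is a minimal sketch of the chunk-and-merge step for the SER case. It is not PaddleOCR code: `sliding_windows`, `predict_long`, the `predict_chunk` callback, and the `stride` value of 384 are all hypothetical names and defaults chosen for illustration, and the sketch assumes the document has already been turned into a flat token sequence with one label predicted per token. Where windows overlap, it keeps the label from the window in which the token sits farthest from a window edge, which is only one of several possible merge rules.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(n_tokens: int, max_len: int = 512, stride: int = 384) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs that cover range(n_tokens) with overlap."""
    if not 0 < stride <= max_len:
        raise ValueError("stride must be in (0, max_len]")
    if n_tokens <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        windows.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return windows


def predict_long(
    tokens: Sequence[str],
    predict_chunk: Callable[[Sequence[str]], List[str]],  # hypothetical model wrapper
    max_len: int = 512,
    stride: int = 384,
) -> List[str]:
    """Run predict_chunk on overlapping windows and merge per-token labels."""
    n = len(tokens)
    labels: List[str] = [""] * n
    best_margin = [-1] * n
    for start, end in sliding_windows(n, max_len, stride):
        chunk_labels = predict_chunk(tokens[start:end])
        last = end - start - 1
        for offset, label in enumerate(chunk_labels):
            # Distance from this token to the nearest edge of the current window.
            margin = min(offset, last - offset)
            i = start + offset
            if margin > best_margin[i]:
                best_margin[i] = margin
                labels[i] = label
    return labels
```

With these defaults, a 1000-token document is split into the windows (0, 512), (384, 896), and (768, 1000). The RE task would need extra handling for relations whose two entities fall into different chunks, which this sketch does not attempt.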

@ToddBear
Collaborator

Thanks for your suggestion!

We have added this suggestion to our activity #10334.

You are welcome to participate in our activity!
