KIE suggestion: implement a sliding window #10404

lamaeldo · 2023-07-16T10:34:35Z

Hello, I have a suggestion regarding your implementation of the KIE algorithms (for both the SER and RE tasks). The maximum input length is 512 tokens (because of BERT), yet there are many cases in which people would like to train and run inference on documents longer than 512 tokens. You could implement a sliding-window strategy in which the input image (for training or inference) is sliced into chunks of fewer than 512 tokens each, with some overlap between chunks, before being fed to the network. For inference, once the chunks have passed through the network, their outputs are merged again to give the user results for the full image. I haven't seen any implementation of this in PaddleOCR, but PaddleSeg seems to have this feature.
I really think it could go a long way towards increasing the number of use cases for KIE with PaddleOCR.
Otherwise, do you know of any other way to process documents longer than 512 tokens?
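
To make the idea concrete, here is a minimal sketch of the chunk-and-merge step, working on token indices only. It is not an existing PaddleOCR API: the names `sliding_windows` and `merge_predictions`, the default overlap of 128, and the choice to average scores in overlapping regions are all assumptions for illustration. A real KIE pipeline would also need to slice the bounding boxes alongside the tokens and reserve room for special tokens within the 512 limit.

```python
from typing import List, Sequence, Tuple


def sliding_windows(n_tokens: int, max_len: int = 512,
                    overlap: int = 128) -> List[Tuple[int, int]]:
    """Return [start, end) token spans of at most max_len tokens covering the sequence."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")
    if n_tokens <= 0:
        return []
    step = max_len - overlap  # how far each new window advances
    spans = []
    start = 0
    while True:
        end = min(start + max_len, n_tokens)
        spans.append((start, end))
        if end == n_tokens:  # the last window reaches the end of the document
            break
        start += step
    return spans


def merge_predictions(n_tokens: int,
                      spans: Sequence[Tuple[int, int]],
                      chunk_scores: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-token class scores over every window covering a token,
    then pick the highest-scoring class (assumed merge rule, hypothetical helper)."""
    if n_tokens <= 0:
        return []
    n_classes = len(chunk_scores[0][0])
    totals = [[0.0] * n_classes for _ in range(n_tokens)]
    counts = [0] * n_tokens
    for (start, _end), scores in zip(spans, chunk_scores):
        for offset, row in enumerate(scores):
            idx = start + offset  # position of this token in the full document
            counts[idx] += 1
            for c, value in enumerate(row):
                totals[idx][c] += value
    return [
        max(range(n_classes), key=lambda c, i=i: totals[i][c] / counts[i])
        for i in range(n_tokens)
    ]
```

For a 1000-token document with the defaults, `sliding_windows(1000)` yields three windows; each would be fed to the model separately, and `merge_predictions` would then combine the per-window scores into one label per token.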

ToddBear · 2023-07-17T09:53:08Z

Thanks for your suggestion!

We have added this suggestion to our activity #10334.

You are welcome to participate in the activity!

paddle-bot bot assigned andyjiang1116 Jul 16, 2023

ToddBear added the "Code PR is needed" label (This issue could inspire a code PR) Jul 17, 2023

ToddBear mentioned this issue Jul 17, 2023

New feature request collection (Collect Feature Request) #10334

Closed

lamaeldo closed this as completed Jul 18, 2023

paddle-bot bot added the status/close label Jul 18, 2023
