Hello, I have a suggestion for your implementation of the KIE algorithms (both the SER and RE tasks). Since the maximum token length is 512 (because of BERT), and there are many cases where people would like to train and run inference on documents longer than 512 tokens, you could implement a sliding-window strategy: the input (for training or inference) is sliced into chunks of fewer than 512 tokens, with some overlap between chunks, before being fed to the network. For inference, once all chunks have been processed, their results are merged again to give the user predictions for the full image. I haven't seen an implementation of this in PaddleOCR, but PaddleSeg seems to have this feature. A rough sketch of the idea is included below.
I really think it could go a long way towards increasing the number of use cases for KIE with PaddleOCR.
Otherwise, do you know of any other way to process documents longer than 512 tokens?
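To make the suggestion concrete, here is a minimal sketch of the chunking and merging steps. This is not PaddleOCR code: the `(tokens, boxes)` input format, the overlap size, and the hypothetical `ser_model.predict` call are all assumptions made only for illustration; the merge rule simply prefers the prediction from the window in which a token sits farthest from the window edge, i.e. has the most context.

```python
def make_chunks(tokens, boxes, max_len=512, overlap=64):
    """Split a long token sequence into overlapping windows of at most max_len."""
    stride = max_len - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + max_len, len(tokens))
        chunks.append((start, tokens[start:end], boxes[start:end]))
        if end == len(tokens):
            break
        start += stride
    return chunks


def merge_predictions(chunks, predictions, total_len):
    """Merge per-chunk label predictions back onto the full token sequence.

    For tokens covered by two overlapping windows, keep the prediction from
    the window in which the token is farther from the edge (more context).
    """
    merged = [None] * total_len
    best_margin = [-1] * total_len
    for (start, chunk_tokens, _), chunk_preds in zip(chunks, predictions):
        for i, label in enumerate(chunk_preds):
            pos = start + i
            margin = min(i, len(chunk_tokens) - 1 - i)  # distance to window edge
            if margin > best_margin[pos]:
                best_margin[pos] = margin
                merged[pos] = label
    return merged


# Example usage (ser_model.predict is a hypothetical per-chunk SER call):
# chunks = make_chunks(tokens, boxes)
# preds = [ser_model.predict(t, b) for _, t, b in chunks]
# labels = merge_predictions(chunks, preds, len(tokens))
```

The overlap is there so that tokens near a chunk boundary still get a prediction made with reasonable surrounding context; the same idea would need an extra pairing step for the RE task, since relations can span two chunks.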