Layout Analysis #11

alejandrojcastaneira · 2022-07-05T12:35:45Z

First of all, thanks for the great work!
My question: could these models be adapted to the task of layout analysis, so that we could use them on datasets like PubLayNet?
For that, the models would need to output the probability of each pixel belonging to a given class, instead of the possible tags for each token.

jpWang · 2022-07-06T01:51:50Z

Hi,
since LiLT does not use image information yet, it cannot directly output per-pixel probabilities. However, you could first run an OCR engine to get the tokens and their boxes, classify each token, and then assign the predicted category to the pixels contained in its corresponding box. In this way, LiLT can assist a traditional visual model in layout analysis tasks.
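
A minimal sketch of the box-to-pixel step described above, in plain Python. It assumes the OCR engine already returned one pixel-coordinate box per token and that a token classifier already produced one integer class id per token. The function name `paint_token_labels`, the `(x0, y0, x1, y1)` box layout, and the class ids are all hypothetical and not part of LiLT.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def paint_token_labels(
    height: int,
    width: int,
    boxes: List[Box],
    labels: List[int],
    background: int = 0,
) -> List[List[int]]:
    """Rasterize per-token class ids into a per-pixel label map."""
    mask = [[background] * width for _ in range(height)]
    for (x0, y0, x1, y1), label in zip(boxes, labels):
        # Clip the box to the page so out-of-range OCR boxes are harmless.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Later tokens overwrite earlier ones where boxes overlap.
                mask[y][x] = label
    return mask


# Two tokens on a 3x4 page: class 1 (top left), class 2 (bottom right).
example = paint_token_labels(3, 4, [(0, 0, 2, 1), (1, 1, 4, 3)], [1, 2])
# example == [[1, 1, 0, 0], [0, 2, 2, 2], [0, 2, 2, 2]]
```

The result is a hard label map, not per-pixel probabilities. Pixels outside every OCR box stay at the background class, which is why this works better as an extra input to a visual model than as a replacement for one.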

mllife · 2024-02-14T09:34:42Z

Does anyone have an update on this?

jpWang closed this as completed Oct 24, 2022
