Layout Analysis #11

Closed
alejandrojcastaneira opened this issue Jul 5, 2022 · 2 comments

@alejandrojcastaneira

First of all, thanks for the great work!
My question: could these models be adapted to the task of layout analysis, so that we could use them on datasets like PubLayNet?
In that case, the models would need to output the probability of each pixel belonging to a given class, instead of the possible tags for each token.

@jpWang
Owner

jpWang commented Jul 6, 2022

Hi,
Since LiLT does not use image information yet, it cannot directly output per-pixel probabilities. However, you could first run an OCR engine to get the tokens and their bounding boxes, and then classify each token into the category of the pixels contained in its corresponding box. In this way, LiLT can assist a traditional visual model in handling layout analysis tasks.
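
A minimal sketch of the box-to-pixel mapping described above, not code from the LiLT repository. It assumes the layout annotation is available as an integer class mask of shape `(H, W)` and that the OCR boxes are `(x0, y0, x1, y1)` pixel coordinates. The function names `token_labels_from_mask` and `paint_token_predictions` are hypothetical. The first derives a training label per token by majority vote over the pixels in its box; the second paints per-token predictions back into a coarse pixel map that a visual model could consume.

```python
import numpy as np


def token_labels_from_mask(mask, boxes, background=0):
    """Give each OCR token the majority class of the pixels inside its box.

    mask:  (H, W) array of non-negative integer class ids (assumed format).
    boxes: iterable of (x0, y0, x1, y1) pixel boxes, x1/y1 exclusive.
    """
    mask = np.asarray(mask, dtype=np.int64)
    h, w = mask.shape
    num_classes = int(mask.max()) + 1
    labels = []
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the image so slicing never wraps around.
        x0, x1 = max(0, x0), min(w, x1)
        y0, y1 = max(0, y0), min(h, y1)
        region = mask[y0:y1, x0:x1]
        if region.size == 0:
            # Box lies outside the page: fall back to the background class.
            labels.append(background)
            continue
        counts = np.bincount(region.ravel(), minlength=num_classes)
        labels.append(int(counts.argmax()))
    return labels


def paint_token_predictions(shape, boxes, labels, background=0):
    """Rasterize per-token class predictions into an (H, W) label map."""
    out = np.full(shape, background, dtype=np.int64)
    for (x0, y0, x1, y1), label in zip(boxes, labels):
        # Every pixel inside the token's box receives the token's class.
        out[max(0, y0):max(0, y1), max(0, x0):max(0, x1)] = label
    return out
```

The labels from the first function could serve as token-classification targets when fine-tuning, and the map from the second is only as fine-grained as the OCR boxes, so it is best treated as an auxiliary signal rather than a replacement for a pixel-level model.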

@jpWang jpWang closed this as completed Oct 24, 2022
@mllife

mllife commented Feb 14, 2024

Does anyone have an update on this?
