
LayoutLM not classifying bottom half of documents #935

Open · DoubtfulCoder opened this issue Nov 30, 2022 · 5 comments

Comments

@DoubtfulCoder

Describe the bug
Model I am using (UniLM, MiniLM, LayoutLM ...): LayoutLM

I am trying to use LayoutLM for resume parsing. I've labeled and trained on over 100 resumes and am currently reaching an F1 score of around 0.55 and accuracy of around 85%. However, when I run inference, many of the documents have large portions of text (clustered at the bottom) left unclassified. The resumes I've run inference on are in a similar format to those trained on and should have similar bounding-box locations. Why is LayoutLM not classifying them? If it's overfitting, what can I do about it?

Example (blurred for personal info): [screenshot of a resume page with the bottom portion of the text left unclassified]

@wolfshow (Contributor)

@DoubtfulCoder, that's not overfitting. The reason is that LayoutLM processes the document in a window of 512 tokens. If your document is longer than 512 tokens, you need to split the page into multiple samples for the model to process, for both training and testing.
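A minimal sketch of that splitting step, assuming `words` and `boxes` are the parallel word and bounding-box lists produced by your OCR step (hypothetical names, not code from this repo):

```python
from transformers import LayoutLMTokenizerFast

tokenizer = LayoutLMTokenizerFast.from_pretrained("microsoft/layoutlm-base-uncased")

def split_page(words, boxes, max_len=512):
    """Yield (words, boxes) chunks whose tokenized length fits within max_len."""
    chunk_words, chunk_boxes, used = [], [], 2  # reserve 2 slots for [CLS] and [SEP]
    for word, box in zip(words, boxes):
        n = len(tokenizer.tokenize(word))  # one word may become several subword tokens
        if used + n > max_len and chunk_words:
            yield chunk_words, chunk_boxes
            chunk_words, chunk_boxes, used = [], [], 2
        chunk_words.append(word)
        chunk_boxes.append(box)
        used += n
    if chunk_words:
        yield chunk_words, chunk_boxes
```

Each yielded chunk is then encoded and passed to the model as its own sample; at inference time, the per-chunk predictions are stitched back together in order.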

@DoubtfulCoder (Author) commented Dec 1, 2022

Thanks for your help @wolfshow. By tokens, do you mean just words, or do they refer to something else as well?

How can I handle this length limit in training and in inference? I've seen mentions of a sliding-window approach that moves 128 tokens at a time. Can you provide some example code?
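(As an aside, "tokens" here are WordPiece subword pieces rather than whole words, so a 512-token budget can run out well before 512 words. A quick way to see this with the LayoutLM tokenizer:)

```python
from transformers import LayoutLMTokenizerFast

tokenizer = LayoutLMTokenizerFast.from_pretrained("microsoft/layoutlm-base-uncased")

# A single rare word can split into multiple subword tokens.
print(tokenizer.tokenize("tokenization"))  # e.g. ['token', '##ization']
```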

@davelza95 commented Dec 7, 2022

I have a similar issue. I tried changing seq_max_length, but I got a CUDA error during training.

I also tried setting max_position_embeddings = 1024 and resizing the bboxes to (1024 + 196 + 1, 4) after tokenizing them, but this hasn't worked.

Note: why 196 + 1?

Can someone help me, please?
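For anyone pursuing the longer-position-embeddings route, a hedged sketch of why the plain config change tends to fail: the pretrained checkpoint only ships 512 position-embedding rows, so positions beyond 511 either cause a shape mismatch at load time or out-of-range embedding lookups, which surface as CUDA errors during training. Loading with ignore_mismatched_sizes=True avoids the crash, but the extra positions start out untrained:

```python
from transformers import LayoutLMConfig, LayoutLMForTokenClassification

config = LayoutLMConfig.from_pretrained(
    "microsoft/layoutlm-base-uncased", max_position_embeddings=1024
)
# The checkpoint's position-embedding matrix is (512, hidden); asking for 1024
# positions means the extra 512 rows are freshly initialized, not pretrained.
model = LayoutLMForTokenClassification.from_pretrained(
    "microsoft/layoutlm-base-uncased",
    config=config,
    ignore_mismatched_sizes=True,
)
```

Because those new rows carry no pretrained signal, the sliding-window approach discussed below is usually the safer option.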

@davelza95 commented

> Thanks for your help @wolfshow. By tokens, do you mean just words, or do they refer to something else as well?
>
> How can I handle this length limit in training and in inference? I've seen mentions of a sliding-window approach that moves 128 tokens at a time. Can you provide some example code?

Hi! Did you fix it?

@DoubtfulCoder (Author) commented

> Hi! Did you fix it?

Hi, I did not try increasing max_position_embeddings; I just used a sliding-window approach. Basically, if the number of words is greater than about 315, slide 300-word windows with a stride of 100 words (0-300, 100-400, etc.) and then aggregate the predictions.
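A minimal sketch of that aggregation, assuming a classify_words(words, boxes) helper (hypothetical) that runs LayoutLM on a single window and returns one predicted label per word:

```python
from collections import Counter, defaultdict

WINDOW, STRIDE = 300, 100  # word windows: 0-300, 100-400, ... as described above

def sliding_window_predict(words, boxes, classify_words):
    """Run overlapping windows over the page and majority-vote per word."""
    votes = defaultdict(Counter)  # word index -> Counter of predicted labels
    for start in range(0, max(1, len(words) - WINDOW + STRIDE), STRIDE):
        window_words = words[start:start + WINDOW]
        window_boxes = boxes[start:start + WINDOW]
        for offset, label in enumerate(classify_words(window_words, window_boxes)):
            votes[start + offset][label] += 1
    # each word is covered by up to three overlapping windows; keep the most common label
    return [votes[i].most_common(1)[0][0] for i in range(len(words))]
```

The overlap means most words get predicted two or three times, and the majority vote smooths out disagreements at window boundaries.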
