This repo is an implementation of the LayoutLM model [1], built from the original source code (I did not manage to make it work with the HuggingFace implementation) and benchmarked on the CORD dataset [2].
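
For context, the main architectural difference from BERT is that LayoutLM adds 2-D position embeddings derived from each token's bounding box on the page, on top of the usual word and 1-D position embeddings [1]. The snippet below is only a conceptual sketch of that idea; the class and parameter names are illustrative and are not the code used in this repo.

```python
# Conceptual sketch of LayoutLM's 2-D position embeddings (illustrative names,
# not this repo's code). Each token carries a bounding box (x0, y0, x1, y1)
# on a normalized grid; each coordinate is embedded and the four embeddings
# are summed with the usual BERT-style word and 1-D position embeddings.
import torch
import torch.nn as nn

class Layout2DPositionEmbeddings(nn.Module):
    def __init__(self, hidden_size=1024, max_2d_position=1024):
        super().__init__()
        # One table shared by the x coordinates (left/right) and one by the
        # y coordinates (top/bottom), as described in the paper [1].
        self.x_embeddings = nn.Embedding(max_2d_position, hidden_size)
        self.y_embeddings = nn.Embedding(max_2d_position, hidden_size)

    def forward(self, bbox):
        # bbox: (batch, seq_len, 4) integer coordinates in [0, max_2d_position)
        left = self.x_embeddings(bbox[..., 0])
        top = self.y_embeddings(bbox[..., 1])
        right = self.x_embeddings(bbox[..., 2])
        bottom = self.y_embeddings(bbox[..., 3])
        # Added to the token embeddings inside the full model.
        return left + top + right + bottom

# Dummy batch: one sequence of 3 tokens with toy bounding boxes.
bbox = torch.tensor([[[10, 20, 110, 40], [120, 20, 200, 40], [0, 0, 0, 0]]])
print(Layout2DPositionEmbeddings()(bbox).shape)  # torch.Size([1, 3, 1024])
```
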
I compare the performance of LayoutLM Large, pre-trained on the IIT-CDIP dataset, with BERT Large.
**Validation set**

Model | F1 score | Precision | Recall |
---|---|---|---|
LayoutLM Large | 0.9562 | 0.9577 | 0.9546 |
BERT Large | 0.9474 | 0.9466 | 0.9481 |
**Test set**

Model | F1 score | Precision | Recall |
---|---|---|---|
LayoutLM Large | 0.9843 | 0.9845 | 0.9841 |
BERT Large | 0.9859 | 0.9861 | 0.9856 |
LayoutLM outperforms BERT on the validation set, but not on the test set; this needs more investigation.
Nevertheless, in the same environment and setup, BERT took 11 minutes to finish training (4 epochs) while LayoutLM needed only 3 minutes.
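
For reference, entity-level precision, recall, and F1 for this kind of token-labelling task are commonly computed with seqeval over BIO-style tag sequences. The snippet below is a sketch with made-up CORD-like labels, not necessarily the exact evaluation used in the notebooks.

```python
# Sketch of the usual entity-level evaluation for sequence labelling
# (illustrative CORD-like tags; not necessarily the exact code in the notebooks).
from seqeval.metrics import f1_score, precision_score, recall_score

y_true = [["B-menu.nm", "I-menu.nm", "O", "B-total.total_price"]]  # gold tags
y_pred = [["B-menu.nm", "I-menu.nm", "O", "O"]]                    # model output

print("Precision:", precision_score(y_true, y_pred))  # 1.00: the predicted entity is correct
print("Recall:   ", recall_score(y_true, y_pred))     # 0.50: one gold entity was missed
print("F1:       ", f1_score(y_true, y_pred))
```
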
I am using LayoutLM Large; the files of the pre-trained model can be found at these links:
OneDrive /
GoogleDrive
Other resources can be found in the original repository: Official LayoutLM
## TODO
I will soon add a training script; in the meantime, you can check the notebooks.
I will also add more details about the dataset and the structure of the notebooks.
[1] Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou (2019), LayoutLM: Pre-training of Text and Layout for Document Image Understanding, https://arxiv.org/abs/1912.13318, https://github.com/microsoft/unilm/tree/master/layoutlm
[2] Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo and Hwalsuk Lee (2019), CORD: A Consolidated Receipt Dataset for Post-OCR Parsing, Document Intelligence Workshop at NeurIPS, https://github.com/clovaai/cord