
What's New

[2022/10] LiLT has been added to 🤗huggingface/transformers; see HERE. A minimal loading sketch follows this list.

[2022/03] Initial model and code release.
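
Since LiLT is available in 🤗 Transformers, a quick way to try it outside this repository is to load a converted checkpoint directly. The sketch below is illustrative only: the Hub checkpoint id, the toy words, and the 0-1000-normalized boxes are assumptions, not part of this repo.

# Minimal sketch: running LiLT through 🤗 Transformers (checkpoint id is an assumption).
import torch
from transformers import AutoTokenizer, AutoModel

ckpt = "SCUT-DLVCLab/lilt-roberta-en-base"  # assumed Hub checkpoint id
tokenizer = AutoTokenizer.from_pretrained(ckpt, add_prefix_space=True)
model = AutoModel.from_pretrained(ckpt)

# Toy OCR output: words plus layout boxes normalized to a 0-1000 page coordinate system.
words = ["Invoice", "Date:", "2022/03/01"]
boxes = [[48, 60, 150, 80], [160, 60, 222, 80], [230, 60, 340, 80]]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Each sub-token inherits its word's box; special tokens get a zero box.
token_boxes = [boxes[w] if w is not None else [0, 0, 0, 0] for w in enc.word_ids(0)]
enc["bbox"] = torch.tensor([token_boxes])

with torch.no_grad():
    outputs = model(**enc)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)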

LiLT (ACL 2022)

This is the official PyTorch implementation of the ACL 2022 paper: "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding". [official] [arXiv]

(Figure: overview of the LiLT framework)

LiLT is pre-trained on visually rich documents in a single language (English) and can then be fine-tuned directly on other languages with the corresponding off-the-shelf monolingual/multilingual pre-trained textual models. We hope the public availability of this work helps document intelligence research.

Installation

For CUDA 11.X:

conda create -n liltfinetune python=3.7
conda activate liltfinetune
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0 -c pytorch
python -m pip install detectron2==0.5 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu110/torch1.7/index.html
git clone https://github.com/jpWang/LiLT
cd LiLT
pip install -r requirements.txt
pip install -e .

For other CUDA versions, check the corresponding Detectron2/PyTorch releases and modify the command lines accordingly.
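
As an optional sanity check (not part of the original instructions), you can confirm that the pinned packages and the CUDA build are visible from Python:

# Optional environment check for the setup above.
import torch, torchvision, detectron2

print("torch:", torch.__version__)              # expect 1.7.1
print("torchvision:", torchvision.__version__)  # expect 0.8.2
print("detectron2:", detectron2.__version__)    # expect 0.5
print("CUDA available:", torch.cuda.is_available())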

Datasets

In this repository, we provide the fine-tuning code for FUNSD and XFUND.

You can download our pre-processed data (~1.2GB) from HERE and place the unzipped xfund&funsd/ directory directly under LiLT/.
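
As a quick check that the data landed where the fine-tuning scripts expect it, the following minimal sketch (run from the LiLT/ root; the folder name follows the instruction above) lists the unzipped contents:

# Verify the pre-processed data sits at LiLT/xfund&funsd/.
from pathlib import Path

data_root = Path("xfund&funsd")
assert data_root.is_dir(), "Unzip the download and place xfund&funsd/ directly under LiLT/"
for entry in sorted(data_root.iterdir()):
    print(entry)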

Available Checkpoints

Model                   Language   Size    Download
lilt-roberta-en-base    EN         293MB   OneDrive
lilt-infoxlm-base       MUL        846MB   OneDrive
lilt-only-base          None       21MB    OneDrive

Or Generate Your Own Checkpoint (Optional)

If you want to combine the pre-trained LiLT with a RoBERTa-like textual model for another language, please download lilt-only-base and use gen_weight_roberta_like.py to generate your own pre-trained checkpoint; a small verification sketch follows the two examples below.

For example, combine lilt-only-base with English roberta-base:

mkdir roberta-en-base
wget https://huggingface.co/roberta-base/resolve/main/config.json -O roberta-en-base/config.json
wget https://huggingface.co/roberta-base/resolve/main/pytorch_model.bin -O roberta-en-base/pytorch_model.bin
python gen_weight_roberta_like.py \
     --lilt lilt-only-base/pytorch_model.bin \
     --text roberta-en-base/pytorch_model.bin \
     --config roberta-en-base/config.json \
     --out lilt-roberta-en-base

Or combine lilt-only-base with microsoft/infoxlm-base:

mkdir infoxlm-base
wget https://huggingface.co/microsoft/infoxlm-base/resolve/main/config.json -O infoxlm-base/config.json
wget https://huggingface.co/microsoft/infoxlm-base/resolve/main/pytorch_model.bin -O infoxlm-base/pytorch_model.bin
python gen_weight_roberta_like.py \
     --lilt lilt-only-base/pytorch_model.bin \
     --text infoxlm-base/pytorch_model.bin \
     --config infoxlm-base/config.json \
     --out lilt-infoxlm-base
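
To confirm a generated checkpoint before fine-tuning, a minimal sketch (assuming gen_weight_roberta_like.py writes pytorch_model.bin into the --out directory) is:

# Hypothetical quick check of a checkpoint produced by gen_weight_roberta_like.py.
import torch

state = torch.load("lilt-roberta-en-base/pytorch_model.bin", map_location="cpu")
print(len(state), "tensors in the merged state dict")
# Print a few parameter names to verify both textual and layout weights are present.
for name in list(state)[:5]:
    print(name, tuple(state[name].shape))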

Fine-tuning

Semantic Entity Recognition on FUNSD

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_funsd.py \
        --model_name_or_path lilt-roberta-en-base \
        --tokenizer_name roberta-base \
        --output_dir ser_funsd_lilt-roberta-en-base \
        --do_train \
        --do_predict \
        --max_steps 2000 \
        --per_device_train_batch_size 8 \
        --warmup_ratio 0.1 \
        --fp16

Language-specific Semantic Entity Recognition on XFUND (e.g., ZH)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_ser.py \
        --model_name_or_path lilt-infoxlm-base \
        --tokenizer_name xlm-roberta-base \
        --output_dir ls_ser_xfund_zh_lilt-infoxlm-base \
        --do_train \
        --do_eval \
        --lang zh \
        --max_steps 2000 \
        --per_device_train_batch_size 16 \
        --warmup_ratio 0.1 \
        --fp16

Language-specific Relation Extraction on XFUND (e.g., ZH)

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py \
        --model_name_or_path lilt-infoxlm-base \
        --tokenizer_name xlm-roberta-base \
        --output_dir ls_re_xfund_zh_lilt-infoxlm-base \
        --do_train \
        --do_eval \
        --lang zh \
        --max_steps 5000 \
        --per_device_train_batch_size 8 \
        --learning_rate 6.25e-6 \
        --warmup_ratio 0.1 \
        --fp16

Multi-task Semantic Entity Recognition on XFUND

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_ser.py \
        --model_name_or_path lilt-infoxlm-base \
        --tokenizer_name xlm-roberta-base \
        --output_dir mt_ser_xfund_all_lilt-infoxlm-base \
        --do_train \
        --additional_langs all \
        --max_steps 16000 \
        --per_device_train_batch_size 16 \
        --warmup_ratio 0.1 \
        --fp16

Multi-task Relation Extraction on XFUND

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 examples/run_xfun_re.py \
        --model_name_or_path lilt-infoxlm-base \
        --tokenizer_name xlm-roberta-base \
        --output_dir mt_re_xfund_all_lilt-infoxlm-base \
        --do_train \
        --additional_langs all \
        --max_steps 40000 \
        --per_device_train_batch_size 8 \
        --learning_rate 6.25e-6 \
        --warmup_ratio 0.1 \
        --fp16

Results

Semantic Entity Recognition on FUNSD

(Figure: FUNSD semantic entity recognition results)

Language-specific Fine-tuning on XFUND

(Figure: language-specific fine-tuning results on XFUND)

Cross-lingual Zero-shot Transfer on XFUND

(Figure: cross-lingual zero-shot transfer results on XFUND)

Multitask Fine-tuning on XFUND

(Figure: multitask fine-tuning results on XFUND)

Acknowledgements

This repository benefits greatly from unilm/layoutlmft. Many thanks for their excellent work.

Citation

If our paper helps your research, please cite it in your publication(s):

@inproceedings{wang-etal-2022-lilt,
    title = "{L}i{LT}: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding",
    author = "Wang, Jiapeng and Jin, Lianwen and Ding, Kai",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = may,
    year = "2022",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.acl-long.534",
    doi = "10.18653/v1/2022.acl-long.534",
    pages = "7747--7757",
}

Feedback

Suggestions and discussions are very welcome. Please contact the authors by email at [email protected].