GitHub - thunlp/LegalPLMs: Source code and checkpoints for legal pre-trained language models.

Lawformer

Introduction

This repository provides the source code and checkpoints for the paper "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents". You can download the Lawformer checkpoint from the Hugging Face model hub or from here. The checkpoint of our baseline model, Legal RoBERTa, can be downloaded from here.

The new judgement prediction dataset, CAIL-Long, can be downloaded from here.

Installation

pip install -r requirements.txt

Easy Start

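We have uploaded our model to the Hugging Face model hub. Make sure you have installed transformers. Note that the tokenizer is loaded from hfl/chinese-roberta-wwm-ext, while the model weights are loaded from thunlp/Lawformer.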

>>> from transformers import AutoModel, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
>>> model = AutoModel.from_pretrained("thunlp/Lawformer")
>>> inputs = tokenizer("任某提起诉讼，请求判令解除婚姻关系并对夫妻共同财产进行分割。", return_tensors="pt")
>>> outputs = model(**inputs)

Pre-training

Lawformer is not pre-trained from scratch: we continue pre-training from hfl/chinese-roberta-wwm-ext. Therefore, we first convert the RoBERTa model to the Longformer architecture by running the following command:

python3 convert_roberta_lfm.py
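
One step that RoBERTa-to-Longformer conversions commonly need is a longer position-embedding table, because the source model only has embeddings for short inputs. The sketch below illustrates the usual trick of repeating the existing rows until the new length is reached. It is a minimal illustration in plain Python, not the contents of convert_roberta_lfm.py: the function name extend_position_embeddings and the toy sizes are assumptions made for this example.

```python
def extend_position_embeddings(old_rows, new_length):
    """Build a longer position table by repeating the old rows in order.

    old_rows: list of embedding vectors, one per original position.
    new_length: number of positions wanted in the extended table.
    Hypothetical helper for illustration only.
    """
    if not old_rows:
        raise ValueError("old_rows must not be empty")
    if new_length < len(old_rows):
        raise ValueError("new_length must be at least len(old_rows)")
    # Position i in the new table reuses the row at i modulo the old length.
    # Each row is copied so the new table does not alias the old one.
    return [list(old_rows[i % len(old_rows)]) for i in range(new_length)]


if __name__ == "__main__":
    # Toy table: 3 positions, hidden size 2 (real models are far larger).
    toy = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    longer = extend_position_embeddings(toy, 7)
    for position, row in enumerate(longer):
        print(position, row)
```

Initializing the new positions from the old ones, rather than randomly, gives continued pre-training a reasonable starting point for local context at every offset.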

Then run the following command to pre-train the model:

python3 -m torch.distributed.launch --master_port 10086 --nproc_per_node 8 train.py -c config/Lawformer.config -g 0,1,2,3,4,5,6,7
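
In the command above, --nproc_per_node is 8 and -g lists eight GPU ids, so the two values appear to be meant to agree (one process per listed GPU; this reading is an assumption, not something the repository states). When training on fewer GPUs, a small helper can keep them consistent. The function build_launch_command below is hypothetical and only assembles the same command shown above from a list of GPU ids.

```python
def build_launch_command(gpu_ids, config="config/Lawformer.config", master_port=10086):
    """Return the argument list for a single-node pre-training launch.

    Hypothetical helper: it assumes one process per listed GPU, so
    --nproc_per_node is set to the number of ids passed to -g.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("GPU ids must be unique")
    return [
        "python3", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        "--nproc_per_node", str(len(gpu_ids)),
        "train.py",
        "-c", config,
        "-g", ",".join(str(g) for g in gpu_ids),
    ]


if __name__ == "__main__":
    # Example: run on GPUs 0 through 3 only.
    print(" ".join(build_launch_command([0, 1, 2, 3])))
```

The returned list can be passed directly to subprocess.run, or joined with spaces and pasted into a shell.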

Cite

If you use the pre-trained models, please cite this paper:

@article{xiao2021lawformer,
  title={Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents},
  author={Xiao, Chaojun and Hu, Xueyu and Liu, Zhiyuan and Tu, Cunchao and Sun, Maosong},
  year={2021}
}

