Dict-BERT: Enhancing Language Model Pre-training with Dictionary

Introduction

-- This is the PyTorch implementation of our ACL 2022 paper "Dict-BERT: Enhancing Language Model Pre-training with Dictionary" [PDF]. In this paper, we propose Dict-BERT, a novel pre-trained language model that leverages rare word definitions from English dictionaries (e.g., Wiktionary). Dict-BERT is based on the BERT architecture and is trained under the same settings as BERT. Please refer to our paper for more details.
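
As a rough illustration of the idea (not the exact pre-training input format, which is described in the paper), a rare word's dictionary definition can be attached to the input text before tokenization; the definition string below is a hypothetical stand-in for a Wiktionary entry produced by the preprocessing scripts in this repository.

from transformers import BertTokenizerFast

# Toy illustration only: pair an input sentence with a dictionary definition
# of a rare word that appears in it.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

sentence = "The chemist collected the eluate after the column run."
rare_word = "eluate"
definition = "a liquid solution obtained by elution"  # hypothetical Wiktionary definition

# One simple way to attach the definition: feed it as the second segment of a
# sentence pair, so it follows the input after a [SEP] token.
encoded = tokenizer(sentence, f"{rare_word}: {definition}", return_tensors="pt")
print(tokenizer.decode(encoded["input_ids"][0]))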

Install the packages

Python version >= 3.6

transformers==4.7.0
datasets==1.8.0
torch==1.8.0

You also need to install dataclasses, scipy, scikit-learn (sklearn), and nltk, e.g. via pip install.

Preprocess the data

-- download Wiktionary

cd preprocess_wiktionary
bash download_wiktionary.sh

-- download GLUE benchmark

cd preprocess_datasets
bash load_preprocess.sh

Download the checkpoint

-- Hugging Face Hub [link]

git lfs install
git clone https://huggingface.co/wyu1/DictBERT
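
Once cloned, the checkpoint can be loaded with the standard transformers API. A minimal sketch, assuming the local clone sits in ./DictBERT (passing "wyu1/DictBERT" instead should download it directly from the Hub):

from transformers import AutoModel, AutoTokenizer

# Load the pre-trained Dict-BERT encoder from the local clone created above.
tokenizer = AutoTokenizer.from_pretrained("./DictBERT")
model = AutoModel.from_pretrained("./DictBERT")

inputs = tokenizer("Dict-BERT enhances pre-training with dictionary definitions.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)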

Run experiments on GLUE

-- without dictionary

cd finetune_wo_wiktionary
bash finetune.sh

-- with dictionary

cd finetune_wi_wiktionary
bash finetune.sh
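
The scripts above handle fine-tuning end to end. As a rough, self-contained sketch of what fine-tuning the checkpoint on a single GLUE task looks like with the transformers Trainer (not the repository's exact script or hyperparameters):

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Sketch of GLUE fine-tuning on SST-2; finetune.sh sets the actual tasks,
# data paths, and hyperparameters used in the paper.
model_name = "wyu1/DictBERT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2")
encoded = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True), batched=True)

args = TrainingArguments(output_dir="dictbert-sst2",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"],
                  tokenizer=tokenizer)
trainer.train()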

Citation

@inproceedings{yu2022dict,
  title={Dict-BERT: Enhancing Language Model Pre-training with Dictionary},
  author={Yu, Wenhao and Zhu, Chenguang and Fang, Yuwei and Yu, Donghan and Wang, Shuohang and Xu, Yichong and Zeng, Michael and Jiang, Meng},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
  pages={1907--1918},
  year={2022}
}

Please kindly cite our paper if you find the paper and the code helpful.

Acknowledgements

Many thanks to the Hugging Face Transformers GitHub repository. Part of our code is modified from their codebase.
