-- This is the PyTorch implementation of our ACL 2022 paper "Dict-BERT: Enhancing Language Model Pre-training with Dictionary" [PDF]. In this paper, we propose DictBERT, a novel pre-trained language model that leverages rare word definitions from English dictionaries (e.g., Wiktionary). DictBERT is based on the BERT architecture and trained under the same settings as BERT. Please refer to our paper for more details.
python version >=3.6
transformers==4.7.0
datasets==1.8.0
torch==1.8.0
You also need to install dataclasses, scipy, scikit-learn, and nltk.
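One way to install everything with pip (a sketch; the exact CUDA-specific torch wheel may differ on your machine):
pip install transformers==4.7.0 datasets==1.8.0 torch==1.8.0 dataclasses scipy scikit-learn nltk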
-- download Wiktionary
cd preprocess_wiktionary
bash download_wiktionary.sh
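After the download finishes, the definitions can be inspected with a few lines of Python. This is only an illustrative sketch: the file name wiktionary.tsv and the two-column (word, definition) TSV layout are assumptions about the script's output, not its documented format.
# Illustrative sketch only: the file name and TSV layout below are
# assumptions about download_wiktionary.sh's output, not guaranteed.
import csv

definitions = {}
with open("wiktionary.tsv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="\t"):
        if len(row) >= 2:
            definitions[row[0]] = row[1]

print(len(definitions), "entries loaded")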
-- download GLUE benchmark
cd preprocess_datasets
bash load_preprocess.sh
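The same data is also reachable directly through the Hugging Face datasets library pinned above, which is a quick way to sanity-check your setup (a minimal sketch; the repo's own preprocessing lives in load_preprocess.sh):
from datasets import load_dataset

# Load one GLUE task (SST-2) to verify the datasets library works.
sst2 = load_dataset("glue", "sst2")
print(sst2["train"][0])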
-- download pre-trained DictBERT from the Hugging Face Hub [link]
git lfs install
git clone https://huggingface.co/wyu1/DictBERT
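Once cloned (or directly by hub ID), the checkpoint can be loaded with transformers. A minimal sketch, assuming the checkpoint follows the standard BERT format; if the hub repo does not ship tokenizer files, the standard bert-base-uncased tokenizer is the natural fallback:
from transformers import AutoModelForMaskedLM, AutoTokenizer

# "wyu1/DictBERT" is the hub ID from the clone URL above.
tokenizer = AutoTokenizer.from_pretrained("wyu1/DictBERT")
model = AutoModelForMaskedLM.from_pretrained("wyu1/DictBERT")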
-- without dictionary
cd finetune_wo_wiktionary
bash finetune.sh
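finetune.sh wraps the standard transformers fine-tuning recipe. Conceptually it starts from something like the following (a sketch, not the script's exact code; num_labels depends on the GLUE task):
from transformers import AutoModelForSequenceClassification

# Fresh classification head on top of the pre-trained encoder;
# num_labels=2 fits binary tasks such as SST-2.
model = AutoModelForSequenceClassification.from_pretrained(
    "wyu1/DictBERT", num_labels=2
)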
-- with dictionary
cd finetune_wi_wiktionary
bash finetune.sh
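The difference in the with-dictionary variant is that definitions of rare words are attached to the input before encoding. A conceptual sketch of that idea follows; the repo's actual rare-word detection and input formatting are defined in its preprocessing scripts, not here:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical one-entry dictionary for illustration.
definitions = {"anneal": "to heat and then cool a material to toughen it"}

text = "we anneal the metal before machining"
# Treat any word with a dictionary entry as "rare" (illustrative rule).
rare = [w for w in text.split() if w in definitions]
augmented = text + " " + " ".join(w + " : " + definitions[w] for w in rare)
inputs = tokenizer(augmented, return_tensors="pt")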
@inproceedings{yu2022dict,
title={Dict-BERT: Enhancing Language Model Pre-training with Dictionary},
author={Yu, Wenhao and Zhu, Chenguang and Fang, Yuwei and Yu, Donghan and Wang, Shuohang and Xu, Yichong and Zeng, Michael and Jiang, Meng},
booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
pages={1907--1918},
year={2022}
}
Please kindly cite our paper if you find the paper and the code helpful.
Many thanks to the GitHub repository of Hugging Face Transformers. Part of our code is adapted from their code.