A Transformer-based Detector-Purificator-Corrector Framework for Spelling Error Correction of Bangla and Resource Scarce Indic Languages
Preprint: https://arxiv.org/abs/2211.03730
| Operating System | Requirements File | Remark |
|---|---|---|
| Ubuntu 16.04.7 LTS | requirements_u.yml | ✔️ Successful |
| Ubuntu 18.04.6 LTS (Google Colab) | requirements_c.txt | ✔️ Successful* |
| Windows 10 | requirements_w.yml | ✔️ Successful |
git clone https://github.com/mehedihasanbijoy/DPCSpell.git
or manually download and extract the DPCSpell repository from GitHub.
conda env create -f requirements_u.yml  # Ubuntu 16.04.7 LTS
or
conda env create -f requirements_w.yml  # Windows 10
conda activate DPCSpell
gdown "https://drive.google.com/drive/folders/1_sWSi-LFsvuYh9c5GBMDd4V6_uM8yYjH?usp=share_link" -O ./Dataset --folder
or manually download the folder from the Google Drive link above and place the extracted files in ./Dataset/
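Before training, it can help to confirm that the download landed where the training commands expect it. The sketch below assumes that corpus.csv (the only dataset file named in this README's commands) should sit directly under ./Dataset/. The helper name `missing_files` is hypothetical and not part of DPCSpell.

```python
from pathlib import Path


def missing_files(dataset_dir, expected=("corpus.csv",)):
    """Return the names in `expected` that are not regular files in dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the training scripts read ./Dataset/corpus.csv, as in the
    # detector command's --CORPUS argument.
    missing = missing_files("./Dataset")
    if missing:
        print("Missing from ./Dataset:", ", ".join(missing))
    else:
        print("Dataset folder looks complete.")
```

If other files turn out to be required, they can be added to the `expected` tuple.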
python detector.py --CORPUS "./Dataset/corpus.csv" --HID_DIM 128 --ENC_LAYERS 5 --DEC_LAYERS 5 --ENC_HEADS 8 --DEC_HEADS 8 --ENC_PF_DIM 256 --DEC_PF_DIM 256 --ENC_DROPOUT 0.1 --DEC_DROPOUT 0.1 --CLIP 1 --LEARNING_RATE 0.0005 --N_EPOCHS 100
python purificator.py --HID_DIM 128 --ENC_LAYERS 5 --DEC_LAYERS 5 --ENC_HEADS 8 --DEC_HEADS 8 --ENC_PF_DIM 256 --DEC_PF_DIM 256 --ENC_DROPOUT 0.1 --DEC_DROPOUT 0.1 --CLIP 1 --LEARNING_RATE 0.0005 --N_EPOCHS 100
python corrector.py --HID_DIM 128 --ENC_LAYERS 5 --DEC_LAYERS 5 --ENC_HEADS 8 --DEC_HEADS 8 --ENC_PF_DIM 256 --DEC_PF_DIM 256 --ENC_DROPOUT 0.1 --DEC_DROPOUT 0.1 --CLIP 1 --LEARNING_RATE 0.0005 --N_EPOCHS 100
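Since the three commands above share the same hyperparameters, one way to avoid retyping them is a small driver script. This is a minimal sketch, not part of the repository: it assumes the scripts accept exactly the flags shown above, that only detector.py takes `--CORPUS`, and that the stages should run in the order listed. The names `SHARED_ARGS`, `STAGES`, `build_command`, and `run_pipeline` are hypothetical.

```python
import shlex
import subprocess
import sys

# Hyperparameters copied from the three commands above.
SHARED_ARGS = {
    "HID_DIM": 128,
    "ENC_LAYERS": 5,
    "DEC_LAYERS": 5,
    "ENC_HEADS": 8,
    "DEC_HEADS": 8,
    "ENC_PF_DIM": 256,
    "DEC_PF_DIM": 256,
    "ENC_DROPOUT": 0.1,
    "DEC_DROPOUT": 0.1,
    "CLIP": 1,
    "LEARNING_RATE": 0.0005,
    "N_EPOCHS": 100,
}

# (script, stage-specific arguments), in the order the README lists them.
STAGES = [
    ("detector.py", {"CORPUS": "./Dataset/corpus.csv"}),
    ("purificator.py", None),
    ("corrector.py", None),
]


def build_command(script, extra_args=None):
    """Build the argv list for one stage: interpreter, script, then --FLAG value pairs."""
    args = {**(extra_args or {}), **SHARED_ARGS}
    command = [sys.executable, script]
    for name, value in args.items():
        command += [f"--{name}", str(value)]
    return command


def run_pipeline(dry_run=True):
    """Print each stage's command; execute it as well when dry_run is False."""
    for script, extra_args in STAGES:
        command = build_command(script, extra_args)
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline if a stage exits with an error.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

As written, the script only prints the commands. Calling `run_pipeline(dry_run=False)` from the repository root, with the DPCSpell environment active, would launch the three training runs one after another.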