Disfluency-Generation-and-Detection

This repo contains the code for the following paper:

Jingfeng Yang, Zhaoran Ma, Diyi Yang. Planning and Generating Natural and Diverse Disfluent Texts as Augmentation for Disfluency Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020).

If you use this repo, please cite the paper above.

Getting Started

These instructions will get you running the code.
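
First, get a local copy of the repository (the URL below is inferred from the repository name and is only a sketch):

git clone https://github.com/JingfengYang/Disfluency-Generation-and-Detection.git
cd Disfluency-Generation-and-Detection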

Requirements

  • Python 3.6 or higher
  • PyTorch >= 1.3.0
  • pytorch_transformers (also known as transformers)
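
A minimal environment setup sketch; the pip package names and version pins below are assumptions based on the list above, not taken from the repo:

pip install "torch>=1.3.0" pytorch-transformers
# or, with the renamed library:
pip install "torch>=1.3.0" transformers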

Planner + Generator for disfluency generation

cd disf_gen_coarse2fine &&
python train.py -learning_rate 0.001 -no_share_emb_layout_encoder -seprate_encoder -batch_size 64 -max_grad_norm 0.1 -layout_weight 1 -optim adam &&
python evaluate.py &&
cd ..

Heuristic Planner + GPT2 Generator for data augmentation

CUDA_VISIBLE_DEVICES=0 python transformers/examples/run_language_modeling.py --output_dir=news3m_ml_finetune_st --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=news_3m --do_eval --eval_data_file=swbd_LM_val --line_by_line --eval_all_checkpoints --num_train_epochs 6 --logging_steps 6000 --save_steps 6000 &&
python createFakeLMdist.py -infile news_to_fake_3m -outfile news_fake_3m_newstune360000_mp -model_path news3m_ml_finetune_st/checkpoint-360000 -gpu 2222333333555555 &&
python writePretrain.py

Disfluency detection w/ or w/o augmented data

Please run trainBertPretrain.py to train the disfluency detection model, without or with the augmented data (the -p flag):

python trainBertPretrain.py
python trainBertPretrain.py -p

Acknowledgement

The disfluency generation code is adapted from OpenNMT and Coarse2fine Semantic Parsing.
