whr94621/ODC-NMT

ODC-NMT

This repository contains the code for our paper, Online Distilling from Checkpoints for Neural Machine Translation. For basic usage of this codebase, please refer to the dev branch of NJUNMT-pytorch.

To run ODC experiments, add the --task "odc" option when invoking src.bin.train:

python -m src.bin.train \
    ... \
    --task "odc"

The ODC configuration options are described below (all of them belong in training_configs):

  • teacher_choice: Which model serves as the teacher. The options are:
    • best: use the best checkpoint as the teacher
    • ma: use the EMA (exponential moving average) as the teacher
    • ave_best_k: use the average of the best k checkpoints as the teacher, where k is an integer
  • moving_average_method: The choice of moving average method.
    • ema (recommended)
    • sma
    • two_phase_ema
    • none: do not use moving average
  • teacher_patience: Integer. Controls how much tolerance is allowed before ODC is applied when the current model is inferior to the best checkpoint. I use 1 in the paper.
  • teacher_refresh_warmup: Integer. The number of epochs after which ODC starts to be used. I use 1 in the paper.
  • kd_factor: The weight applied to the knowledge distillation loss.
  • combine_hint_loss_type: How the NLL loss and the KD loss are combined. With kd_factor denoted $\alpha$, it can be
    • "add": nll_loss + $\alpha$ * kd_loss
    • "interpolation": (1.0 - $\alpha$) * nll_loss + $\alpha$ * kd_loss
  • hint_loss_type: The type of knowledge distillation loss. It can be
    • kl (recommended): word-level knowledge distillation
    • mse: MSE of decoder hidden states between teacher and student.
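
The two combine_hint_loss_type formulas above can be sketched in plain Python. This is only an illustration of the arithmetic, not the code used in this repository: the function names combine_losses and word_level_kd_loss are hypothetical, and the KD term is assumed here to be the KL divergence from the teacher's to the student's output distribution at a single target position.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def word_level_kd_loss(teacher_logits, student_logits):
    """KL(teacher || student) over the vocabulary at one target position.

    Assumption: the direction of the KL term and the use of raw logits are
    illustrative choices, not taken from the repository.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def combine_losses(nll_loss, kd_loss, kd_factor, combine_hint_loss_type):
    """Combine NLL and KD losses as described for combine_hint_loss_type."""
    if combine_hint_loss_type == "add":
        return nll_loss + kd_factor * kd_loss
    if combine_hint_loss_type == "interpolation":
        return (1.0 - kd_factor) * nll_loss + kd_factor * kd_loss
    raise ValueError(f"unknown combine_hint_loss_type: {combine_hint_loss_type!r}")


if __name__ == "__main__":
    kd = word_level_kd_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
    print("kd_loss:", kd)
    print("add:", combine_losses(2.0, kd, 0.5, "add"))
    print("interpolation:", combine_losses(2.0, kd, 0.5, "interpolation"))
```

With "add", the NLL term keeps its full weight and kd_factor only scales the distillation term; with "interpolation", raising kd_factor also reduces the weight of the NLL term, so a kd_factor of 1.0 would train on the KD loss alone.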

If you have any questions, you can contact me by email [email protected]. If this work helps you, please cite:

@inproceedings{wei-etal-2019-online,
    title = "Online Distilling from Checkpoints for Neural Machine Translation",
    author = "Wei, Hao-Ran  and
      Huang, Shujian  and
      Wang, Ran  and
      Dai, Xin-yu  and
      Chen, Jiajun",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1192",
    doi = "10.18653/v1/N19-1192",
    pages = "1932--1941",
}
