Enhancing Machine Translation with Dependency-Aware Self-Attention

This is the implementation of the approaches described in the paper:

Emanuele Bugliarello and Naoaki Okazaki. Enhancing Machine Translation with Dependency-Aware Self-Attention. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, July 2020.

We provide the code for reproducing our results, as well as translation outputs of each model.

Requirements

You can clone this repository with its submodules included by issuing: git clone --recurse-submodules git@github.com:e-bug/pascal

The requirements can be installed by setting up a conda environment:
conda env create -f environment.yml followed by source activate pascal
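Putting these steps together, the setup looks roughly as follows (a minimal sketch, assuming git and conda are already installed):

```bash
# Clone the repository together with its submodules (Fairseq fork and tools/)
git clone --recurse-submodules git@github.com:e-bug/pascal
cd pascal

# Create and activate the conda environment defined in environment.yml
conda env create -f environment.yml
source activate pascal
```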

Data Preparation

The pre-processing steps for each model in each data set can be found in the corresponding experiments/ folder, and rely on our code (scripts/) as well as on third-party software (tools/).
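For instance, preparing WMT16 En-De data for the Pascal model might look roughly like the sketch below; the exact script locations and the binarization script name can differ per data set, so treat this as illustrative and refer to the files listed under "Description of this repository":

```bash
# Illustrative only: script locations and the binarize script name are assumptions;
# check the corresponding experiments/ folder for the actual files.
cd experiments/wmt16en2de
bash prepare_data.sh        # standard pre-processing (tokenization, BPE)
bash prepare_tags_root.sh   # dependency heads mapped to the parent's first sub-word unit
bash binarize_tags.sh       # convert text (and tags) into Fairseq binary files
```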

Training

Scripts for training each model are provided in the corresponding data set folder in experiments/ (e.g., experiments/wmt16en2de/transformer/train.sh).

Note that we trained our models on an SGE cluster. To run our training experiments, submit (qsub) the train.sge file for a given experiment; it calls the train.sh file in the same directory.
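For example, for the baseline Transformer on WMT16 En-De (a sketch; on a machine without SGE you can run the shell script directly, adjusting paths and GPU settings in train.sh as needed):

```bash
cd experiments/wmt16en2de/transformer

# On an SGE cluster: submit the job script, which calls train.sh
qsub train.sge

# Without a scheduler: run the training script directly
bash train.sh
```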

Evaluation

Similarly, you can use the corresponding eval.sh and eval.sge files to evaluate a model.
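Evaluation follows the same pattern (sketch, using the same example directory as above):

```bash
cd experiments/wmt16en2de/transformer

qsub eval.sge   # on an SGE cluster
bash eval.sh    # or run the evaluation script directly
```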

Description of this repository

  • experiments/
    Contains code to reproduce our results. For each data set, the following files are used to prepare the data:

    • prepare_data.sh: Google's pre-processing steps for the Transformer model
    • prepare_filt_data.sh: use langdetect to remove sentences in languages that do not match source or target ones
    • prepare_lin_parses.sh: extract linearized parses for the multi-task approach of Currey and Heafield (WMT'19)
    • prepare_tags_label.sh: extract dependency labels following the approach of Sennrich and Haddow (WMT'16)
    • prepare_tags_mean.sh: extract dependency heads and map them to mean/middle position of the parent's sub-word units
    • prepare_tags_root.sh: extract dependency heads and map them to first position (root) of the parent's sub-word units
    • binarize_*.sh files: convert text data into binary files used by Fairseq
  • fairseq/
    Our code is based on a fork of Fairseq (the commit ID can be found in VERSION.md). Here, we introduce a new tags-translation task that accepts two source files (words and syntactic tags); a rough training sketch using this task is shown after this repository overview. It is implemented through the following files:

    • data/tags_language_pair_dataset.py
    • models/fairseq_tags_encoder.py
    • models/fairseq_model.py
    • tasks/tags_translation.py

    We also implement the following dependency-aware Transformer models:

    • Pascal: Parent-Scaled Self-Attention mechanism
      • models/pascal_transformer.py
      • modules/multihead_pascal.py
    • LISA: adaptation of Strubell et al. (EMNLP'18) to sub-word units
      • models/lisa_transformer.py
      • modules/multihead_lisa.py
      • criterions/lisa_cross_entropy.py
    • TagEmb (S&H): adaptation of Sennrich and Haddow (WMT'16) for the Transformer model
      • models/tagemb_transformer.py
    • Multi-Task (C&H): our implementation of Currey and Heafield (WMT'19)
      • This is a data augmentation technique, so no new models were created
  • scripts/: data preparation scripts for extracting syntactic tags

  • tools/: third-party software and our own tools used in pre-processing (e.g., Moses and BPE) as well as evaluation (e.g., RIBES)
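To give a rough idea of how the pieces above fit together, a training call with the tags-translation task and the Pascal model might look like the sketch below. The task and architecture names (tags_translation, pascal_transformer) are inferred from the file names above, and the data directory and hyper-parameters are hypothetical; the train.sh scripts in experiments/ contain the exact commands used in the paper.

```bash
# Sketch only: names and arguments are assumptions inferred from the file listing above.
python fairseq/train.py data-bin/wmt16en2de_tags \
    --task tags_translation \
    --arch pascal_transformer \
    --source-lang en --target-lang de \
    --optimizer adam --lr 0.0007 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --save-dir checkpoints/pascal_wmt16en2de
```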

License

This work is licensed under the MIT license. See LICENSE for details. Third-party software and data sets are subject to their respective licenses.

If you find our code/models or ideas useful in your research, please consider citing the paper:

@inproceedings{bugliarello-okazaki-2020-enhancing,
    title = "Enhancing Machine Translation with Dependency-Aware Self-Attention",
    author = "Bugliarello, Emanuele  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.147",
    pages = "1618--1627",
}
