Neural Machine Translation

State-of-the-art Neural Machine Translation with PyTorch and TorchText.

Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text from one natural language to another. Solving this problem with artificial neural networks is often called Neural Machine Translation (NMT).

In this project, I trained several sequence-to-sequence (seq2seq) models for German-to-English translation using PyTorch, TorchText, and spaCy.

Modeling

Encoder-Decoder architecture

The encoder-decoder architecture is a neural network design pattern. As shown in the figure below, the architecture is split into two parts: the encoder and the decoder. The encoder's role is to encode the inputs into a state, which often consists of several tensors. That state is then passed to the decoder to generate the outputs. In machine translation, the encoder transforms a source sentence into a state vector that captures its semantic information; the decoder then uses this state to generate the translated target sentence.
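
As a rough illustration of this pattern (not code from this repository; the class and argument names are made up), the sketch below wires an arbitrary encoder and decoder together so that the state produced by the encoder is handed to the decoder:

```python
import torch.nn as nn


class EncoderDecoder(nn.Module):
    """Generic encoder-decoder wrapper: encode the source into a state,
    then let the decoder consume that state to produce the outputs."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src, trg):
        state = self.encoder(src)        # state may be one or several tensors
        return self.decoder(trg, state)  # outputs conditioned on that state
```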

Sequence-to-Sequence model

The sequence-to-sequence model builds on the encoder-decoder architecture to generate a sequence output from a sequence input, as shown below. Both the encoder and the decoder commonly use recurrent neural networks (RNNs) to handle sequence inputs of variable length. The encoder's final hidden state is used directly to initialize the decoder's hidden state, which passes information from the encoder to the decoder.
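
A minimal sketch of this hand-off with GRU layers, assuming arbitrary vocabulary and layer sizes (illustrative only, not the implementation used in this repository):

```python
import torch
import torch.nn as nn


class RNNEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                       # src: [batch, src_len]
        _, hidden = self.rnn(self.embedding(src))
        return hidden                             # [1, batch, hid_dim]


class RNNDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.fc_out = nn.Linear(hid_dim, vocab_size)

    def forward(self, trg, enc_hidden):           # trg: [batch, trg_len]
        # The encoder's final hidden state initializes the decoder RNN.
        outputs, _ = self.rnn(self.embedding(trg), enc_hidden)
        return self.fc_out(outputs)               # [batch, trg_len, vocab_size]


# Toy usage with random token ids (teacher forcing on the target input).
src = torch.randint(0, 1000, (8, 12))
trg = torch.randint(0, 1000, (8, 10))
logits = RNNDecoder(1000)(trg, RNNEncoder(1000)(src))   # [8, 10, 1000]
```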

In this project, I tried several sequence-to-sequence models with LSTMs, Attention mechanisms, CNNs and Transformers.

Training results (Train - Validation)

| SeqToSeq Models | Number of parameters | Loss | Perplexity | Top-5 accuracy (%) | Time per epoch |
|---|---|---|---|---|---|
| 1. BiGRU | 8,501,115 | 2.051 - 2.561 | 7.779 - 12.952 | 12.365 - 11.633 | |
| 2. BiGRU + Bahdanau Attn | 9,091,711 | 2.258 - 2.356 | 9.567 - 10.554 | 11.998 - 11.911 | 00min:33s - 00min:00s |
| 3. BiGRU + Luong Attn | 11,649,659 | 1.795 - 2.208 | 6.019 - 9.099 | 13.200 - 12.372 | 00min:36s - 00min:00s |
| 4. Convolution | 7,965,273 | 1.462 - 1.619 | 4.316 - 5.048 | 9.227 - 14.742 | |
| 5. Transformer | | | | | |

Evaluation results (Validation - Test)

  • BLEU score (a sketch of computing BLEU with TorchText follows the table)

| SeqToSeq Models | beam_size=1 | beam_size=3 | beam_size=5 |
|---|---|---|---|
| 1. BiGRU | 20.742 - 21.143 | 21.212 - 22.840 | 22.081 - 22.797 |
| 2. BiGRU + Bahdanau Attn | 24.894 - 24.983 | 25.701 - 26.597 | 25.770 - 26.105 |
| 3. BiGRU + Luong Attn | 27.215 - 28.706 | 29.321 - 29.918 | 29.525 - 30.395 |
| 4. Convolution | | | |
| 5. Transformer | | | |
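
The BLEU scores above are on a 0-100 scale. The sketch below shows how corpus-level BLEU can be computed with TorchText; the tokenized sentences are made up, and this is not necessarily how the scores above were produced:

```python
from torchtext.data.metrics import bleu_score

# Hypothetical tokenized model outputs and their references.
candidates = [["the", "cat", "sat", "on", "the", "mat"],
              ["he", "reads", "a", "book"]]
references = [[["the", "cat", "sat", "on", "the", "mat"]],
              [["he", "reads", "the", "book"]]]

# bleu_score returns a value in [0, 1]; multiply by 100 for the scale above.
print(100 * bleu_score(candidates, references))
```
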
  • Inference time (a beam-search decoding sketch follows the table)

| SeqToSeq Models | beam_size=1 | beam_size=3 | beam_size=5 |
|---|---|---|---|
| 1. BiGRU | 00min:12s - 00min:12s | 01min:12s - 01min:10s | 02min:20s - 02min:27s |
| 2. BiGRU + Bahdanau Attn | 00min:17s - 00min:17s | 01min:41s - 01min:38s | 03min:12s - 03min:06s |
| 3. BiGRU + Luong Attn | 00min:18s - 00min:18s | 01min:47s - 01min:44s | 03min:21s - 03min:17s |
| 4. Convolution | | | |
| 5. Transformer | | | |
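
Inference time grows with beam_size because beam search keeps several partial hypotheses alive at each decoding step. The sketch below illustrates the idea, assuming a hypothetical per-step decoder function step_fn(last_token_id, state) that returns (logits, new_state); it is not the decoding code used in this repository:

```python
import torch.nn.functional as F


def beam_search(step_fn, init_state, sos_id, eos_id, beam_size=5, max_len=50):
    """Minimal beam search for a single sentence.

    step_fn(last_token_id, state) must return (1-D logits over the vocabulary,
    new state) for one decoding step, mirroring how an RNN decoder is queried.
    """
    beams = [([sos_id], 0.0, init_state)]              # (tokens, log-prob, state)
    completed = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            logits, new_state = step_fn(tokens[-1], state)
            log_probs = F.log_softmax(logits, dim=-1)
            top_lp, top_ids = log_probs.topk(beam_size)
            for lp, idx in zip(top_lp.tolist(), top_ids.tolist()):
                candidates.append((tokens + [idx], score + lp, new_state))
        # Keep only the beam_size best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score, state in candidates[:beam_size]:
            (completed if tokens[-1] == eos_id else beams).append((tokens, score, state))
        if not beams:                                   # every hypothesis has ended
            break
    completed.extend(beams)                             # unfinished hypotheses at max_len
    return max(completed, key=lambda c: c[1])[0]        # best-scoring token sequence
```

With beam_size=1 this loop reduces to greedy decoding, which is why the beam_size=1 column above is the fastest.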

References

  • [1] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
  • [2] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
  • [3] Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
  • [4] Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017). Convolutional sequence to sequence learning. arXiv preprint arXiv:1705.03122.
  • [5] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).