Welcome! This is my small project in which I implemented the Transformer architecture following the original design. Almost all of my work is based on the "Attention Is All You Need" paper. I am open to suggestions and happy to receive any feedback!
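
At the heart of the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal PyTorch sketch of that formula for reference; the function name, tensor shapes, and masking convention are illustrative assumptions, not this repo's actual API:

```python
import math
import torch

def scaled_dot_product_attention(query, key, value, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  -- Eq. (1) of the paper.
    # query/key/value: (..., seq_len, d_k); mask (optional): broadcastable to
    # (..., seq_len, seq_len), with 0 marking positions to hide (an assumed convention).
    d_k = query.size(-1)
    scores = query @ key.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ value, weights

# Example usage with illustrative shapes: (batch, heads, seq_len, d_k).
q = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, q, q)
```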
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)
- [Layer Normalization](https://arxiv.org/pdf/1607.06450.pdf)