Welcome! This is a small project in which I implemented the Transformer architecture, based on the original paper. I am open to any suggestions and would be happy to get feedback!
- [Attention Is All You Need](https://arxiv.org/pdf/1706.03762.pdf)
- [Layer Normalization](https://arxiv.org/pdf/1607.06450.pdf)