A simple, zero-dependency, from-scratch pure-Python implementation of the original attention mechanism from Vaswani et al.
The attention mechanism was proposed in the paper "Attention Is All You Need" by Vaswani et al. and has become a key component of many state-of-the-art language models.
This is a minimal implementation meant for understanding the core concepts behind the Attention mechanism.
The code is heavily commented to explain each section and matches the formulas from the paper.
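As a taste of what the implementation covers, here is a minimal pure-Python sketch of the core formula from the paper, softmax(QK^T / sqrt(d_k)) V, using only the standard library. Function names and the list-of-row-vectors representation are illustrative choices, not necessarily the ones used in this repo.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

    Q, K, V are lists of row vectors (plain Python lists of floats).
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Dot product of the query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Example use: `attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` returns one output row that interpolates between the two value vectors, weighted toward the value whose key matches the query.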
Let me know if any part needs more explanation or could be improved — feedback is welcome!