An implementation of the Single Headed Attention Recurrent Neural Network (SHA-RNN) in Julia and Knet.
Stephen Merity. Single Headed Attention RNN: Stop Thinking With Your Head. arXiv preprint arXiv:1911.11423, 2019.
https://arxiv.org/abs/1911.11423v2
After downloading and preprocessing the data with

```sh
sh getdata.sh
```

you can train the main model of the SHA-RNN paper either by running sharnn-main.jl from the shell:

```sh
cd examples
julia sharnn-main.jl
```

or by using the SHA-RNN notebook.
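
If you prefer to stay inside a Julia session, the training script can also be included directly. This is a minimal sketch, not part of the repository's documented workflow; it assumes the repository root is the working directory and that a Project.toml with the dependencies exists there:

```julia
# Hypothetical REPL session; the environment setup below is an assumption.
using Pkg
Pkg.activate(".")      # activate the repository's environment, if it ships a Project.toml
Pkg.instantiate()      # install Knet and the other dependencies

include("examples/sharnn-main.jl")  # runs the same training entry point as the shell command
```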
This implementation is equivalent to Smerity's original sha-rnn implementation, but it is slower, since it does not use the performance tricks of the original PyTorch version:

- Fused layer normalization (it remains to be checked whether Apex's CUDA code can be used with Knet); a plain, unfused sketch is shown after this list
- Half-precision floating point (Float16) for memory efficiency
- A checkpointing feature similar to PyTorch's checkpoint
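
For reference, layer normalization as used throughout SHA-RNN can be written in a few lines of plain Julia. The sketch below is a hypothetical, unfused version (the struct and field names are assumptions, not this repository's API); a fused implementation such as Apex's computes the same result in a single CUDA kernel instead of several separate broadcast passes:

```julia
# Minimal, unfused layer normalization in plain Julia (hypothetical helper).
# In Knet, the gain and bias would typically be wrapped with `param` so that
# gradients are tracked; here they are plain arrays for clarity.
struct LayerNorm
    g             # learnable gain, one entry per feature
    b             # learnable bias, one entry per feature
    eps::Float64  # small constant for numerical stability
end

LayerNorm(dim::Integer; eps=1e-5) = LayerNorm(ones(Float32, dim), zeros(Float32, dim), eps)

function (ln::LayerNorm)(x)
    # Normalize over the feature dimension (assumed to be the first dimension).
    mu     = sum(x; dims=1) ./ size(x, 1)
    sigma2 = sum(abs2, x .- mu; dims=1) ./ size(x, 1)
    return ln.g .* (x .- mu) ./ sqrt.(sigma2 .+ ln.eps) .+ ln.b
end
```

On a GPU, each of these broadcast operations launches its own kernel, which is why a fused implementation is noticeably faster even though the arithmetic is identical.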