SHA-RNN

Implementation of Single Headed Attention - Recurrent Neural Networks in Julia and Knet.

Stephen Merity. Single Headed Attention RNN: Stop Thinking With Your Head. arXiv preprint arXiv:1911.11423, 2019.

https://arxiv.org/abs/1911.11423v2

SHA-RNN Model

After downloading and preprocessing the data with

sh getdata.sh

you can train the main model of the SHA-RNN paper either by:

running sharnn-main.jl in a shell

cd examples
julia sharnn-main.jl

or by using the SHA-RNN notebook.
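
The model trained here is built around a single attention head reading from the RNN's memory. Below is a minimal, self-contained sketch of single-headed scaled dot-product attention in plain Julia, for illustration only; the actual layers in this repository add learned projections, masking of future positions, the Boom feed-forward block, and run on Knet arrays.

using LinearAlgebra

# One query vector q attending over T memory timesteps.
# q: (d,), K: (d, T) keys, V: (d, T) values.
function single_head_attention(q::AbstractVector, K::AbstractMatrix, V::AbstractMatrix)
    scores = (K' * q) ./ sqrt(length(q))   # (T,) scaled dot-product scores
    w = exp.(scores .- maximum(scores))    # numerically stable softmax
    w ./= sum(w)
    return V * w                           # (d,) attention-weighted context vector
end

d, T = 8, 5
q = randn(d); K = randn(d, T); V = randn(d, T)
context = single_head_attention(q, K, V)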

This implementation is equivalent to Smerity's original sha-rnn implementation.

However, it trains more slowly, since it does not use the same performance tricks as the original PyTorch version.

Features to be added for faster training:

  • Fused layer normalization (check whether Apex's CUDA code can be used with Knet); a plain unfused reference version is sketched after this list
  • Half precision floating point (Float16) for memory efficiency
  • A checkpointing feature similar to PyTorch's checkpoint
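
For reference, a fused layer normalization kernel would replace something like the plain Julia version below, which allocates several intermediate arrays per call. This is only a sketch under the assumption that features are stored along the first dimension (columns are batch elements); it is not the repository's actual layer.

# x: (features, batch); gamma, beta: learnable per-feature scale and shift.
function layernorm(x::AbstractMatrix, gamma::AbstractVector, beta::AbstractVector; eps=1f-5)
    mu = sum(x; dims=1) ./ size(x, 1)                  # per-column mean
    sigma2 = sum(abs2, x .- mu; dims=1) ./ size(x, 1)  # per-column variance
    return gamma .* (x .- mu) ./ sqrt.(sigma2 .+ eps) .+ beta
end

x = randn(Float32, 16, 4)                              # 16 features, batch of 4
y = layernorm(x, ones(Float32, 16), zeros(Float32, 16))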
