use LoopVectorization to vectorize activation functions and softmax #199
Conversation
@CarloLucibello The next release of LoopVectorization will have that.

Tests are passing now. Any idea?

Is all of this Zygote friendly?

@AStupidBear bump on this, it would be really nice to have this performance improvement.
FWIW, I added a much faster AVX512 version. These will still probably be slower than:

```julia
function tanh_fast(x)
    exp2x = exp(x + x)
    (exp2x - 1) / (exp2x + 1)
end
```

But they are a little more accurate.
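As a quick sanity check, the algebraic identity tanh(x) = (e^{2x} - 1)/(e^{2x} + 1) means the fast version should agree closely with Base.tanh for moderate inputs; a minimal sketch (the comparison below is illustrative, not part of the PR):

```julia
# tanh_fast, as quoted above: tanh via a single exp call.
function tanh_fast(x)
    exp2x = exp(x + x)
    (exp2x - 1) / (exp2x + 1)
end

# Agrees with Base.tanh to within floating-point rounding for moderate x.
# Note: unlike Base.tanh, exp(x + x) overflows for large x (around x > 355
# for Float64), where this formulation returns NaN.
println(abs(tanh_fast(1.0) - tanh(1.0)))
```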
Since all those modifications are in kernel functions used in the forward and backward pass of Zygote, they are Zygote friendly. |
What's the status of this?
It's ready to get merged.

```julia
julia> using NNlib, StaticArrays; x = SMatrix{2, 2}(rand(2, 2))
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.604445  0.330955
 0.975996  0.909042

julia> softmax(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.408166  0.359373
 0.591834  0.640627

julia> σ.(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.646673  0.581992
 0.726313  0.712804

julia> logsoftmax(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 -0.89608   -1.02339
 -0.52453   -0.445308
```
Alright, let's merge this and tag a new release. If no problems come up, we can then extend vmap to all activations by providing custom adjoints, right @AStupidBear?
@CarloLucibello Yes! But where should we put those definitions? Zygote? |
Maybe it's better to dispatch other activation functions to |
I think we can simply copy the adjoint map, i.e. doing something similar to FluxML/Zygote.jl#728 |
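The adjoint-copying idea can be sketched without pulling in Zygote: the backward pass of an activation reuses the forward output instead of recomputing it, which is exactly what a custom adjoint makes possible. A minimal illustration (σ_fast and ∇σ_fast are hypothetical names, not NNlib or Zygote API):

```julia
# Illustrative sketch only: pairing a vectorized activation with its
# analytic derivative, as a custom adjoint would.
σ_fast(x) = 1 / (1 + exp(-x))

# The sigmoid derivative expressed via the forward output y = σ(x);
# an adjoint can reuse y from the forward pass instead of recomputing σ.
∇σ_fast(y) = y * (1 - y)

x = randn(4)
y = σ_fast.(x)     # forward pass
ȳ = ∇σ_fast.(y)    # backward pass for an all-ones cotangent
```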
yeah, we can just add |
Yes. It also needs some tests.
Benchmark comparison (Old NNlib vs. this PR):
Other activation functions can be sped up by overloading Base.broadcasted after the adjoint is defined in LoopVectorization (JuliaSIMD/LoopVectorization.jl#108).
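The Base.broadcasted dispatch trick can be sketched as follows: intercept broadcasts of a specific activation and route them to a hand-written loop. This is a hedged illustration; relu_fast is a hypothetical name, and the actual PR uses LoopVectorization's @avx rather than @simd:

```julia
# Hypothetical activation used to demonstrate the dispatch pattern.
relu_fast(x) = ifelse(x > 0, x, zero(x))

# Overloading Base.broadcasted for this function routes `relu_fast.(x)`
# to an explicit loop instead of the generic broadcast machinery.
function Base.broadcasted(::typeof(relu_fast), x::Array{Float64})
    out = similar(x)
    @inbounds @simd for i in eachindex(x)
        out[i] = relu_fast(x[i])
    end
    out  # returning an eager result short-circuits normal broadcast fusion
end

relu_fast.([-1.0, 2.0, -3.0])  # hits the overload
```

One trade-off of this approach is that returning a materialized array from Base.broadcasted opts the call out of broadcast fusion with surrounding operations.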