slow down when @. assignment expression has more than 3 terms #797

Closed
daviehh opened this issue Jun 4, 2020 · 4 comments · Fixed by #1079
Labels
broadcast performance runtime performance

Comments

daviehh commented Jun 4, 2020

Not sure if it's a Julia issue or a StaticArrays issue: assignment expressions such as @. r = a + 5 * b + 3 * c - d slow down dramatically when the right-hand side has 4 terms (they allocate and, according to the profile, may be dynamically dispatched):

using StaticArrays, BenchmarkTools

n = 5

a = rand(n, n)
b = rand(n, n)
c = rand(n, n)
d = rand(n, n)
r = rand(n, n)

as = MMatrix{n, n, Float64}(a)
bs = MMatrix{n, n, Float64}(b)
cs = MMatrix{n, n, Float64}(c)
ds = MMatrix{n, n, Float64}(d)
rs = MMatrix{n, n, Float64}(r)

function g!(r,a,b,c,d)
    @. r = a + 5 * b + 3 * c - d
end

with standard array:

@btime g!($r, $a, $b, $c, $d)

62.803 ns (0 allocations: 0 bytes)

with static array

@btime g!($rs, $as, $bs, $cs, $ds)

17.648 μs (750 allocations: 13.28 KiB)

However, moving one term out of the assignment, as in the function g_ex!, avoids the problem:

function g_ex!(r,a,b,c,d)
    @. r = a 
    @. r += 5 * b + 3 * c - d
end

benchmark:

@btime g_ex!($rs, $as, $bs, $cs, $ds)

26.951 ns (0 allocations: 0 bytes)

So this is a roughly 650x slowdown (17.648 μs vs 26.951 ns), which is very noticeable when functions containing such expressions run in a deeply nested or hot loop.
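For anyone who wants to check this without BenchmarkTools, here is a minimal sketch that confirms the two forms compute the same result and reports the bytes allocated by each. The helper name `allocs` is hypothetical, the two functions are copied from above, and the numbers printed will depend on the Julia and StaticArrays versions in use.

```julia
using StaticArrays

# Definitions repeated from the issue so the sketch is self-contained.
function g!(r, a, b, c, d)
    @. r = a + 5 * b + 3 * c - d
end

function g_ex!(r, a, b, c, d)
    @. r = a
    @. r += 5 * b + 3 * c - d
end

# Hypothetical helper: bytes allocated by one call, measured after a
# warm-up call so that compilation is not counted. The type parameter F
# forces specialization on the function being measured.
function allocs(f!::F, r, a, b, c, d) where {F}
    f!(r, a, b, c, d)
    return @allocated f!(r, a, b, c, d)
end

as, bs, cs, ds = ntuple(_ -> rand(MMatrix{5, 5, Float64}), 4)
r1 = zeros(MMatrix{5, 5, Float64})
r2 = zeros(MMatrix{5, 5, Float64})

println("g!    allocates ", allocs(g!, r1, as, bs, cs, ds), " bytes")
println("g_ex! allocates ", allocs(g_ex!, r2, as, bs, cs, ds), " bytes")
println("results agree: ", r1 ≈ r2)
```

A nonzero count for g! alongside zero for g_ex! would match the benchmarks above; on versions that include the fix, both may report zero.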

Also, it's ok when the assignment only has 3 terms in the function g3!()

function g3!(r,a,b,c)
    @. r = a + 3 * b - c
end

benchmark:

@btime g3!($rs, $as, $bs, $cs)

17.306 ns (0 allocations: 0 bytes)

Author

daviehh commented Jun 4, 2020

Just noticed it's not exactly the number of terms. If the terms on the right-hand side are reordered,

function g_reordered!(r,a,b,c,d)
    @. r = 5 * b + 3 * c - d + a
end

then there's no allocation and the speed is fast:

@btime g_reordered!($rs, $as, $bs, $cs, $ds)

22.522 ns (0 allocations: 0 bytes)

Also, changing the first a to 1 * a, so that the expression becomes @. r = 1 * a + 5 * b + 3 * c - d, also avoids the slowdown.
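To compare how the variants differ structurally, one can look at how Julia parses each right-hand side. The sketch below uses only Base; the labels and the `variants` name are made up for illustration. It shows the parsed call structure only and does not by itself establish why the original form is dispatched dynamically.

```julia
# Right-hand sides discussed in this thread, quoted so the parse can be inspected.
variants = [
    "original (slow)" => :(a + 5 * b + 3 * c - d),
    "reordered"       => :(5 * b + 3 * c - d + a),
    "scaled first"    => :(1 * a + 5 * b + 3 * c - d),
    "three terms"     => :(a + 3 * b - c),
]

for (label, ex) in variants
    print(rpad(label, 16), " ")
    Meta.show_sexpr(ex)   # prints the nested call structure of the expression
    println()
end
```

Julia parses a chain of `+` as a single n-ary call, so in the original form the bare array `a` and the two `*` calls are all arguments of one three-argument `+`. In the reordered form every `+` and `-` is a two-argument call, and in the `1 * a` form the three-argument `+` receives only nested `*` calls.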

Profiling with Juno suggests this line is dynamically dispatched:

@inbounds $(Expr(:block, exprs...))

Profiling script here. Note that you may need to raise the loop iteration count, or adjust the n or delay arguments of Profile.init, to get a meaningful profile on a different machine.


Member

c42f commented Jun 5, 2020

Likely this is more or less the same issue as #682.

@c42f c42f added broadcast performance runtime performance labels Jun 5, 2020
Author

daviehh commented Jun 5, 2020

Might be slightly different since here @inferred g!(rs, as, bs, cs, ds) is ok...

@mateuszbaran
Collaborator

Small update: I've proposed a solution in JuliaLang/julia#41090.


3 participants