Stack allocation of tuples/immutables containing heap references #18084

Closed
wbhart opened this issue Aug 17, 2016 · 4 comments

wbhart commented Aug 17, 2016

If I have a function prototype something like this:

doit(a::Array{Tuple{Int, MyType}, 1}, b::Int, c::MyType)

where MyType is an ordinary Julia GC/heap-allocated type, then the following line of code within the function appears to heap-allocate a tuple (the same happens for any immutable type).

a[1] = b, c
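
For concreteness, here is a minimal, hedged sketch of the setup (MyType and its field are placeholder names, written with current struct syntax), using @allocated to observe whether the store allocates:

    mutable struct MyType    # mutable, so instances live on the GC heap
        x::Int
    end

    function doit(a::Array{Tuple{Int, MyType}, 1}, b::Int, c::MyType)
        a[1] = b, c          # construct the tuple and store it into the array
        return nothing
    end

    a = [(0, MyType(0))]
    c = MyType(2)
    doit(a, 1, c)                        # compile first
    println(@allocated doit(a, 3, c))    # nonzero bytes here indicate the tuple is boxed on the heap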

There seem to be two issues here:

  1. There seems to be no efficient way to write an immutable value directly into an array in Julia. The above shouldn't require any allocation of any kind, stack or heap.

  2. Apparently, Julia GC/heap-allocates immutables if they contain heap references. I don't understand why, since surely the GC scans the stack conservatively for pointers.

Anyway, I have been unable to find a ticket or PR which deals directly with this serious performance issue (which we hit in practice).

(Obliquely) relevant tickets/PRs seem to include [1], [2], [3] and [4], and of course the tupocalypse itself. There are also some relevant-sounding things on Jeff's compiler optimisation tracker [5], though I am not clear whether any are directly relevant.

[1] #8134
[2] #12205
[3] https://groups.google.com/forum/#!topic/julia-users/F_ncyfP2vxg
[4] https://groups.google.com/forum/#!topic/julia-users/LthfABeDN50
[5] #3440

vtjnash commented Aug 17, 2016

> I don't understand why, since surely the GC scans the stack conservatively for pointers

Because it's a precise scanner.

Closing because this is a feature request, and a duplicate of #12205

vtjnash closed this as completed Aug 17, 2016

wbhart commented Aug 17, 2016

Having a precise GC doesn't preclude scanning the stack and registers conservatively; the Mono GC does this, for example (though they call it "mostly precise").

I can understand closing the ticket since Julia just doesn't have this, and presumably #12205 will make it feasible to deal with point 2.

However, I still have two questions:

  1. How is point 1 above dealt with? Is it just an automatic outcome of fixing 2?

  2. Where does one put feature requests? Has the policy on putting these on the issue tracker changed? (I would normally think of this as a performance issue, rather than a feature request.)

Or maybe my point 1 wasn't sufficiently well explained. I am not looking for someone to write me some library code to do this. I was simply trying to open a ticket for the performance issue with the existing syntax. That's surely a Julia compiler issue, not a library feature request.

JeffBezanson commented Aug 17, 2016

This is a perfectly legitimate issue, just a duplicate of #16190, #10899, #14955 and #11714, plus the PRs already cited.

The case of a[1] = b, c isn't necessarily related to conservative stack scanning, since it's storing references into an array that's already on the heap. Arrays only support two memory layouts currently: a simple array of references, or an array of flattened immediate data. We can't yet handle a mixture of both in a single array.
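
As a rough illustration (not part of the original comment), with a mutable MyType as in the sketch above, isbitstype (spelled isbits on older Julia versions) distinguishes the two cases:

    isbitstype(Tuple{Int, Int})      # true  -> elements can be flattened inline in the array
    isbitstype(Tuple{Int, MyType})   # false -> elements are stored as references to boxed tuples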

What does the performance impact look like in your application? Is it GC time or run time? Do you have a sense of what the cost of the allocation is?
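
(As a hedged aside: one simple way to separate the two is @time on the hot call, which reports elapsed time, the allocation count and bytes, and the share of time spent in GC when any collection occurred.)

    @time doit(a, 3, c)    # prints elapsed time and allocations; a "% gc time" figure is appended when GC ran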

wbhart commented Aug 17, 2016

It's hard to get a precise sense of the performance impact. It arose when implementing a trick from a paper, known as "heap chaining". It's supposed to give us a 5-10x speedup. However, it made no difference whatsoever when I implemented it (carefully).

In our application, even the medium-sized benchmark went from a few hundred megabytes of total allocations to tens of gigabytes. I've implemented this trick in C before, so I have a very good feel for what should happen.

The best I can offer Julia-wise is that performance improved by a factor of about 1.5 overall when I implemented it a different way. Unfortunately we are still 50% slower than someone else's generic C implementation, and about a factor of 15 slower than the C implementation of the paper's authors.

I don't think GC time is the issue, though I don't have a complete answer on that. I think we went from about 15% GC time to 0.1% when I implemented it a different way, which suggests GC time wasn't the major component. However, I should add that subsequent computations and shutting Julia down take ages with the extra allocations in there.
