Do exact array resize!() in more cases #22038

alyst · 2017-05-23T14:49:30Z

Sometimes it's useful to have precise control over the amount of memory allocated for a given array (see e.g. #21938).
One would assume this could be done with resize!(), but under the hood it calls jl_array_grow_at_end().
That function is tuned for gradually growing arrays: it allocates exactly the requested size only if the array is empty;
otherwise it allocates the next power of 2 that is greater than or equal to the requested size.
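
To make the behaviour described above concrete, here is a minimal sketch of that sizing rule as the description states it. It is not the code from array.c: the names `next_pow2` and `old_capacity` are hypothetical, and overflow handling for very large sizes is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest power of two that is >= n (for n >= 1).
 * Illustration only: no overflow check for sizes near SIZE_MAX. */
size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p *= 2;
    return p;
}

/* Capacity picked when growing to `requested` elements, following the
 * description above: exact for an empty array, otherwise rounded up
 * to the next power of two. */
size_t old_capacity(size_t current_len, size_t requested)
{
    assert(requested >= 1);
    if (current_len == 0)
        return requested;
    return next_pow2(requested);
}
```

Under this rule, growing an empty array to 1000 elements reserves 1000 slots, while growing a 10-element array to 1025 elements reserves 2048, which is the "up to 2x" overhead the PR is about.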

There are several options:

do nothing
change the heuristic to allocate exactly requested amounts in more cases (final decision implemented by this PR)
add a keyword/positional exact argument to resize!()/sizehint!()
change the behaviour of resize!()/sizehint!() to always grow to exactly the specified size
add a new resize_exact!() method (either exported or not)
...

KristofferC · 2017-05-23T14:59:50Z

Any thoughts about sizehint!? Does / should it follow the same rule?

alyst · 2017-05-23T16:12:07Z

IMO the same changes should be made to sizehint!(). It would be nice if they remained easily interchangeable.

KristofferC · 2017-05-23T16:20:57Z

This makes sense to me. For example, when you create a sparse matrix it is common that you know exactly how many stored elements you will have so you might do a sizehint!(I, large_number). Potentially wasting a factor of 2 memory seems unnecessary.

Edit: On the other hand you usually start with empty vectors then, which is already covered by the current code.

StefanKarpinski · 2017-05-23T16:22:39Z

Since the cost of the size not being exact is ~one extra resize versus up to 2x memory waste, taking size hints and resizes literally seems like a good idea.

andreasnoack · 2017-05-23T16:55:09Z

Also, in the motivating use case, #21938, we know that the arrays will never grow again.

alyst · 2017-05-24T08:34:12Z

It seems #22044 has fixed the issue with the string interpolation that got exposed by 6d92926 (see #21938 comments): after rebasing to the current master the QR tests are fine again.

KristofferC · 2017-05-25T07:59:02Z

Since sizehint! ends up calling the function modified here, this would automatically work for sizehint! as well, right?

julia/src/array.c

Line 925 in 298fffb

JL_DLLEXPORT void jl_array_sizehint(jl_array_t *a, size_t sz)

alyst · 2017-05-25T22:04:02Z

First take on making resize!() and sizehint!() always grow to exactly the specified size.

alyst · 2017-05-26T05:05:58Z

@nanosoldier runbenchmarks(ALL, vs=":master")

alyst · 2017-05-26T05:07:53Z

Looks like @nanosoldier doesn't listen to me.

KristofferC · 2017-05-26T07:23:53Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2017-05-26T10:23:14Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

KristofferC · 2017-05-26T10:30:33Z

On the positive side, string and join do take less memory now :D

yuyichao · 2017-05-26T15:42:11Z

The necessity of the last commit basically means that user code that reserves multiple elements with resize! and then assigns to them in batches will be much slower than just pushing individual elements, so the behavior of resize! should not be changed when the increment is small (the original change in the heuristics is good though).

alyst · 2017-05-26T15:54:26Z

Definitely there should be a way for the user to grow an array the old way.
It could be either a new exported grow!(a, n) method
or an additional exact param for resize!(). In the latter case, should it default to exact=true or exact=false (old behaviour)?
IMHO, calling resize!() within an appending loop sounds a little bit confusing; I would rather use another verb.

yuyichao · 2017-05-26T16:18:18Z

resize! should definitely stay with its current default. I see no point in making everyone's code suddenly slower unless we're moving to a new name altogether.

alyst · 2017-05-26T16:32:05Z

I'm not sure resize!() is used that much in the packages in the append context. For arrays there's push!() and append!(), and low-level streams are already implemented in Base.
OTOH in Base we don't see a lot of improvements when exact=true is the default.

alyst · 2017-05-26T16:40:50Z

Could somebody ask nanosoldier? Thanks!

KristofferC · 2017-05-26T16:44:29Z

@nanosoldier runbenchmarks(ALL, vs=":master")

yuyichao

I'm not sure resize!() is used that much in the packages in the append context.

The point is that this change gives an O(1) memory saving and an O(n) performance regression. It's basically always wrong when a small increment is used.
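
A rough way to see the scaling argument is to count how many elements get copied while an array grows one element at a time. The sketch below assumes, as a worst case, that every reallocation copies the whole array; the function name and the simplified "exact" and "doubling" policies are illustrative and not taken from array.c.

```c
#include <assert.h>
#include <stddef.h>

/* Count the elements copied while growing from 0 to n elements, one
 * element at a time, assuming each reallocation copies all existing
 * elements (a worst-case assumption for this illustration).
 * doubling != 0: capacity doubles whenever it is exhausted.
 * doubling == 0: capacity grows to exactly the requested length. */
size_t copies_for_incremental_growth(size_t n, int doubling)
{
    size_t cap = 0, len = 0, copied = 0;
    while (len < n) {
        if (len == cap) {
            size_t new_cap = doubling ? (cap == 0 ? 1 : 2 * cap) : cap + 1;
            copied += len;      /* existing elements move to the new buffer */
            cap = new_cap;
        }
        len++;
    }
    return copied;
}
```

For n = 1024 this model gives 523776 copied elements with exact growth and 1023 with doubling: roughly n/2 copies per appended element versus about one. The memory saved by exact growth, by contrast, is bounded by a constant factor.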

nanosoldier · 2017-05-26T19:39:32Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

alyst · 2017-05-26T20:24:01Z

Now the benchmarks look ok, let's try to decide on the API changes :)

I don't like resize!(A, n, exact::Bool) much: "non-exact resize" sounds confusing.
What about resize!(A, n, preemptive::Bool=false), where preemptive=true means the current behaviour? At the moment resize!() documentation doesn't mention the fact that it allocates more than requested and that this feature could be exploited for growing an array. So making "exact resize" a default behaviour seems a fair change. The same for sizehint!()

yuyichao · 2017-05-26T21:14:13Z

At the moment resize!() documentation doesn't mention the fact that it allocates more than requested and that this feature could be exploited for growing an array. So making "exact resize" a default behaviour seems a fair change. The same for sizehint!()

No, it's not documented in the same way as push! isn't documented to be fast. People expect them to do the right thing and it's a bad argument to make them O(n) slower. They should still be documented but the lack of that is irrelevant for this change.

alyst · 2017-05-26T21:53:06Z

People expect them to do the right thing and it's a bad argument to make them O(n) slower.

What is "the right thing" for resize!()? At the very least, it's context-dependent. For LAPACK, preemptive behaviour could mean gigabytes of unnecessarily allocated memory, with all the consequences of hitting physical memory limits. IMHO if resize!() is called in a loop from the user/package code, it looks like a wrong code pattern, and the author should assert that he really means it by explicitly specifying preemptive=true.

Well, I'm out of anti-preemptive arguments for the moment :) With the updated jl_array_grow_at_end() heuristic I wouldn't be so much obsessed with specifying preemptive=false for each resize!() call in my code.
I just want to be sure that everyone interested agrees with preemptive=true (including the parameter name) before I update the PR.

nalimilan · 2017-05-26T21:53:19Z

Personally I would expect "the right thing" for resize! to be "resize this array to the exact size I specified", not "leave a bit more room in case I want to resize it again soon". The right abstraction level seems to be push! and append!, which can easily pass exact=false. Do we have evidence that people use resize! when growing an array progressively, rather than calling push! or append!?

yuyichao · 2017-05-27T01:07:07Z

Do we have evidence that people use resize! when growing an array progressively, rather than calling push! or append!?

Short of any other function to do it currently, I believe resize! is the correct function for this right now. There are certainly a lot of cases in Base that do something similar (not just Array, but also arraylist_t in C), so I believe it's a valid use case that shouldn't be broken, and it isn't worth requiring users to change their code, especially because there's no way we can issue a warning for this and it'll be really hard to catch (this is also why I think it'll be fine to change the default if we rename it altogether).

I'll still repeat that exact resize will only save O(1) memory (less than a factor of 2, maybe ~1.5x on average assuming some random distribution). The default heuristic can certainly be changed to do exact resizing after a threshold. In fact, doing at least 2x or exact resizing as default should still provide the same scaling while providing a net saving of memory.

KristofferC · 2017-05-27T06:50:54Z

Isn't the thing that O(1) can still be very large?

StefanKarpinski · 2017-05-30T21:21:13Z

My reasoning for 25% is that there are papers which report that 50% growth actually gives optimal performance for exponential growth, and 0% is the O(n^2) case that's problematic, so split the difference and make the cutoff halfway between 0% and 50%. The precise cutoff probably doesn't matter.

StefanKarpinski · 2017-05-30T21:22:59Z

Should there be a shrinkwrap exception? I.e. when the requested size is ≤ the current size, we make it exactly that size? I guess that could lead to bad behavior if a vector is being used as a stack or a deque.

yuyichao · 2017-05-30T22:08:10Z

The growing factor is #8269 and #16305 and I would rather not tie it up with this one. What I was suggesting as a default heuristic is basically

/* Grow by the usual factor, but never allocate less than what was requested. */
size_t new_size = old_size * grow_factor;
new_size = new_size > request_size ? new_size : request_size;

And we can change the grow factor if there's clear evidence that it's better.

rfourquet · 2017-05-30T22:20:01Z

It's an expected pattern to call resize! repeatedly by (small) increments (e.g. in a loop), so always doing an "exact resize!" would be terrible. resize! is about the actual size of the array; I agree with @yuyichao that the allocation mechanism should do "the right thing" (I believe that most of the time RAM is big enough to accommodate the exponential growth factor). If one wants to optimize for space, sizehint! seems the more appropriate level to control those implementation details. At the same level, there could be a shrink_to_fit! (name borrowed from C++) to free the unused space when the vector is not expected to grow anymore (which seems preferable to me to the "shrinkwrap" proposed above). If exact resize is needed, I vote for a new name instead of a keyword, e.g. setsize!, which sounds more final than resize!.
Exact resize! beyond a threshold sounds good though, but I would feel more comfortable with at least 50% to get the "optimal exponential growth" (i.e. the same growth factor as push! etc.). If I remember correctly, C++ libs do this with a growth factor equal to 2: newsize = max(requestedsize, oldsize*2).

rfourquet · 2017-05-30T22:27:57Z

Also #16305 (comment) may be relevant?

alyst · 2017-05-30T22:43:26Z

I've reverted the API changes and implemented the 125%/200% growth threshold heuristic discussed above. I think it should be sensitive enough to do exact resize in most relevant cases. Let's see what nanosoldier thinks of that.

Shrinkwrap might be useful for some applications, although it's much easier to hit memory limits with imprecise growth than with imprecise shrinkage. And it would definitely break many more code patterns than exact growth.

alyst · 2017-05-30T23:19:26Z

CI tests look ok, could somebody please ping nanosoldier as it doesn't listen to me?

andreasnoack · 2017-05-30T23:20:24Z

@nanosoldier runbenchmarks(ALL, vs=":master")

tkelman · 2017-05-31T02:02:29Z

it would make the most sense to me if the cutoff here was such that this PR didn't change the minimum amount of growth, but only the amount of space used when that minimum is exceeded

nanosoldier · 2017-05-31T02:17:35Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

StefanKarpinski · 2017-05-31T03:13:26Z

Why? Why does that make more sense? The point of my proposal was that if you're in the bad O(n^2) case then each growth is small – of fixed size, typically. Having any 1+ε multiplicative threshold should suffice to avoid that. It doesn't make sense to me why 2x growth makes sense as the cutoff and no one has made a compelling argument for it. Feeble as my "25% is halfway between 0% and 50%" argument is, at least it is an argument.

tkelman · 2017-05-31T03:21:59Z

2x is our array growth cutoff elsewhere, and it doesn't make sense for resize to arbitrarily use a different growth ratio.

This also resizes to a discontinuous size - ask for 24% larger and the space doubles, but ask for 25% larger and it grows by less? That doesn't seem like a useful or predictable behavior.
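
The discontinuity can be reproduced with a small sketch. The first function models the intermediate rule as this comment describes it (requests below 125% of the current capacity double it, larger requests are honoured exactly); the second is the `max(request, 2x)` form suggested earlier in the thread. Both function names are hypothetical, and neither is copied from the PR's actual diff.

```c
#include <assert.h>
#include <stddef.h>

/* Intermediate rule as described above (an assumption about its exact
 * form): a request below 125% of the current capacity doubles the
 * capacity; a request at or above 125% is honoured exactly. */
size_t threshold_capacity(size_t cap, size_t requested)
{
    assert(requested > cap);
    if (4 * requested < 5 * cap)    /* requested < 1.25 * cap */
        return 2 * cap;
    return requested;
}

/* Continuous alternative: at least double, or exactly the request if
 * the request is larger than double. */
size_t continuous_capacity(size_t cap, size_t requested)
{
    assert(requested > cap);
    return requested > 2 * cap ? requested : 2 * cap;
}
```

With a current capacity of 100, the threshold rule reserves 200 slots for a request of 124 but only 125 slots for a request of 125, so asking for more yields less. The continuous rule reserves 200 in both cases and only exceeds 200 once the request itself does.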

yuyichao · 2017-05-31T03:23:55Z

25% is halfway between 0% and 50%

So it's in between an "optimum" and the worst? That actually feels kind of random, mostly the 0% part. And

This also resizes to a discontinuous size - ask for 24% larger and the space doubles, but ask for 25% larger and it grows by less? That doesn't seem like a useful or predictable behavior.

is actually also why I think the cutoff should not be lower than the grow factor.

StefanKarpinski · 2017-05-31T03:33:35Z

Ok. The continuity argument is a good one – I buy that. Thank you for actually making a case. Let's go with that then, and the appropriate growth factor can be tweaked later (or not at all).

alyst · 2017-05-31T08:07:04Z

Now it should be "continuous". I wonder if it helps sparse matrix tests. Nanosoldier, please?

nalimilan · 2017-05-31T08:18:40Z

@nanosoldier runbenchmarks(ALL, vs=":master")

nanosoldier · 2017-05-31T11:14:41Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

alyst · 2017-05-31T11:30:46Z

I see no relevant regressions in the last nanosoldier run (no real improvements either).
Do you think we have reached convergence? :)
Should I mention the change of resize!() behaviour in NEWS.md?

StefanKarpinski · 2017-05-31T12:56:42Z

A NEWS entry would be great. Then I think we're good to go here! Nice work :)

If the requested size of array buffer is >=200% of its current size, grow exactly to the requested size, otherwise double the current size. This should make resize!() & sizehint!() allocate the exactly requested size in more situations (previously the exact resize was guaranteed for empty arrays only). This heuristic should reduce memory usage when arrays are growing non-incrementally (e.g. temporary buffers allocated by LAPACK wrappers).
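
As an illustration of the rule stated in this entry, here is a minimal growable-buffer sketch in plain C. It is not the Julia runtime code: `vec_t`, `vec_resize`, and `capacity_after` are hypothetical names, shrinking is left as a no-op on the buffer, and overflow of `2 * cap` is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of doubles. */
typedef struct {
    double *data;
    size_t len;   /* number of elements in use */
    size_t cap;   /* number of elements allocated */
} vec_t;

/* Resize v to n elements. When the buffer must grow: allocate exactly n
 * if n is at least twice the current capacity, otherwise double the
 * capacity. Returns 0 on success, -1 if allocation fails. */
int vec_resize(vec_t *v, size_t n)
{
    if (n > v->cap) {
        size_t new_cap = (n >= 2 * v->cap) ? n : 2 * v->cap;
        double *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->len = n;
    return 0;
}

/* Usage example: capacity after resizing an empty vector to `first`
 * elements and then to `second`. Returns 0 if an allocation fails. */
size_t capacity_after(size_t first, size_t second)
{
    vec_t v = { NULL, 0, 0 };
    size_t cap = 0;
    if (vec_resize(&v, first) == 0 && vec_resize(&v, second) == 0)
        cap = v.cap;
    free(v.data);
    return cap;
}
```

In this sketch, resizing an empty vector to 1000 and then to 5000 ends with a capacity of exactly 5000, while resizing to 1000 and then to 1001 ends with 2000, so small increments still get amortized growth.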

alyst · 2017-05-31T14:37:01Z

Squashed and added NEWS entry.
Thanks everyone for the lively discussion! :)

alyst · 2017-06-01T14:55:52Z

Anything else to do before merging?

kshyatt added the arrays label May 23, 2017

alyst force-pushed the exact_resize branch from f750bb0 to 6d92926 May 24, 2017 07:15

alyst force-pushed the exact_resize branch from 6d92926 to b9886c5 May 25, 2017 22:02

yuyichao requested changes May 26, 2017


alyst force-pushed the exact_resize branch from 639160c to fbba6e0 May 30, 2017 22:16

yuyichao approved these changes May 31, 2017


alyst changed the title ~~[WIP/RFC] Exact array resize!()~~ Do exact array resize!() in more cases May 31, 2017

alyst force-pushed the exact_resize branch from fecd1ad to d6ac437 May 31, 2017 14:31

alyst added 2 commits May 31, 2017 16:33

add _grow/deletebeg/end!(a::Array) ccall wrappers

a2cdad8

alyst force-pushed the exact_resize branch from d6ac437 to 66c9bbb May 31, 2017 14:34

KristofferC merged commit f74e55e into JuliaLang:master Jun 1, 2017

alyst mentioned this pull request Jun 1, 2017

LAPACK wrappers: use resize!(a) instead of a = Vector{T}() #21938

Merged
