-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make empty Dict
only have space for 0 rather than 16 elements (and the minimum size for non-empty Dict
down to 4).
#52912
Make empty Dict
only have space for 0 rather than 16 elements (and the minimum size for non-empty Dict
down to 4).
#52912
Conversation
Starting with zero might be much better, since it doesn't allocate at all? |
The 2 arguments against 0 are that it makes the resizing awkward (since otherwise resize always multiplies by 4/2) and that heuristically, we expect most |
counter-point: |
just a drive by comment: The Java hotspot engineers did significant work profiling the Java ArrayList class, and the default initial capacity there is 10 - which should not be taken to mean that 10 is best, since that is an implementation-dependent number, but just that a non-zero number is probably best. |
that is good to know, but it is worth remembering that Java doesn't have tuples which will affect the distribution. we also have a weird optimization where length 0 Memory is especially fast since it doesn't have to allocate at all. |
This improves speed for small Dicts. (the 16 before made sense when Dict was Vector backed since Vectors were slow to instantiate, but size Memory are much faster and have an extra fast path for size 0)
26e7858
to
bdfdbb9
Compare
Dict
only have space for 4 rather than 16 elements.Dict
only have space for 0 rather than 16 elements (and the minimum size for non-empty Dict
down to 4).
I've now modified this to initialize to 0 elements as suggested. I think we should probably take a look at the nanosoldier benchmarks to make sure this doesn't cause any regressions, but other than that, I think this is ready to merge. |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. |
that's a lot of noise, but I don't think any of the regressions are real (somewhat hard to say though) |
Maybe we'd see more impact for changing IdDict, since its default is pretty massive?
|
Note that it's also only 16 elements since it stores 1 list of pairs rather than 2 lists. |
Why is that relevant to anyone? Are 0-sized used much at all? I mean besides as a starting point. Which you can benchmark, but you would (almost?) always add something to it, and then you're just deferring the "whole operation" in context (what you really should what to be rather benchmarking?), and that combination would not be faster with 0-sized? Possibly slower?
You mean a special-case (making always slower) we want to avoid? |
I tested this PR on an application-like workload of mine, which uses dicts faily heavily with around 1/3 of runtime is spent in dict lookups, and at least 25% of runtime on |
@PallHaraldsson the reason for starting with 0 is that a 0 sized array is free to allocate, and there isn't much benefit in paying the allocation cost up front |
I tested this on pre-1.0 ProtoBuf which relied heavily on Below are the benchmark results obtained from the MWE of JuliaIO/ProtoBuf.jl#179. TL;DR: A solid drop in allocations, a minor drop in total time.
|
Given the testing, I think this is ready to merge. |
Another reason why this is very important :):
|
Ref: https://reviews.llvm.org/D33656. --------- Co-authored-by: Keno Fischer <[email protected]>
This improves speed and reduces allocation for small Dicts.
The initialization to 16 before made sense when
Dict
wasVector
backed sinceVectors
were slow to instantiate, but Memory is much faster).