
Make empty Dict only have space for 0 rather than 16 elements (and the minimum size for non-empty Dict down to 4). #52912

Merged

Conversation

@oscardssmith (Member) commented Jan 15, 2024

This improves speed and reduces allocation for small Dicts.

#before
julia> @btime Dict{Int,Int}()
  46.815 ns (4 allocations: 448 bytes)

#after
julia> @btime Dict{Int, Int}()
  9.429 ns (1 allocation: 80 bytes)

The initialization to 16 made sense back when Dict was Vector-backed, since Vectors were slow to instantiate, but Memory is much faster to create.
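
For concreteness, here is a hedged sketch of the idea (the struct and field names are illustrative, not the actual Base code): the empty constructor starts every backing Memory buffer at length 0, and the first insertion grows the table to the new minimum non-empty size of 4.

# Illustrative sketch only -- not the real Base.Dict definition.
mutable struct ToyDict{K,V}
    slots::Memory{UInt8}   # per-slot metadata
    keys::Memory{K}
    vals::Memory{V}
    count::Int
end

# Empty constructor: zero-length buffers, which Memory can hand out without allocating.
ToyDict{K,V}() where {K,V} =
    ToyDict{K,V}(Memory{UInt8}(undef, 0), Memory{K}(undef, 0), Memory{V}(undef, 0), 0)

# The first insert would then grow the buffers to the minimum non-empty size of 4.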

@oscardssmith oscardssmith added the performance Must go faster label Jan 15, 2024
@vtjnash (Sponsor Member) commented Jan 16, 2024

Starting with zero might be much better, since it doesn't allocate at all?

@oscardssmith (Member, Author)

The two arguments against 0 are that it makes resizing awkward (otherwise resize always just multiplies the size by 4 or 2) and that, heuristically, we expect most Dicts to eventually hold at least one element.
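
To make the "awkward resizing" point concrete, here is a hypothetical version of such a growth rule (the function name and the 64-slot threshold are made up for illustration, not taken from Base):

# Hypothetical growth rule, not the actual Base heuristic.
function grow_size(oldsz::Int)
    oldsz == 0 && return 4                      # zero needs its own branch, since 4 * 0 == 0
    return oldsz < 64 ? 4 * oldsz : 2 * oldsz   # otherwise just multiply by 4 or 2
end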

@vtjnash (Sponsor Member) commented Jan 16, 2024

Counter-point: resize should already have a special case to make sure it allocates enough initially. And the initial sizehint!(Dict(), n) should become much cheaper, which in turn makes union! and friends cheaper.
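
As a usage illustration of that point (timings deliberately omitted, since they depend on the build): the common pattern is to create an empty Dict and immediately sizehint! it, so any storage allocated by the empty constructor is wasted.

julia> d = Dict{Int,Int}();            # with this PR, no table storage is allocated yet

julia> sizehint!(d, 1000);             # one resize straight to the needed capacity

julia> for i in 1:1000; d[i] = i; end  # inserts proceed without intermediate rehashes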

@grandinj
Just a drive-by comment: the Java HotSpot engineers did significant profiling work on Java's ArrayList class, and the default initial capacity there is 10. That should not be taken to mean that 10 is best (it is an implementation-dependent number), only that a non-zero number is probably best.

@oscardssmith (Member, Author)

That is good to know, but it is worth remembering that Java doesn't have tuples, which will affect the distribution of collection sizes. We also have an optimization where a length-0 Memory is especially fast, since it doesn't have to allocate at all.
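
A quick way to see the fast path being referred to (exact timings will vary; the zero-allocation behavior is the point):

julia> using BenchmarkTools

julia> @btime Memory{Int}(undef, 0);   # expected: 0 allocations, thanks to the empty fast path

julia> @btime Memory{Int}(undef, 16);  # expected: 1 allocation, a regular small buffer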

This improves speed for small Dicts.
(the 16 before made sense when Dict was Vector backed since Vectors were slow to instantiate,
but Memory is much faster to create and has an extra fast path for size 0)
@oscardssmith oscardssmith changed the title Make empty Dict only have space for 4 rather than 16 elements. Make empty Dict only have space for 0 rather than 16 elements (and the minimum size for non-empty Dict down to 4). Jan 21, 2024
@oscardssmith (Member, Author)

I've now modified this to initialize to 0 elements as suggested. We should run the nanosoldier benchmarks to make sure this doesn't cause any regressions, but other than that, I think this is ready to merge.

@oscardssmith oscardssmith added the needs nanosoldier run This PR should have benchmarks run on it label Jan 21, 2024
@vtjnash (Sponsor Member) commented Jan 21, 2024

@nanosoldier runbenchmarks(!"scalar" && !"union" && !"array", vs=":master")

@nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@oscardssmith (Member, Author)

That's a lot of noise, but I don't think any of the regressions are real (though it's somewhat hard to say).

@vtjnash (Sponsor Member) commented Jan 21, 2024

Maybe we'd see more impact from changing IdDict, since its default is pretty massive?

julia> IdDict().ht
32-element Memory{Any}:

@oscardssmith (Member, Author)

Note that it's effectively also only 16 entries, since it stores a single interleaved list of key/value pairs rather than two separate lists.
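
To make the interleaving concrete (illustrative layout only; the exact slot scheme is an implementation detail):

# Keys and values share one flat buffer, alternating:
#   ht[1] => key1, ht[2] => val1, ht[3] => key2, ht[4] => val2, ...
# so the 32-element buffer shown above holds at most 16 entries,
# whereas Dict keeps separate keys and vals buffers.

julia> length(IdDict().ht) ÷ 2
16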

@PallHaraldsson (Contributor) commented Jan 22, 2024

> Starting with zero might be much better, since it doesn't allocate at all?

Why is that relevant to anyone? Are 0-sized Dicts used much at all, other than as a starting point? You can benchmark the empty constructor on its own, but you would (almost?) always add something to it afterwards, so you're just deferring the cost; the whole operation in context (which is what you really want to benchmark) would not be faster with a 0-sized start, and might possibly be slower?

> The 2 arguments against 0 are that it makes the resizing awkward

You mean a special case (which makes it always slower) that we want to avoid?

@jakobnissen (Contributor) commented Jan 22, 2024

I tested this PR on an application-like workload of mine, which uses Dicts fairly heavily: around 1/3 of runtime is spent in dict lookups, and at least 25% of runtime on get! to build a dictionary. None of my dicts ought to be empty, but it's likely that a large fraction of them have only one or two keys. The total runtime is unchanged on this PR vs 47d31ac (3.135 vs 3.131 seconds).

@gbaraldi (Member)

@PallHaraldsson the reason for starting with 0 is that a 0-sized array is free to allocate, and there isn't much benefit in paying the allocation cost up front.
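
One hedged way to test that concern directly (machine-dependent numbers omitted; the interesting comparison is master vs. this PR for both lines):

julia> using BenchmarkTools

julia> @btime Dict{Int,Int}();                     # pure construction: much cheaper on this PR

julia> @btime (d = Dict{Int,Int}(); d[1] = 1; d);  # construction plus first insert: this is where
                                                   # the deferred table allocation shows up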

@lassepe (Sponsor Contributor) commented Jan 22, 2024

I tested this on pre-1.0 ProtoBuf, which relied heavily on Dict as a backing data structure (ref JuliaIO/ProtoBuf.jl#179).

Below are the benchmark results obtained from the MWE of JuliaIO/ProtoBuf.jl#179. TL;DR: A solid drop in allocations, a minor drop in total time.

  1. Nightly: [benchmark screenshot]

  2. This PR: [benchmark screenshot]

@oscardssmith (Member, Author)

Given the testing, I think this is ready to merge.

@oscardssmith (Member, Author)

Another reason why this is very important :):

python3 -m timeit "{}"
20000000 loops, best of 5: 12.8 nsec per loop

@oscardssmith oscardssmith removed the needs nanosoldier run This PR should have benchmarks run on it label Jan 23, 2024
@oscardssmith oscardssmith merged commit d741c24 into JuliaLang:master Jan 23, 2024
9 of 10 checks passed
@oscardssmith oscardssmith deleted the os/empty-make-Dict-smaller branch January 23, 2024 14:57
vtjnash referenced this pull request Jan 27, 2024