dispatch speedups #21760

JeffBezanson · 2017-05-09T19:34:55Z

Test script: https://gist.github.com/JeffBezanson/a5f4abd6f093795f7d8c41501fb94d8b

release-0.6:

  0.026056 seconds (10.00 k allocations: 156.250 KiB)
  0.061410 seconds
  0.001493 seconds (10.00 k allocations: 156.250 KiB)
  0.081548 seconds (489 allocations: 7.641 KiB)
  0.060909 seconds (87 allocations: 6.547 KiB)
  0.060998 seconds (87 allocations: 6.547 KiB)

This PR:

  0.002621 seconds (10.00 k allocations: 156.250 KiB)
  0.005154 seconds
  0.001432 seconds (10.00 k allocations: 156.250 KiB)
  0.002044 seconds (489 allocations: 7.641 KiB)
  0.000898 seconds (87 allocations: 6.547 KiB)
  0.000948 seconds (87 allocations: 6.547 KiB)
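
As a rough guide to the size of the change, the ratio of old to new time can be computed line by line from the two lists above. The sketch below only restates those twelve numbers; `speedup` is a hypothetical helper name, not something from the test script.

```c
#include <stddef.h>

/* Timings in seconds, copied from the two lists above (same order). */
static const double release_06[6] = {0.026056, 0.061410, 0.001493,
                                     0.081548, 0.060909, 0.060998};
static const double this_pr[6]    = {0.002621, 0.005154, 0.001432,
                                     0.002044, 0.000898, 0.000948};

/* Hypothetical helper: old time divided by new time for benchmark i (0..5). */
double speedup(size_t i)
{
    return release_06[i] / this_pr[i];
}
```

With these inputs the ratios come out to roughly 10x, 12x, 1x, 40x, 68x and 64x, in list order.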

The first result in the list is probably just luck from changing the order of the table. Speeding up convert dispatch is not really solved here; I managed to come up with a hack that only works for constructors so far.

The fix for #21370 sacrificed 0-argument constructors (#21730) and functions whose types have parameters. This hopefully fixes that, while keeping the fix for #21370.

@nanosoldier runbenchmarks(ALL, vs=":master")

KristofferC · 2017-05-09T19:38:23Z

Would these benchmarks be suitable to add to Nanosoldier?

JeffBezanson · 2017-05-09T19:43:27Z

Yes, I don't see why not. Let's add them.

nanosoldier · 2017-05-09T22:23:51Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

JeffBezanson · 2017-05-17T03:32:14Z

Ping @vtjnash --- what do you think?

vtjnash · 2017-05-17T07:36:33Z

This feels unlikely to be valid, but I'll have to work a bit harder to understand it to see if I can create a good counterexample.

JeffBezanson · 2017-05-17T14:32:04Z

It actually ended up being really simple. First I restored using either 0 or 1 as the offset based on whether the function type has parameters. Hopefully that's uncontroversial.

Next, I observed that the ->any table is never used for the 0th argument, because you can't have Any in that slot. So I use it to split Type{T}s based on whether T satisfies is_cache_leaf. If it does, the entry gets inserted as before, starting with offset 0. Otherwise, the offset is incremented to 1 and the entry is inserted into the ->any table. Both are checked on lookup (as before). This concept seems sound to me.
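
A minimal sketch of that placement rule, written as standalone C rather than as the actual typemap code. The names `placement`, `start_offset` and `place_type_slot` are hypothetical, and which case gets offset 0 is an assumption: a function type with parameters starts splitting at slot 0, one without starts at slot 1.

```c
#include <stdbool.h>

/* Which sub-table of a typemap level receives the entry (toy model). */
typedef enum { TABLE_HASHED, TABLE_ANY } table_kind;

typedef struct {
    int offs;         /* offset at which the entry is inserted */
    table_kind table; /* sub-table used for slot 0 */
} placement;

/* Assumption: parameterized function types split from slot 0, others skip it. */
int start_offset(bool function_type_has_params)
{
    return function_type_has_params ? 0 : 1;
}

/* Placement of a signature whose slot 0 is Type{T}, for a map starting at
 * offset 0. t_is_cache_leaf stands in for the result of is_cache_leaf on T. */
placement place_type_slot(bool t_is_cache_leaf)
{
    placement p;
    if (t_is_cache_leaf) {
        p.offs = 0;             /* inserted as before, hashed on T */
        p.table = TABLE_HASHED;
    } else {
        p.offs = 1;             /* offset bumped; entry goes into the ->any table */
        p.table = TABLE_ANY;
    }
    return p;
}
```

Lookup would consult both tables, as described above; that part is not modeled here.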

vtjnash · 2017-05-17T21:16:19Z

> because you can't have Any in that slot

This is true, but actually irrelevant. The question is whether qualifying for the split guarantees it is more specific than anything in the sorted list. The offset = 0 test also seems to be a red herring; it likely should be valid regardless of offset. If that's true, you can just introduce this universally as a new field. I think we also need to double-check that this will handle Varargs correctly.

JeffBezanson · 2017-05-17T21:36:32Z

Yes, I did try to make this work more generally at any offset, but couldn't get it working since I didn't want to add another field to typemaps. I'm not sure we want to add another field, especially if this is backported to 0.6, which would be nice.

KristofferC · 2017-05-26T22:53:52Z

Bump, would be nice to get this in 0.6 if it is determined to be sound.

JeffBezanson · 2017-07-01T05:09:47Z

@vtjnash bump. Maybe we can put this on master and see how it goes?

vtjnash · 2017-07-01T18:16:25Z

I'm still fairly certain that aaaf9f4 violates specificity ordering. The other two commits look ok.

> I'm not sure we want to add another field

Since you're using one field to mean two different things, the question isn't whether we add another field to disambiguate those, it's whether it would be valid to remove the offs == 0 test (even if we end up deciding to keep that condition in place). If the system still works without that condition, then that commit may be OK. Otherwise that commit is likely invalid.

JeffBezanson · 2017-07-01T18:42:38Z

I'll separate the two safer commits and then explore this.

JeffBezanson · 2017-07-11T19:55:00Z

Ok, I came up with a different approach that's simpler and more general. I changed the meaning of the any field: instead of using it to skip slots equal to Any, it's used to skip all non-leaf slots if some later slot is a cache_leaf. This means each level is neatly split into three cases:

Leaf at this offset; use hash splitting.
Leaf at some later offset; use any.
No remaining leaf types; use linear.

Those cases should be correctly ordered, most specific first. If this checks out, the field should perhaps be renamed skip.
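
The three cases can be sketched as a small classifier over a toy signature. This is an illustration only: a signature is reduced to an array of booleans saying whether each slot is a cache leaf, and `level_case` and `classify_slot` are hypothetical names, not functions from src/typemap.c.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CASE_HASH,   /* leaf at this offset: use hash splitting */
    CASE_ANY,    /* leaf at some later offset: skip this slot via `any` */
    CASE_LINEAR  /* no remaining leaf types: fall back to the linear list */
} level_case;

/* Decide which of the three cases applies to slot `offs` of a signature
 * with `nslots` slots, where is_leaf[i] says whether slot i is a cache leaf. */
level_case classify_slot(const bool *is_leaf, size_t nslots, size_t offs)
{
    if (offs < nslots && is_leaf[offs])
        return CASE_HASH;
    for (size_t i = offs + 1; i < nslots; i++) {
        if (is_leaf[i])
            return CASE_ANY;
    }
    return CASE_LINEAR;
}
```

For a toy signature with leaf flags {false, true, false}, offset 0 falls into the `any` case, offset 1 into the hash case, and offset 2 into the linear case. Vararg slots are not modeled.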

vtjnash

That does seem tantalizingly close to being usable. But as you noted in the code comment, it doesn't work because of Vararg. Is it going to be possible to resolve that?

Also, can you split out the other commits into a new PR? As much as I love re-reviewing code, it does get a bit hard to keep track of which parts of the diff are relevant to the new proposed optimization.

vtjnash · 2017-07-12T03:27:48Z

src/typemap.c

+ assert(offs != lastleaf);
+ if (tparams->unsorted) {
+ // if we couldn't split on this offset but can split on a later one, skip this slot.
+ // Only do this for sorted maps, since there are cases where an apparently non-leaf


vtjnash · 2017-07-12T03:27:56Z

src/typemap.c

+ jl_value_t *ttypes = jl_unwrap_unionall((jl_value_t*)types);
+ int offs, l = jl_field_count(ttypes);
+
+ for(offs = l-1; offs >= 0; offs--) {


What's wrong with it?

vtjnash · 2017-07-12T03:37:48Z

src/gf.c

@@ -126,7 +126,7 @@ const struct jl_typemap_info method_defs = {
 0, &jl_method_type
 };
 const struct jl_typemap_info lambda_cache = {
- 0, &jl_method_instance_type
+ 1, &jl_method_instance_type


Sorry, when reviewing earlier I mistook which cache this was and thought it was the tfunc / specializations cache. The lambda (method table) cache requires sorting.

JeffBezanson

OK, but it surely doesn't require sorting in the same sense --- this doesn't cause any test failures. We might need another flag telling the typemap that only simple signatures will be inserted (i.e. the stuff we put in method caches, basically leaf types and Anys).

vtjnash

The typemap figures that out on its own already and bypasses sorting when not applicable. I'm not too surprised that our tests don't manage to hit this case – we have a few other sorting bugs that I suspect we would usually hit first, limiting our ability to demonstrate the sorted-ness of this cache.

JeffBezanson

I believe such a flag would enable the optimization I'm attempting here. It's only safe to split on the second argument of (::AbstractThing, ::Int) if we know (::AbstractThing{N}, ::Vararg{Int,N}) is not going to exist.

JeffBezanson · 2017-07-12T03:41:49Z

Just look at the newest commit (speed up 0-arg constructor dispatch). The others are unchanged, and that one is all new.

I think the solution is to use this trick only for unsorted maps (EDIT: or method caches, which use a highly restricted set of types). Method caches are far, far more performance-critical than sorted method lists, and I highly doubt they will ever be able to handle anything as complex as (a::ConjArray{T,N,A}, i::Vararg{Int64,N}) where {T, N}.

This restores the ability to start splitting typemaps at either argument 0 or 1, depending on whether the function type has parameters.

JeffBezanson · 2017-07-14T18:45:52Z

@vtjnash: I removed the change to the unsorted flag in lambda_cache, and added the flag I described. What do you think? This speedup would be nice...

Uses the `any` cache to skip all non-leaf slots when a later slot is splittable.

vtjnash · 2017-07-14T22:15:23Z

src/typemap.c

 return 0;
- return jl_typemap_visitor(cache.node->any, fptr, closure);
+ return jl_typemap_node_visitor(cache.node->linear, fptr, closure);


You can't change the visitation order unless you know that the jl_typemap_info->simplekeys value is set. This method defines the sort order for MethodTable.

vtjnash · 2017-07-14T22:15:32Z

src/typemap.c

 return 0;
- return jl_typemap_intersection_visitor(map.node->any, offs+1, closure);
+ return jl_typemap_intersection_node_visitor(map.node->linear, closure);


vtjnash · 2017-07-14T22:23:26Z

src/julia.h

@@ -448,7 +448,7 @@ typedef struct _jl_typemap_level_t {
 struct jl_ordereddict_t arg1;
 struct jl_ordereddict_t targ;
 jl_typemap_entry_t *linear; // union jl_typemap_t (but no more levels)
- union jl_typemap_t any; // type at offs is Any
+ union jl_typemap_t any; // type at offs is skipped; will be split later


Since you're changing the meaning of this field, can you reorder this list to show the sort order and rename the field to something else?

I think I would still like to keep the any field also, as a possible means of splitting Method tables for incremental deserialization. Scanning these tables is currently almost all of the cost there. There are fairly few instances of this type (probably a couple thousand tops), and they're already usually expected to be gigantic (several hundred to several thousand bytes).

vtjnash · 2017-07-14T22:28:10Z

src/typemap.c

@@ -1056,7 +1068,7 @@ jl_typemap_entry_t *jl_typemap_insert(union jl_typemap_t *cache, jl_value_t *par
 newrec->isleafsig = newrec->issimplesig = 0;
 }


Might as well add an executable assertion on the meaning of simplekeys here:
assert((!simplekeys || newrec->issimplesig) && "bad insert");
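
As a standalone illustration of that invariant (if the map is declared to hold only simple keys, every inserted entry must have a simple signature), here is a reduced sketch. The structs `map_info` and `map_entry` and both function names are hypothetical stand-ins; only the field names `simplekeys` and `issimplesig` come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced stand-ins for the typemap info and entry structs. */
typedef struct { bool simplekeys; } map_info;
typedef struct { bool issimplesig; } map_entry;

/* The implication "simplekeys => issimplesig", written as !a || b. */
bool insert_is_consistent(const map_info *info, const map_entry *entry)
{
    return !info->simplekeys || entry->issimplesig;
}

/* Insertion wrapper that checks the invariant before doing any work.
 * The string literal is always non-null, so it only labels the failure. */
void checked_insert(const map_info *info, const map_entry *entry)
{
    assert(insert_is_consistent(info, entry) && "bad insert");
    /* the actual insertion would go here */
}
```

The check only fires when a map with simplekeys set receives an entry whose signature is not simple; maps without the flag accept anything.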

vtjnash · 2017-10-12T20:17:18Z

Bump? I agree, this speedup would be nice :)

StefanKarpinski · 2017-10-12T21:16:31Z

Unless this is blocking something, I think we should wait until post 1.0 to merge this.

JeffBezanson added the performance Must go faster label May 9, 2017

KristofferC added the kind:potential benchmark Could make a good benchmark in BaseBenchmarks label May 9, 2017

JeffBezanson force-pushed the jb/typemap branch from 01faebe to 60ad492 Compare May 9, 2017 21:00

JeffBezanson requested a review from vtjnash May 10, 2017 19:02

JeffBezanson force-pushed the jb/typemap branch from 60ad492 to 68a1b49 Compare May 17, 2017 04:14

JeffBezanson force-pushed the jb/typemap branch from 68a1b49 to 7ee7b87 Compare June 1, 2017 18:39

JeffBezanson force-pushed the jb/typemap branch from 7ee7b87 to 5002e0e Compare June 30, 2017 22:25

JeffBezanson force-pushed the jb/typemap branch from 5002e0e to 28c1432 Compare July 11, 2017 19:39

JeffBezanson force-pushed the jb/typemap branch 2 times, most recently from 74a9a44 to 4146abc Compare July 12, 2017 03:00

vtjnash reviewed Jul 12, 2017

JeffBezanson force-pushed the jb/typemap branch from 4146abc to 63e1d9b Compare July 13, 2017 18:28

speed up dispatch of functions with type parameters

4961058

This restores the ability to start splitting typemaps at either argument 0 or 1, depending on whether the function type has parameters.

JeffBezanson force-pushed the jb/typemap branch from 63e1d9b to 6c673eb Compare July 14, 2017 18:46

speed up 0-arg constructor dispatch. fixes #21730

ecff624

Uses the `any` cache to skip all non-leaf slots when a later slot is splittable.

JeffBezanson force-pushed the jb/typemap branch from 6c673eb to ecff624 Compare July 14, 2017 18:47

vtjnash reviewed Jul 14, 2017

JeffBezanson mentioned this pull request Feb 16, 2019

speed up dispatch on parameterized function types #31089

Merged

JeffBezanson mentioned this pull request May 1, 2019

v0.5 "cannot add methods to an abstract type" when overriding call #14919

Closed

JeffBezanson mentioned this pull request May 10, 2019

gf: support more dispatch on abstract types #31916

Merged

vtjnash mentioned this pull request Aug 3, 2019

TypeMap: Cease trying to guarantee sorting #32776

Merged

vtjnash closed this Feb 20, 2020

DilumAluthge deleted the jb/typemap branch March 25, 2021 22:08
