Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dispatch speedups #21760

Closed
wants to merge 2 commits into from
Closed

dispatch speedups #21760

wants to merge 2 commits into from

Conversation

JeffBezanson
Copy link
Sponsor Member

Test script: https://gist.github.com/JeffBezanson/a5f4abd6f093795f7d8c41501fb94d8b

release-0.6:

  0.026056 seconds (10.00 k allocations: 156.250 KiB)
  0.061410 seconds
  0.001493 seconds (10.00 k allocations: 156.250 KiB)
  0.081548 seconds (489 allocations: 7.641 KiB)
  0.060909 seconds (87 allocations: 6.547 KiB)
  0.060998 seconds (87 allocations: 6.547 KiB)

This PR:

  0.002621 seconds (10.00 k allocations: 156.250 KiB)
  0.005154 seconds
  0.001432 seconds (10.00 k allocations: 156.250 KiB)
  0.002044 seconds (489 allocations: 7.641 KiB)
  0.000898 seconds (87 allocations: 6.547 KiB)
  0.000948 seconds (87 allocations: 6.547 KiB)

The first result in the list is probably just luck from changing the order of the table. Speeding up convert dispatch is not really solved here; I managed to come up with a hack that only works for constructors so far.

The fix for #21370 sacrificed 0-argument constructors (#21730) and functions whose types have parameters. This hopefully fixes that, while keeping the fix for #21370.

@nanosoldier runbenchmarks(ALL, vs=":master")

@JeffBezanson JeffBezanson added the performance Must go faster label May 9, 2017
@KristofferC
Copy link
Sponsor Member

Would these benchmarks be suitable to add to Nanosoldier?

@JeffBezanson
Copy link
Sponsor Member Author

Yes, I don't see why not. Let's add them.

@KristofferC KristofferC added the kind:potential benchmark Could make a good benchmark in BaseBenchmarks label May 9, 2017
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@JeffBezanson JeffBezanson requested a review from vtjnash May 10, 2017 19:02
@JeffBezanson
Copy link
Sponsor Member Author

Ping @vtjnash --- what do you think?

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 17, 2017

This feels unlikely to be valid, but I'll have to work a bit harder to understand it to see if I can create a good counterexample.

@JeffBezanson
Copy link
Sponsor Member Author

It actually ended up being really simple. First I restored using either 0 or 1 as the offset based on whether the function type has parameters. Hopefully that's uncontroversial.

Next, I observed that the ->any table is never used for the 0th argument, because you can't have Any in that slot. So I use it to split Type{T}s based on whether T is_cache_leaf. If it is, it gets inserted as before starting with offset 0. Otherwise, the offset is incremented to 1 and it's inserted into the ->any table. Both are checked on lookup (as before). This concept seems sound to me.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 17, 2017

because you can't have Any in that slot

This is true, but actually irrelevant. The question is whether qualifying for the split guarantees it is more specific than anything in the sorted list. The offset = 0 test also seems to be a red-herring – it likely should be valid regardless of offset. If that's true, you can just introduce this universally as a new field. I think we also need to double-check that this will handle Varargs correctly.

@JeffBezanson
Copy link
Sponsor Member Author

Yes, I did try to make this work more generally at any offset, but couldn't get it working since I didn't want to add another field to typemaps. I'm not sure we want to add another field, especially if this is backported to 0.6 which would be nice.

@KristofferC
Copy link
Sponsor Member

Bump, would be nice to get this in 0.6 if it is determined to be sound.

@JeffBezanson
Copy link
Sponsor Member Author

@vtjnash bump. Maybe we can put this on master and see how it goes?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jul 1, 2017

I'm still fairly certain that aaaf9f4 violates specificity ordering. The other two commits look ok.

I'm not sure we want to add another field

Since you're using one field to mean two different things, the question isn't whether we add another field to disambiguate those, it's whether it would be valid to remove the offs == 0 test (even if we end up deciding to keep that condition in place). If the system still works without that condition, then that commit may be OK. Otherwise that commit is likely invalid.

@JeffBezanson
Copy link
Sponsor Member Author

I'll separate the two safer commits and then explore this.

@JeffBezanson
Copy link
Sponsor Member Author

Ok, I came up with a different approach that's simpler and more general. I changed the meaning of the any field: instead of using it to skip slots equal to Any, it's used to skip all non-leaf slots if some later slot is a cache_leaf. This means each level is neatly split into three cases:

  1. Leaf at this offset; use hash splitting.
  2. Leaf at some later offset; use any.
  3. No remaining leaf types, use linear.

Those cases should be correctly ordered, most specific first. If this checks out, the field should perhaps be renamed skip.

@JeffBezanson JeffBezanson force-pushed the jb/typemap branch 2 times, most recently from 74a9a44 to 4146abc Compare July 12, 2017 03:00
Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem tantalizingly close to being usable. But as you noted in the code comment, it doesn't work because of Vararg. Is it going to be possible to resolve that?

Also, can you split out the other commits into a new PR. As much as I love re-reviewing code, it does get a bit hard to keep track of which parts of the diff are relevant to the new proposed optimization.

src/typemap.c Outdated
assert(offs != lastleaf);
if (tparams->unsorted) {
// if we couldn't split on this offset but can split on a later one, skip this slot.
// Only do this for sorted maps, since there are cases where an apparently non-leaf
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsorted

src/typemap.c Outdated
jl_value_t *ttypes = jl_unwrap_unionall((jl_value_t*)types);
int offs, l = jl_field_count(ttypes);

for(offs = l-1; offs >= 0; offs--) {
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with it?

src/gf.c Outdated
@@ -126,7 +126,7 @@ const struct jl_typemap_info method_defs = {
0, &jl_method_type
};
const struct jl_typemap_info lambda_cache = {
0, &jl_method_instance_type
1, &jl_method_instance_type
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, when reviewing earlier I mistook which cache this was and thought it was the tfunc / specializations cache. The lambda (method table) cache requires sorting.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. but it surely doesn't require sorting in the same sense --- this doesn't cause any test failures. We might need another flag telling the typemap that only simple signatures will be inserted (i.e. the stuff we put in method caches, basically leaf types and Anys).

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typemap figures that out on its own already and by-passes sorting when not applicable. I'm not too surprised that our tests don't manage to hit this case – we have a few other sorting bugs that I suspect we would usually hit first, limiting our ability to demonstrate the sorted-ness of this cache.

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe such a flag would enable the optimization I'm attempting here. It's only safe to split on the second argument of (::AbstractThing, ::Int) if we know (::AbstractThing{N}, ::Vararg{Int,N}) is not going to exist.

@JeffBezanson
Copy link
Sponsor Member Author

JeffBezanson commented Jul 12, 2017

Just look at the newest commit (speed up 0-arg constructor dispatch). The others are unchanged, and that one is all new.

I think the solution is to use this trick only for unsorted maps (EDIT: or method caches, which use a highly restricted set of types). Method caches are far, far more performance-critical than sorted method lists, and I highly doubt they will ever be able to handle anything as complex as (a::ConjArray{T,N,A}, i::Vararg{Int64,N}) where {T, N}.

This restores the ability to start splitting typemaps at either argument 0
or 1, depending on whether the function type has parameters.
@JeffBezanson
Copy link
Sponsor Member Author

@vtjnash : I removed the change to the unsorted flag in lambda_cache, and added the flag I described. What do you think? This speedup would be nice...

Uses the `any` cache to skip all non-leaf slots when a later slot
is splittable.
return 0;
return jl_typemap_visitor(cache.node->any, fptr, closure);
return jl_typemap_node_visitor(cache.node->linear, fptr, closure);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't change the visitation order unless you know that jl_typemap_info->simplekeys value is set. This method defines the sort order for MethodTable.

return 0;
return jl_typemap_intersection_visitor(map.node->any, offs+1, closure);
return jl_typemap_intersection_node_visitor(map.node->linear, closure);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@@ -448,7 +448,7 @@ typedef struct _jl_typemap_level_t {
struct jl_ordereddict_t arg1;
struct jl_ordereddict_t targ;
jl_typemap_entry_t *linear; // union jl_typemap_t (but no more levels)
union jl_typemap_t any; // type at offs is Any
union jl_typemap_t any; // type at offs is skipped; will be split later
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're changing the meaning of this field, can you reorder this list to show the sort order and rename it something else.

I think I would still like to keep the any field also, as a possible means of splitting Method tables for incremental deserialization. Scanning these tables is currently almost all of the cost there. There's fairly few instances of this type (probably a couple thousand tops), and they're already usually expected to be gigantic (several hundred to several thousand bytes).

@@ -1056,7 +1068,7 @@ jl_typemap_entry_t *jl_typemap_insert(union jl_typemap_t *cache, jl_value_t *par
newrec->isleafsig = newrec->issimplesig = 0;
}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well add an executable assertion on the meaning of simplekeys here:
assert((!simplekeys || newrec->issimplesig) && "bad insert")

@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 12, 2017

Bump? I agree, this speedup would be nice :)

@StefanKarpinski
Copy link
Sponsor Member

Unless this is blocking something, I think we should wait until post 1.0 to merge this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:potential benchmark Could make a good benchmark in BaseBenchmarks performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants