Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julep: Revised iteration protocol #18823

Closed
Keno opened this issue Oct 6, 2016 · 32 comments
Closed

Julep: Revised iteration protocol #18823

Keno opened this issue Oct 6, 2016 · 32 comments
Assignees
Labels
design Design of APIs or of the language itself kind:breaking This change will break code kind:julep Julia Enhancement Proposal
Milestone

Comments

@Keno
Copy link
Member

Keno commented Oct 6, 2016

We discussed at JuliaCon what to do about #8149. I'm opening this issue to capture the specific design proposed at JuliaCon (and so that others can make corrections about things I misremembered).

The new iteration protocol

In the new iteration protocol, there's only one function to be overloaded:

step(it, state=...)::Nullable

with for loops getting lowered as:

for i in it
    #= loop body =#
end

equivalent to

x = step(it)
while !isnull(x)
    i, state = unsafe_get(x)
    #= loop body =#
    x = step(it, state)
end

Implementation

This takes care of iterators that don't know whether they're done until trying (just return Nullable{typeof(state}()). One concern with this approach (and approaches like it) was whether there would be an extra allocation and branch, which would be an unacceptable performance regression. (Added in EDIT by @timholy: #16035 (comment) and #9080.) The conclusion we came to was that the compiler should be able to optimize this out. To make sure that this happens, we may want to use a slightly different lowering, such as:

1:
x = step(it)
isnull(x) ? goto 4 : goto 3
2:
x = step(it, state)
isnull(x) ? goto 4 : goto 3
3:
#= loop body =#
goto 2:
4:
#= rest of function =#

The motivation for structuring things that way, is that simple step functions where performance really matters probably look something like:

function step(it, state)
if state > length(it)
Nullable{Tuple{T,Int}}()
else
Nullable{Tuple{T,Int}}((it[state], state+1))
end
end

which after inlining, should make this a fairly straightforward jump threading optimization (if the optimization doesn't happen we should figure out why and fix that).

Migration

Another nice property of this proposal is that it's backwards compatible with the existing scheme. The fallback method is simply:

@inline function step(it,state=start(it))
done(it) ? Nullable{Tuple{Any,typeof(state)}}() : Nullable{Tuple{Any, typeof(state)}}(next(it, state))
end

Now, the Nullable{Any,} in there is not ideal, but forced inlining together with allocation elision should be able to handle that just fine (otherwise we could also do Nullable{eltype(it),...}, which should work also, but has slightly greater risk of breaking).

cc @timholy @JeffBezanson - Please add anything I missed

@Keno Keno added the kind:julep Julia Enhancement Proposal label Oct 6, 2016
@Keno Keno added this to the 0.6.0 milestone Oct 6, 2016
@kmsquire
Copy link
Member

kmsquire commented Oct 6, 2016

👍

I think your conditions in your lowering should be

isnull(x) ? goto 4 : goto 3

@Keno
Copy link
Member Author

Keno commented Oct 6, 2016

Yes, you're right. Fixing.

@JeffBezanson
Copy link
Sponsor Member

Needing to write the type of the state that would be returned is a slight problem. Returning Nullable() could work ok, since the result of get(::Union{Nullable{Int},Nullable{Union{}}}) is easy to infer, but this still adds an API issue of whether the parameter of the Nullable is reliable.

@Keno
Copy link
Member Author

Keno commented Oct 7, 2016

but this still adds an API issue of whether the parameter of the Nullable is reliable.

Since the nullable is not really visible to users of the API, maybe the answer is no?

@Keno
Copy link
Member Author

Keno commented Oct 7, 2016

Though if we're willing to go the type unstable way, we could just use the empty tuple to indicate the end and avoid having to declare type parameters entirely

@nalimilan
Copy link
Member

A small question/comment: I think Nullable(it[state], state+1) is supposed to be Nullable((it[state], state+1)) and Nullable{Any,typeof(state)}() is supposed to be Nullable{Tuple{Any,typeof(state)}}(), right?

As regards type stability, you can always declare the method's return type via ::Nullable{Tuple{T, Int}}. This allows return Nullable() to do the conversion automatically. Also, you don't need to give the type parameters in Nullable((it[state], state+1)). So overall you only have to declare them once (in the signature).

Finally, @davidagold recently added an unsafe_get function which skips the isnull check when you've already done it manually. Could be useful if the compiler isn't able to skip the redundant second check (I haven't verified whether that's the case in your example).

@nalimilan nalimilan added the domain:missing data Base.missing and related functionality label Oct 7, 2016
@Keno
Copy link
Member Author

Keno commented Oct 7, 2016

Corrections made. Yes, the return type annotation is a good point. But still, having to spell out that type is annoying.

@Keno
Copy link
Member Author

Keno commented Oct 13, 2016

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead. In that case,
the step function would return Union{Void,Tuple{Element,State}}. One of the major considerations there is the ease of actually writing these functions. Having to explicitly write the element types is very onerous.

That proposal would work especially nicely with the proposed syntax for a &&-alike that returns nothing in the else case. Examples:

step(a::Vector, i=0) = i < length(a) ?? (a[i+1], i+1)

step(a::Range1) = (first(a),first(a))
step(a::Range1, i) = (i!=last(a)) ?? (i+1,i+1)

As a side note, one interesting aspect of this proposal is that the state now is the index of the last access element rather than the state of the next element as it was with the old iteration protocol.

@StefanKarpinski
Copy link
Sponsor Member

Advantage: never having to return invalid state.

@timholy
Copy link
Sponsor Member

timholy commented Oct 13, 2016

Thanks for writing this up! Would we need Any for Base.HasEltype iterators?

@nalimilan
Copy link
Member

Upon further discussion, it seems that we may be ok with having step be fundamentally type unstable and improving the compiler to the point where we don't feel the overhead.

Fast enough not to hinder vectorization? For example, things like this use SIMD instructions currently, and it would be too bad to lose this:

function mysum(X::AbstractArray)
   s = zero(eltype(X))
   @inbounds for x in X
       s += x
   end
   s
 end

@nalimilan
Copy link
Member

FWIW another approach to avoid the need for explicit return-type declarations would be to have a function-wise declaration which would apply to all methods:

function step{T}(itr, state::T)::Nullable{Tuple{eltype(itr), T}} end

Cf. @StefanKarpinski's recent post.

@Keno
Copy link
Member Author

Keno commented Oct 13, 2016

That doesn't work because the state may change type.

@tkelman
Copy link
Contributor

tkelman commented Oct 26, 2016

cross-referencing the related #16878

@tkelman
Copy link
Contributor

tkelman commented Dec 15, 2016

Unlikely to happen by end-of-year feature freeze?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 6, 2017

I think sometime shortly after v0.6 is branched we can go ahead and implement this Julep:

julia> @inline iterate(x) = next(x, start(x))
iterate (generic function with 1 method)

julia> @inline iterate(x, i) = done(x, i) ? done : next(x, i)
iterate (generic function with 2 methods)

Where the lowering of a for-loop would look like:

let state = iterate(x)
    while !(state === done)
        body(state[1])
        state = iterate(x, state[2])
    end
end

The codegen for this is already looking pretty good.

@timholy
Copy link
Sponsor Member

timholy commented Apr 6, 2017

That's awesome. I noted a couple of days ago that #9080 also seems less severe (but still not perfect), and specifically that the test case in #16035 (comment) seems fixed. (With recent PRs, the situation might be even better still.)

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Apr 6, 2017

Returning the done function is not a good way to express this, but it's great that the new union code gen can handle this efficiently. I think this is a case where we need to use a Nullable not for performance, but because it's the only way that iteration can return any possible value. It would be very strange if the array collect([start, done, next]) == [start].

@vtjnash
Copy link
Sponsor Member

vtjnash commented Apr 6, 2017

This handles that case just fine (return values of done or Pair{state, value}), while wrapping it in a Nullable wouldn't actually work (we can't infer it as well and would usually generate worse code). I agree that punning on the done singleton isn't necessary. It may be clearer to make a new explicit type.

@StefanKarpinski
Copy link
Sponsor Member

Right, the terminal value is a state object, not a yielded object. So I guess this pattern would imply that the done object can never be used as a state value, but that's an acceptable restriction.

@StefanKarpinski
Copy link
Sponsor Member

Why not just use nothing since any valid value would have to be a tuple, which nothing is not?

@JeffBezanson
Copy link
Sponsor Member

nothing would be my default choice, but there is some worry about returning it accidentally.

@JeffBezanson
Copy link
Sponsor Member

To test this design I'd like to see implementations of the product, flatten, and filter iterators --- those being some of the trickiest and most important. Ideally we could get nicer code and better performance.

@JeffBezanson JeffBezanson self-assigned this Jul 27, 2017
@JeffBezanson
Copy link
Sponsor Member

And, looks like I just volunteered to do that :)

@Keno Keno assigned JeffBezanson and unassigned JeffBezanson Jul 27, 2017
@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 3, 2017

I tried this, and ran into a current show-stopping optimization miss. It appears we will need some way to elide allocation of the tuple in types like: Union{Tuple{String, Int64}, Void} to make this feasible. Is that going to be possible?

@StephenVavasis
Copy link
Contributor

Sorry to reopen this issue: Could one of the core developers post the appropriate if VERSION >= ... statement to be used in libraries that want to support both the old and new iteration protocol.

@Keno
Copy link
Member Author

Keno commented May 20, 2018

You can use contrib/commit-name.sh to find the version number for a commit hash. In this case, the iteration protocol merge is 0.7.0-DEV.5126.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Design of APIs or of the language itself kind:breaking This change will break code kind:julep Julia Enhancement Proposal
Projects
None yet
Development

No branches or pull requests