Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a set-up step before iterate #46802

Closed
MasonProtter opened this issue Sep 16, 2022 · 11 comments
Closed

Add a set-up step before iterate #46802

MasonProtter opened this issue Sep 16, 2022 · 11 comments
Labels
domain:iteration Involves iteration or the iteration protocol

Comments

@MasonProtter
Copy link
Contributor

MasonProtter commented Sep 16, 2022

Currently, when one writes a loop over an iterator, it gets lowered from

for i in v
    ...
end

to the equivalent of

res = iterate(v)
while !isnothing(res)
    i, state = res
    ...
    res = iterate(v, state)
end

I wonder if there'd be any potential problems with generalizing this a bit by adding a function Base.to_iterator (or whatever) with a default method that is just identity, and then we could lower the loop down to

itr = Base.to_iterator(v) # <------ Setup step, default method just returns v
res = iterate(itr)
while !isnothing(res)
    i, state = res
    ...
    res = iterate(itr, state)
end

This should not change anything about existing iterators, but I believe that for certain cases, this would afford us some interesting optimization opportunities, and make the implementation of some iterators easier.

Some iteration protocols are just easier (or more efficient) to express in terms of a mutable object, but it can be wasteful to have to carry around a mutable container in your type to hold state just in case someone wants to iterate over it. In this way, if it's set up and torn down automatically, there are opportunities for the compiler to stack allocate all the mutable containers.


An example usecase:

Suppose I have the type

struct VectorOfVectors{T} <:AbstractVector{T}
    data::Vector{Vector{T}}
end

which I want to basically treat as a flat vector semantically.

Implementing iterate on this type is pretty painful, it requires juggling an inner state and an outer state for the outer vectors and the inner vectors, and trying to do this has a large chance of being wrong, or suboptimal. It's especially frustrating to write out this iterator, because Base.Iterators has already provided the wanted iteration protocol for this type: Iterators.flatten.

If we had a setup step like the one proposed, someone could just write

Base.to_iterator(vov::VectorOfVectors) = Iterators.flatten(vov)

and be done with it. Currently, it seems (thanks @vtjnash for pointing out this is possible) that the current best way to re-use the work already done in base would be to write something like this:

function Base.iterate(vov::VectorOfVectors)
    itr = Iterators.flatten(vov.data)
    res = iterate(itr)
    isnothing(res) && return nothing
    x, state = res
    return x, (itr, state)
end

function Base.iterate(vov::VectorOfVectors, (itr, state))
    res = iterate(itr, state)
    isnothing(res) && return nothing
    x, state = res
    return x, (itr, state)
end

which is considerably more work, and is just a bunch of pointless boiler-plate just to forward an iteration specification to an existing iterator.

@jakobnissen
Copy link
Contributor

As pointed out in Mike Innes' post Iterate on It, this also allows you to create a destructive iterator, then GC the original object.

@giordano giordano added the domain:iteration Involves iteration or the iteration protocol label Sep 16, 2022
@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 16, 2022

It is unnecessary, since if you have a stateful iterator, you just return that as the state when making the first iteration step

@MasonProtter
Copy link
Contributor Author

Hm, that would make my original use-case somewhat unnecessary (although significantly uglier to implement), but it wouldn't fix the use-case from Mike's blog post (thanks @jakobnissen I had forgotten about this!).

If I used your trick @vtjnash with the state, there would still be a reference to the linked list throughout the whole loop, until it actually terminates, whereas if you use to_iterator to create a destructive stateful iterator, then v can be gc'd if it isn't used anymore after the loop.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 16, 2022

there is no reference if iterator does not use the argument

@MasonProtter
Copy link
Contributor Author

Does that rely on iterate inlining?

@Seelengrab
Copy link
Contributor

Isn't this sort of going back to the old iteration protocol..?

@MasonProtter
Copy link
Contributor Author

MasonProtter commented Sep 19, 2022

Not really, no. It's just an optional hook that lets you choose what object you actually iterate over.

@jakobnissen
Copy link
Contributor

jakobnissen commented Sep 20, 2022

GC'ing appears to rely on inlining. In the following example:

mutable struct Foo
    x::Int

    function Foo(x::Int)
        y = new(x)
        println("Creating $(repr(objectid(y)))")
        finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
    end
end

Base.length(x::Foo) = x.x
Base.iterate(x::Foo) = (x.x, x.x-1)

@noinline function Base.iterate(x::Foo, state)
    iszero(state) && return nothing
    GC.gc()
    (state, state-1)
end

f(x) = foreach(println, Foo(x))

No GC is performed during the loop t.f(5), but it is when @noinline is removed. Further, it will not GC if Base.iterate is written as a single method with a default state argument, as is idiomatic.

IMO, if it was possible to GC in the un-inlined case, then Jameson is correct and there would be no need for a setup function.

@Keno
Copy link
Member

Keno commented Sep 21, 2022

Just wanted to point out that this was discussed at the time, so there's some extensive discussion here: #25261 (comment). In general, I'm not necessarily opposed to it, but I'm not sure the justification is all that strong, since I don't think it would let you implement anything that couldn't be implemented right now, while simultaneously imposing an extra method resolution on every single loop in the system (which isn't terrible, but should have some reason for it).

@MasonProtter
Copy link
Contributor Author

MasonProtter commented Oct 20, 2022

GC'ing appears to rely on inlining. In the following example:

I'm running this on 1.8.2 and I do see the GC run in this example, even with @inline and a default argument, so I guess that solves it.

Edit: Nvm, I think I was doing something wrong.

@MasonProtter
Copy link
Contributor Author

MasonProtter commented Oct 23, 2022

Fixed I guess with SetupIterator.jl

mutable struct Foo
    x::Int

    function Foo(x::Int)
        y = new(x)
        println("Creating $(repr(objectid(y)))")
        finalizer(i -> println("Destroying $(repr(objectid(i)))"), y)
    end
end
Base.length(x::Foo) = x.x
using SetupIterator
SetupIterator.iterator(f::Foo) = f.x:-1:1
@setup_iterate Foo
julia> function f(n)
           for x  Foo(n)
               GC.gc()
               @show x
           end
       end
f (generic function with 1 method)

julia> f(5)
Creating 0x7dbd75374892a811
Destroying 0x7dbd75374892a811
x = 5
x = 4
x = 3
x = 2
x = 1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:iteration Involves iteration or the iteration protocol
Projects
None yet
Development

No branches or pull requests

6 participants