Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add precision about stateful iterators to isempty #43900

Merged
merged 3 commits into from
Mar 2, 2022
Merged

Conversation

nalimilan
Copy link
Member

The manual now (#43099) mentions that stateful iterators should implement isdone, so it appears isempty is not supposed to consume elements.
This matters a lot for callers as if you can't rely on this you can't do isempty(c) && return; for x in c but instead have to call iterate(c) manually.

Maybe the "should" in the manual and the docstring should be "must" instead?

The manual now mentions that stateful iterators should implement `isdone`,
so it appears `isempty` is not supposed to consume elements.
This matters a lot for callers as if you can't rely on this you can't do
`isempty(c) && return; for x in c` but instead have to call `iterate(c)` manually.
@nalimilan nalimilan added domain:docs This change adds or pertains to documentation domain:collections Data structures holding multiple items, e.g. sets labels Jan 23, 2022
@nalimilan nalimilan requested a review from Keno January 23, 2022 10:15
@nalimilan
Copy link
Member Author

Anybody willing to review this?

@tkf
Copy link
Member

tkf commented Feb 20, 2022

I think the docstring is correct depending on what you exactly mean by "guaranteed." For example,

julia> ch = foldl(push!, 1:3; init = Channel{Int}(Inf))
Channel{Int64}(9223372036854775807) (3 items available)

julia> close(ch)

julia> itr1 = (x + 1 for x in ch);

julia> isempty(itr1)
false

julia> collect(itr1)
2-element Vector{Int64}:
 3
 4

i.e., an iterator transform wrapping a stateful iterator still mutates the stateful iterator. It is fixable by forwarding isdone. But it's impossible to do for a similar example with Iterators.filter.

Another tricky example is that user-defined function is not pure:

julia> COUNTER = 3
       itr2 = Iterators.takewhile(1:3) do _
           global COUNTER
           (COUNTER -= 1) > 0
       end;

julia> collect(itr2)
2-element Vector{Int64}:
 1
 2

julia> COUNTER = 3
       itr2 = Iterators.takewhile(1:3) do _
           global COUNTER
           (COUNTER -= 1) > 0
       end;

julia> isempty(itr2)
false

julia> collect(itr2)
1-element Vector{Int64}:
 1

I think we can put "guaranteed" in the docstring by declaring sufficiently many things as undefined behavior (i.e., violation of precondition is not detected).

That said, I think the "look before you leap" approach of using isempty is essentially broken for generic algorithms on iterables. Not only defining it for stateful iterators is tricky but also it's not O(1) for arbitrary stateless iterables like Iterators.filter. So, isempty probably should only be used for materizlied collections and some specific subsets of iterators (e.g., Iterators.Stateful). Rather, I think we should just advocate the "easier to ask for forgiveness than permission" approach of using iterate to dynamically detect the emptiness. Of course, tooling around this approach in Base is very limited. But this is one of the missions of JuliaFolds.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 20, 2022

Those are some of the literal defining examples of why we just have iterate now, and no longer use start/next/done for the iteration protocol

@nalimilan
Copy link
Member Author

So maybe we should rather say that calling isempty may consume elements? Then we could list iterators for which it is guaranteed not to be the case. (FWIW, I wrote this PR because I couldn't find a place where it was clearly mentioned what I could expect from isempty.)

The difficulty is that isdone initially wasn't defined for most stateful iterators, but methods have been added progressively without a clear definition about what is expected of custom iterators.

Cc: @cmcaine

@cmcaine
Copy link
Contributor

cmcaine commented Feb 20, 2022

I don't think the docstring should say that this is a guarantee because there are too many iterables that break the guarantee. I think we should instead give a warning, something like this:

Warning: isempty may consume the next element of a stateful iterator if that iterator does not define isdone.

It's true that the general approach of isempty is unsuited to many iterators. If JuliaFolds or another project comes up with a good answer for this hopefully we can reuse it in Base or mention it in the manual.

@cmcaine
Copy link
Contributor

cmcaine commented Feb 20, 2022

And maybe we should fix isdone for channels? It should probably block until something is in the channel or the channel is closed.

@cmcaine
Copy link
Contributor

cmcaine commented Feb 20, 2022

And Base.Generator should forward the isdone method, too. It can't help for Generators with a guard: (x for x in eachline("foo.txt") if x ≠ ""), but still, better than nothing.

The idea of my extra isdone definitions is to be pragmatic and make it harder for people to cut their fingers on isempty (and zip?) even if it's not a great pattern overall.

@tkf
Copy link
Member

tkf commented Feb 21, 2022

So maybe we should rather say that calling isempty may consume elements?

Yes, otherwise we need to remove iterate-based isempty method. Both options are technically non-breaking because isempty is very underspecified. But my guess is that triage would consider that removing iterate-based implementation is practically too breaking.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 22, 2022

It can't help for Generators with a guard

It is permitted to throw, if it is non-computable, or just return false unconditionally (getting true is a promise that the iterator will never produce an other element from iterate, getting false means nothing about the next call to iterate)

@cmcaine
Copy link
Contributor

cmcaine commented Feb 22, 2022

What is the missing return value of isdone for if false means "dunno"?

Edit: I was imagining that isdone would return the default value (missing) if the Generator has a guard, because guards are implemented with Iterators.Filter and that doesn't have an isdone method.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 22, 2022

When missing, it means it could be determined (by calling iterate), but is expensive to compute. The false means it may not be determined (since someone could take the element before you).

@cmcaine
Copy link
Contributor

cmcaine commented Feb 22, 2022

Hmm. That's not how isempty uses it.

function isempty(itr)
    d = isdone(itr)
    d !== missing && return d
    iterate(itr) === nothing
end

Here an iterable of indeterminate done-ness is declared empty.

If isdone(x) === false indicates an indeterminate state then the above code should probably be:

function isempty(itr)
    d = isdone(itr)
    d === true && return d
    iterate(itr) === nothing
end

And the docs for isdone seem to say that false means the iterable is not done:

Mutable iterators that want to opt into this feature should define an isdone method that returns true/false depending on whether the iterator is done or not.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 22, 2022

That does not fix the stated problem, but makes it worse. When something is mutable, you cannot know if someone else mutates it, so you cannot know if it becomes empty after you checked.

base/essentials.jl Outdated Show resolved Hide resolved
@nalimilan nalimilan requested review from tkf and vtjnash March 2, 2022 08:11
@vtjnash vtjnash merged commit b10aa56 into master Mar 2, 2022
@vtjnash vtjnash deleted the nl/isempty branch March 2, 2022 17:48
@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 2, 2022

I will note though that if you come across a case where a mutable iterator changes here, that should be fixed by adding an isdone method, and PRs are welcome for any cases you find.

staticfloat pushed a commit to JuliaCI/julia-buildkite-testing that referenced this pull request Mar 2, 2022
The manual now mentions that stateful iterators should implement `isdone`,
so it appears `isempty` is not supposed to consume elements.
This matters a lot for callers as if you can't rely on this you can't do
`isempty(c) && return; for x in c` but instead have to call `iterate(c)` manually.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets domain:docs This change adds or pertains to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants