Iterators that know their size after start has been called #22467

Open
davidanthoff opened this issue Jun 21, 2017 · 9 comments
Labels
domain:collections Data structures holding multiple items, e.g. sets

Comments

@davidanthoff
Contributor

Another issue that has come up during the Query.jl design: I have a whole bunch of iterators that know their length after the start method has been called.

Would it be possible to add another return value to iteratorsize that is HasLengthAfterStart(), and if a type returns that it has to implement length(source, state)?

@tkelman
Contributor

tkelman commented Jun 22, 2017

See #8149 and #18823

@davidanthoff
Contributor Author

> See #8149 and #18823

I assume this was just meant for cross-reference? Neither issue proposes something that would address the issue here.

@kshyatt kshyatt added the domain:collections Data structures holding multiple items, e.g. sets label Jun 23, 2017
@mschauer
Contributor

mschauer commented Jul 6, 2017

Related: #16708

@laborg
Contributor

laborg commented Oct 17, 2023

@davidanthoff can't this be handled by implementing a stateful iterator? In the new protocol, iterate(x) could then have your desired side effect.

@KristofferC
Sponsor Member

I think the new iteration protocol handles this, yes.

@davidanthoff
Contributor Author

Hm, I might be missing something, but I don't think this is addressed with the new protocol? How would a source indicate to a client that length can be called after the first call to iterate?

If a source returns Base.HasLength from IteratorSize, then length has to work without a call to iterate. If it returns SizeUnknown, then a client really has to assume that length can never be called. Neither case seems to cover what I'm after.

https://github.com/queryverse/IteratorInterfaceExtensions.jl#iteratorsize2 has an implementation of what I'm suggesting. That works OK for now, as this is essentially just used to trigger a performance optimization, but I still think it would make sense to have this in base itself.
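To make the two existing cases concrete, here is a minimal sketch of how a client might branch on Base.IteratorSize today. The names collect_squares and _collect_squares are hypothetical and used only for illustration, and the sketch assumes integer elements. It shows that a client can preallocate only when the trait promises that length works up front, and has to fall back to growing the output otherwise.

```julia
# Hypothetical client: squares every element of an iterator of integers.
# It dispatches on the size trait to decide whether preallocation is safe.
collect_squares(itr) = _collect_squares(Base.IteratorSize(itr), itr)

# HasLength / HasShape: length(itr) is valid before any call to iterate.
function _collect_squares(::Union{Base.HasLength,Base.HasShape}, itr)
    out = Vector{Int}(undef, length(itr))
    for (i, x) in enumerate(itr)
        out[i] = x^2
    end
    return out
end

# SizeUnknown: the client must never call length, so it grows the output.
function _collect_squares(::Base.SizeUnknown, itr)
    out = Int[]
    for x in itr
        push!(out, x^2)
    end
    return out
end
```

For example, collect_squares(1:3) takes the preallocating path, while collect_squares(Iterators.filter(isodd, 1:5)) takes the push! path. A source whose size only becomes known once iteration has begun has to declare SizeUnknown here and loses the preallocation.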

@davidanthoff davidanthoff reopened this Oct 17, 2023
@vtjnash
Sponsor Member

vtjnash commented Oct 17, 2023

The contract for length is that it does not change during iterate, so it seems odd that calling iterate would make it available when it was not before.

@vtjnash
Sponsor Member

vtjnash commented Oct 17, 2023

FWIW though, I think the iteration protocol already expects that iterate has been called at least once before length is called for Base.HasLength, so that is already the expected definition for it.

@davidanthoff
Contributor Author

> The contract for length is that it does not change during iterate, so it seems odd that calling iterate would make it available when it was not before

The proposal here is not that length returns something different after iterate is called, or that the return value would change during iteration. The proposal is that a source can signal to a client that length should not be called until iterate has been called once, i.e. it really is more of a signal that calling length is undefined behavior until a certain point in time.

> FWIW though, I think the iteration protocol already expects that iterate has been called at least once before length is called for Base.HasLength, so that is already the expected definition for it

Really? I certainly would not have guessed that at all from looking at the docs. Also, from a very brief look through Julia base code, that does not seem to be how iterators are used. For example, the code here would then be an incorrect consumption of an iterator, right?

Wouldn't that also be a really odd interpretation with the current stateless design of iterators? If length(iter) was only valid after a call to iterate(iter), then that would bake a mutating design into the iteration protocol, which seems a bit odd. That is why in IteratorInterfaceExtensions.jl I added a new method signature for length that is length(iter, x), where x is the iteration state, for this scenario.
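A minimal sketch of the state-based idea, under stated assumptions: LazyEvens, length_after_start, and collect_known are hypothetical names invented for this illustration. They are not the API of Base or of IteratorInterfaceExtensions.jl. The source stays immutable. The first call to iterate does the expensive work and stores the result in the iteration state, and the length is then read from that state rather than from the iterator.

```julia
# Hypothetical source: the result set only exists once iteration has started.
struct LazyEvens
    upto::Int
end

Base.IteratorSize(::Type{LazyEvens}) = Base.SizeUnknown()
Base.eltype(::Type{LazyEvens}) = Int

# First call: materialize the rows (stands in for running a query) and
# carry them in the state, so the iterator itself is never mutated.
function Base.iterate(q::LazyEvens)
    rows = [i for i in 1:q.upto if iseven(i)]
    return iterate(q, (rows, 1))
end

function Base.iterate(::LazyEvens, state)
    rows, i = state
    i > length(rows) && return nothing
    return rows[i], (rows, i + 1)
end

# Hypothetical two-argument length: valid only with a state from iterate.
length_after_start(::LazyEvens, state) = length(state[1])

# Hypothetical client: preallocates once the first iterate call has returned.
function collect_known(q)
    step = iterate(q)
    step === nothing && return eltype(q)[]
    x, state = step
    out = Vector{eltype(q)}(undef, length_after_start(q, state))
    i = 1
    out[i] = x
    step = iterate(q, state)
    while step !== nothing
        x, state = step
        i += 1
        out[i] = x
        step = iterate(q, state)
    end
    return out
end
```

Because the length is derived from the state, length(iter) never has to change its answer and the iterator needs no hidden mutation. What this sketch leaves out is a trait value that tells a generic client the two-argument form is available, which is the part the proposal asks Base to provide.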

7 participants