Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isempty(::EachLine) expends a line #27412

Closed
oxinabox opened this issue Jun 4, 2018 · 6 comments
Closed

isempty(::EachLine) expends a line #27412

oxinabox opened this issue Jun 4, 2018 · 6 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc. iteration Involves iteration or the iteration protocol

Comments

@oxinabox
Copy link
Contributor

oxinabox commented Jun 4, 2018

Eachline is (almost?) always a stateful iterator.
This means the default isempty which called iterate will consume a line if there is one to consume.

Sometimes (always?) in the use of Eachline (but not more generally), a defensive deepcopy can protect against that.

This is in 0.7
I feel like I (or someone else) opened a similar issue for this in 0.6
but I could be mis-remembering.

Downside of this fix, is that I'm not sure it works for all uses of Eachline.
It works for IOBuffer (and I think IOStream) because there is a pointer integer that is copied.
But if it was consuming unbuffered network data, then not possible because what is gone is gone.
(idk that it is possible to access unbuffered network data in julia)

Minimum Breaking Example

julia> data = IOBuffer("A\nB\nC\nD")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1)

julia> itr  = eachline(data)
Base.EachLine(IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1), getfield(Base, Symbol("##289#292"))(), false)

julia> isempty(itr)
false

julia> collect(itr)
3-element Array{String,1}:
 "B"
 "C"
 "D"

Defensive-copy work-around

julia> data = IOBuffer("A\nB\nC\nD")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1)

julia> itr = eachline(data)
Base.EachLine(IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=7, maxsize=Inf, ptr=1, mark=-1), getfield(Base, Symbol("##289#292"))(), false)

julia> isempty(deepcopy(itr))
false

julia> collect(itr)
4-element Array{String,1}:
 "A"
 "B"
 "C"
 "D"
@oxinabox
Copy link
Contributor Author

oxinabox commented Jun 4, 2018

When discussing this on slack more broadly.
I suggested that perhaps we should have a trait for
IsStateful

Such a trait could be tri-valued

  • NotStateful, so just running iterate would work
  • InternallyStateful, where the state of the iterator is real but it is visible to julia, so defensive copy would work.
  • ExternallyStateful where the state is not visible to julia so deepcopy will not work, so any call to things like isempty should error.

And that trait would determine which default implementation of isempty etc one received
Similar for tee (do we have tee? It turns one iterator into two)

@haampie
Copy link
Contributor

haampie commented Jun 4, 2018

Another defensive approach is to make isempty error by default; I don't think deepcopy is suitable for iterators that contain for instance huge vectors.

About isempty for eachline: maybe we should not have it implemented for this iterator at all, but rather only for a stateful wrapper of this iterator that caches the current line (i.e. the current Stateful iterator).

@haampie
Copy link
Contributor

haampie commented Jun 4, 2018

Another benefit is that this allows a trivial implementation of zip for iterators that are not stateful; the current implementation is quite convoluted as it must assume stateful iterators, and I have seen examples where the compiler produces suboptimal code. (The issue is that it has to call isdone on all iterators, followed by iterate on all iterators, but it still has to check whether the iterators return nothing or not.)

@oxinabox oxinabox changed the title isempty(::EachLine) should probably do a defensive copy isempty(::EachLine) expends a line Jun 4, 2018
@JeffBezanson
Copy link
Sponsor Member

To copy the state of a stateful iterator, copy should suffice; deepcopy should not be necessary.

Just a bandaid, but we could add an isempty method for EachLine that just calls eof.

@JeffBezanson JeffBezanson added io Involving the I/O subsystem: libuv, read, write, etc. iteration Involves iteration or the iteration protocol labels Jun 12, 2018
cmcaine added a commit to cmcaine/julia that referenced this issue Feb 7, 2021
With this method, `isempty` won't consume values from the iterator (See JuliaLang#27412).
@cmcaine
Copy link
Contributor

cmcaine commented Feb 7, 2021

This bit me again today. What other stateful iterators are there in Julia? Maybe we can just add bandaids for most of them?

From the docs, it looks like we should just add Base.isdone(itr::Eachline) = eof(itr.stream) ? I've opened a PR (#39562) that does that.

Agree with traits or similar so code can identify stateful iterators.

cmcaine added a commit to cmcaine/julia that referenced this issue Feb 7, 2021
With this method, `isempty` won't consume values from the iterator (See JuliaLang#27412).
vtjnash pushed a commit that referenced this issue Feb 8, 2021
With this method, `isempty` won't consume values from the iterator (See #27412).
@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 8, 2021

Fixed by #39562 in ede6486

@vtjnash vtjnash closed this as completed Feb 8, 2021
cmcaine added a commit to cmcaine/julia that referenced this issue Feb 14, 2021
This prevents `isempty` and friends from advancing the iterator when
they shouldn't.

Note: `collect(readeach(io, T))` can throw an exception if `io` isn't a
multiple of the right size. `isempty` will still return `false`, even if
the next call to `iterate` will throw an exception.

Similar to: JuliaLang#27412
Keno pushed a commit that referenced this issue Feb 15, 2021
This prevents `isempty` and friends from advancing the iterator when
they shouldn't.

Note: `collect(readeach(io, T))` can throw an exception if `io` isn't a
multiple of the right size. `isempty` will still return `false`, even if
the next call to `iterate` will throw an exception.

Similar to: #27412
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
With this method, `isempty` won't consume values from the iterator (See JuliaLang#27412).
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
This prevents `isempty` and friends from advancing the iterator when
they shouldn't.

Note: `collect(readeach(io, T))` can throw an exception if `io` isn't a
multiple of the right size. `isempty` will still return `false`, even if
the next call to `iterate` will throw an exception.

Similar to: JuliaLang#27412
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
With this method, `isempty` won't consume values from the iterator (See JuliaLang#27412).
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this issue May 9, 2021
This prevents `isempty` and friends from advancing the iterator when
they shouldn't.

Note: `collect(readeach(io, T))` can throw an exception if `io` isn't a
multiple of the right size. `isempty` will still return `false`, even if
the next call to `iterate` will throw an exception.

Similar to: JuliaLang#27412
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc. iteration Involves iteration or the iteration protocol
Projects
None yet
Development

No branches or pull requests

5 participants