Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Flatten iterator #14805

Merged
merged 1 commit into from
Mar 24, 2016
Merged

add Flatten iterator #14805

merged 1 commit into from
Mar 24, 2016

Conversation

JeffBezanson
Copy link
Sponsor Member

Iterator Madness continues. This is essentially a chaining iterator that can be applied recursively to eventually implement nested for generator expressions as in #4867.

@kmsquire
Copy link
Member

In Iterators.jl, this is called chain.

@JeffBezanson
Copy link
Sponsor Member Author

I know, but this one is a bit different since it takes a single iterator and flattens its output, which allows nesting.

@JeffBezanson
Copy link
Sponsor Member Author

In my view, concat would take multiple arguments and combine them. Something that converts the structure [[1,2],[3,4]] to [1,2,3,4] is flattening.

@StefanKarpinski
Copy link
Sponsor Member

Fair enough.

@kmsquire
Copy link
Member

Sure, ok

@stevengj
Copy link
Member

stevengj commented Feb 4, 2016

Presumably, this will be exported somewhere? Needs a NEWS item in that case.

@Keno
Copy link
Member

Keno commented Feb 4, 2016

As a point of reference, there is also some discussion here, which I think implements the same thing JuliaCollections/Iterators.jl#50

@JeffBezanson
Copy link
Sponsor Member Author

I see --- it looks like my implementation avoids state, at the cost of some repeated done calls? I suspect done is the most likely of the iterator functions to be pure, so this seems acceptable. Tasks are a notable exception (see #8149). Maybe we should switch the implementation to #6125 (comment), barring a better solution to the general problem.

@mschauer
Copy link
Contributor

That is a good orthogonal design: this is a flatten-iterator which does not count its elements of the child iterators and has no length and works with iterators which can be read only once.
The other design can be a concat-iterator which returns a IteratorND with element-Arrays/Iterators/IteratorsND concatenated along the fasted dimension.

s = start(f.it)
d = done(f.it, s)
# this is a simple way to make this function type stable
d && error("argument to Flatten must contain at least one iterator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more specific exception type? ArgumentError maybe?

@samoconnor
Copy link
Contributor

I'd like to use this in #15409.

I have a function batch that splits a collection into batches to be processed by nworkers.

(batch_size should be << length(collection) / nworkers to allow for non-uniform workloads.)

"""
    batch(c; nworkers=nworkers(), max_batch_size=100)

Split a collection into batches for processing by `n_workers`.
Equivalent to `split(c, max_batch_size)` when `length(c) >> max_batch_size`.
"""
function batch(c; n_workers=nworkers(), max_batch_size=100)

    # Split collection into batches, then peek at the first few batches...
    batches = split(c, max_batch_size)
    n = Int(n_workers * 4)
    head, tail = take_head(batches, n)

    # If there are not enough batches, use a smaller batch size...
    if length(head) < n
        head = vcat(head...)
        batch_size = max(1, round(Int, length(head) / n))
        return split(head, batch_size)
    end

    return flatten((head, tail))
end

See also take_head implementation below ...
(take_head(c,n) is like take(c,n), drop(c,n) but without iterating over the head twice)

type ResumeIterator{T}
    itr::T
    state
end

"""
    resume(itr, state) -> iterator

Returns an iterator that iterates over `itr` starting from `state`.
"""
resume(itr, state) = ResumeIterator(itr, state)

eltype{T}(::Type{ResumeIterator{T}}) = eltype(T)
start(itr::ResumeIterator) = itr.state
next(itr::ResumeIterator, state) = next(itr.itr, state)
done(itr::ResumeIterator, state) = done(itr.itr, state)


"""
    take_head(c, n) -> head, tail

Returns `head`: the first `n` elements of `c`;
and `tail`: an iterator over the remaining elements.
"""
function take_head(c, n)
    head = Vector{eltype(c)}(n)
    s = start(c)
    i = 0
    while i < n && !done(c, s)
        i += 1
        head[i], s = next(c, s)
    end
    return resize!(head, i), resume(c, s)
end

@JeffBezanson
Copy link
Sponsor Member Author

Ok, in that case I'll plan to merge this soon.

We already have ResumeIterator in Base, called Rest.

@samoconnor
Copy link
Contributor

Thanks Jeff.

@samoconnor
Copy link
Contributor

Any thoughts on what take_head above should be called?

The existing take could be modified to return a (head, tail) tuple (it is only used in one place in base).

@mschauer
Copy link
Contributor

@samoconnor take returns an iterator, it does not peek at the iterates. Splitting an iterator into (head, tail) where tail is an iterator and head is a container (iterable, but really an container of elements) can be better thought of as an operation of the collect familily, maybe a partial_collect or collect_head or something the like.

@samoconnor
Copy link
Contributor

@mschauer what I've ended up with in #15409 id head_and_tail: 87a0dad#diff-83c91f45204e96e58535f5491cca8cd8R133 (not exported at this point).

What about just collectn(c, n) ?

@vchuravy
Copy link
Sponsor Member

vchuravy commented Apr 3, 2016

Was it intentional that flatten is not exported?

@JeffBezanson
Copy link
Sponsor Member Author

Yes, because we should really wrap the iterators in a module to separate them from eager functions.

@stevengj stevengj mentioned this pull request Apr 8, 2016
@mschauer mschauer mentioned this pull request Jun 1, 2016
4 tasks
@Keno
Copy link
Member

Keno commented Nov 22, 2016

I still don't like the Flatten name FWIW

@StefanKarpinski
Copy link
Sponsor Member

Got anything better to call it?

@Keno
Copy link
Member

Keno commented Nov 22, 2016

I still like either Chain or Concat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants