Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Do not consider iterators as scalars in broadcast #25356

Closed
wants to merge 1 commit into from

Conversation

nalimilan
Copy link
Member

Consider that all types implementing start are collections, and throw an error for SizeUnknown and IsInfinite iterators. This makes broadcast fail by default for most iterators, since the current fallback functions assume that collections support indexing. Custom iterators could implement their own methods, but the default ones should probably be improved to collect iterators without requring indexing.

Possible fix for #18618.

This change is not terribly appealing as-is as it does not add any new feature, it just throws an error when calling broadcast on iterators, where they were previously treated as scalar. Treating iterators as scalars isn't very useful, as can be seen from the fast that only one test relied on this. In the perspective of the feature freeze, throwing errors is good since it will allow supporting broadcast on iterators without breaking existing code. But I'm not yet sure what the fallback implementation can look like: it would probably have to process one element at a time, but if multiple iterators are passed one of them would have to be collected temporarily somewhere (since repeated indexing is not possible in general).

Consider that all types implementing start() are collections,
and throw an error for SizeUnknown and IsInfinite iterators.
This makes broadcast() fail by default for most iterators, since
the current fallback functions assume that collections support indexing.
Custom iterators could implement their own methods, but the default ones
should probably be improved to collect iterators without requring indexing.
Copy link
Sponsor Member

@timholy timholy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't notice the original discussion. I think that with a small change this may be workable. Nanosoldier will be critical.

Needs tests, though.

BroadcastStyle(::Type) = Scalar()
hasshape_ndims(::Base.HasShape{N}) where {N} = N
function BroadcastStyle(::Type{T}) where T
if method_exists(start, Tuple{T})
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not inferrable, and thus might be pretty bad for code like

function foo(x)
    y = Float64.(x)
    # now do something with y
end

Unfortunately #16422 wouldn't help.

At the same time, I recognize that any problematic type can be optimized by adding a specific defintion.

At a minimum we may have to change the typeof calls in collect_styles to Core.Typeof, and then specialize

BroadcastStyle(::Type{Type{T}}) where T = Scalar()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could require iterators to define a trait. That would be more consistent with what we do elsewhere, and that wouldn't be a terrible burden either. I had contemplated adding an NotIterable type to Base.iteratorsize, but that could also be a separate function like isiterable.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On balance I think it's better to require iterators to define a trait, and make scalars the default.

One slightly-crazy thought is that the trait name could be the output of BroadcastStyle. But on balance I'm not sure this is a good idea, because there may be reasons to have things that act like scalars that don't return Scalar(). It's probably better to have a separate isiterable trait.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vtjnash mentioned here that method_exists could be made inferable now, presumably circumnavigating the problem mentioned in #16422.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but it hasn't been in the past since we don't want the coupling. Also, #25261 will break this.

@@ -53,7 +53,7 @@ end
abstract type IteratorSize end
struct SizeUnknown <: IteratorSize end
struct HasLength <: IteratorSize end
struct HasShape <: IteratorSize end
struct HasShape{N} <: IteratorSize end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -57,3 +57,20 @@ struct RangeStepRegular <: TypeRangeStep end # range with regular step
struct RangeStepIrregular <: TypeRangeStep end # range with rounding error

TypeRangeStep(instance) = TypeRangeStep(typeof(instance))

## iterable trait
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code isn't used in the PR currently, but it illustrates the alternative approach based on a trait rather than o method_exists(start, Tuple{T}). It fixes the type inference issue (provided the fallback doesn't call method_exists as it currently does).

BTW, I've noted an inconsistency in the naming of traits: we have iteratorsize, iteratoreltype, but IndexStyle, TypeRangeStep, TypeArithmetic and TypeOrder. Looks like the CamelCase variants are more numerous and more recent, so maybe we should adopt that convention everywhere? Added to #20402.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uppercase makes the most sense when what will be returned is a type-instance---for example, IndexStyle(a) will return T() where T<:IndexStyle. With better constant-prop it's less obvious that we need to return a dedicated type-instance, although that does have some advantage in clarity.

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎 Adding a new thing every iterable type needs to define is not ideal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not "ideal", but not the end of the world either IMHO given that you need to define several methods anyway. And if we really don't want to add another trait, we can add a new type to iteratorsize (and rename it), which will have the advantage of making the choice of the type more explicit.

Anyway I'm all ears if somebody has a better solution that works (i.e. is inferrable, see @timholy's comment above).

prod_iteratorsize(::HasLength, ::HasLength) = HasShape{2}()
prod_iteratorsize(::HasLength, ::HasShape{N}) where {N} = HasShape{N+1}()
prod_iteratorsize(::HasShape{N}, ::HasLength) where {N} = HasShape{N+1}()
prod_iteratorsize(::HasShape{M}, ::HasShape{N}) where {M,N} = HasShape{M+N}()
Copy link
Sponsor Member

@JeffBezanson JeffBezanson Jan 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these will be inferrable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@code_warntype seems to be happy with a similar example:

f(::AbstractArray{S,M}, ::AbstractArray{T,N}) where {S,T,M,N} = Array{M+N}()
@code_warntype f([1], [1 2])

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right.

@mbauman
Copy link
Sponsor Member

mbauman commented Mar 22, 2018

Superseded by #26435.

@mbauman mbauman closed this Mar 22, 2018
@mbauman mbauman added the domain:broadcast Applying a function over a collection label Apr 24, 2018
@nalimilan nalimilan deleted the nl/iterable branch December 1, 2018 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:broadcast Applying a function over a collection
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants