Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not truncate zip inputs #20499

Closed
cstjean opened this issue Feb 7, 2017 · 19 comments
Closed

Do not truncate zip inputs #20499

cstjean opened this issue Feb 7, 2017 · 19 comments
Labels
domain:collections Data structures holding multiple items, e.g. sets needs decision A decision on this change is needed

Comments

@cstjean
Copy link
Contributor

cstjean commented Feb 7, 2017

Like Python, Julia's zip(a, b) ignores the tail end of its input sequences when they are of unequal length (but see #17928). I've yet to encounter a problem where I wanted that behavior, and I would much prefer seeing an error when the inputs are of unequal length. Are there applications where truncation is very convenient?

@StefanKarpinski
Copy link
Sponsor Member

I believe that truncation is generally useful for combining infinite iterators with finite ones. I do agree that if two finite iterators have different lengths, an error is better.

@ararslan ararslan added the domain:collections Data structures holding multiple items, e.g. sets label Feb 7, 2017
@JeffBezanson
Copy link
Sponsor Member

Duplicate of #17928?

@StefanKarpinski
Copy link
Sponsor Member

I would close that one in favor of this since that was a specific confusing case, whereas this is explicitly a decision issue about whether the underlying general behavior is a good idea or not.

@stevengj
Copy link
Member

stevengj commented Apr 11, 2017

A similar issue arises from Generator(f, iters...), which on discourse was noted to cause silent truncation for map(+, (1,2), 3).

My gut feeling is that if all the iterators have lengths (iteratorsize(itr)::Union{HasLength,HasShape}), then a length mismatch should be an exception in both zip and Generator.

@stevengj stevengj added the needs decision A decision on this change is needed label Apr 11, 2017
@StefanKarpinski
Copy link
Sponsor Member

I think that we should be even stricter and only allow truncation of infinite streams. Or does !haslength mean that a stream is infinite? I would interpret it as having unknown length, but I may be misunderstanding the trait.

@JeffBezanson
Copy link
Sponsor Member

No, you're right, !haslength just means the length is unknown. But what if you want to do something like zip the line iterators of two files, deciding whether to stop as you go? That should be allowed.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Apr 13, 2017

Yes, fair enough, although it might be better to opt into that sort of thing somehow.

@cstjean
Copy link
Contributor Author

cstjean commented Apr 15, 2017

I feel like providing both zip and truncating_zip is a cleaner, simpler solution than special-casing zip's behaviour.

@ararslan
Copy link
Member

It might be nice to provide a keyword argument to zip that dictates whether to truncate to a common length. For example, zip(1:10, 1:3, trunc=true) would iterate over (1,1), (2,2), (3,3) and zip(1:10, 1:3, trunc=false) would throw an error because the inputs are not the same length.

Since we allow isbits values for type parameters, perhaps we could even parameterize Zip based on whether it should truncate.

Anyway, just an idea.

@StefanKarpinski
Copy link
Sponsor Member

I would spell that keyword out fully, otherwise it kind of looks like you want to apply the trunc function to every value (which makes no sense).=, i.e. zip(1:10, 1:3, truncate=true). Some gut feeling tells me that it would be better to have a function which does the truncation for you, but then that function ends up just being a truncating zip, at which point having a keyword seems better.

@nalimilan
Copy link
Member

I think we should fix this for 0.7. See #25583 for a problematic illustration.

@JeffBezanson
Copy link
Sponsor Member

I believe this is only a problem with reverse. If reverse is the problematic case, it should throw the error instead of punishing all users of zip (such as myself).

@stevengj
Copy link
Member

My feeling is that both zip and reverse should have checks.

zip should throw an error for zipping HasLength iterators of unequal lengths, but should allow infinite or unknown lengths to truncate.

reverse should also throw an error for zip of infinite or unknown lengths.

@JeffBezanson
Copy link
Sponsor Member

I continue to find it hard to see why truncating zip is a problem. Maybe part of the reason is that iterators are inherently lazy; by design you can take as much or as little from an iterator as you want. So it seems weird for it to be an error not to consume all of an iterator.

This thread is very short on real, compelling arguments. I see an "I would much prefer" and a "gut feeling", and not much more than that. So I remain opposed to this change.

@cstjean
Copy link
Contributor Author

cstjean commented Jan 18, 2018

The rationale is that if zip is nearly always used on same-length arguments, then throwing an error on unequal-length arguments might catch several logic errors.

hcat([1,2], [3,4,5]) throws a DimensionError, which is IMHO much nicer than silent truncation would be.

@JeffBezanson
Copy link
Sponsor Member

In my view, that's because hcat explicitly operates on the shapes of arguments, so they have to match in a certain way. But iterators just pull a sequence of values.

@JeffBezanson JeffBezanson removed the status:triage This should be discussed on a triage call label Jan 18, 2018
@StefanKarpinski
Copy link
Sponsor Member

Triage: we're already getting late in the game here and there are significant design and feasibility questions about how to even make this require same length without breaking things. Python and Lisp have the same truncating behavior as we have now, so this seems like it can't be that bad.

@sirex
Copy link

sirex commented Nov 3, 2018

If it was decided to truncate zip, then why in some cases it gives error?

julia> collect(zip(1:3, 2:5))
ERROR: DimensionMismatch("dimensions must match")

julia> [i for i in zip(1:3, 2:5)]
ERROR: DimensionMismatch("dimensions must match")

julia> for i in zip(1:3, 2:5)
           @show i
       end
i = (1, 2)
i = (2, 3)
i = (3, 4)

julia> length(zip(1:3, 2:5))
3

Same issue was raised here #17928, but was closed as duplicate, which it isn't.

@PallHaraldsson
Copy link
Contributor

PallHaraldsson commented Jul 11, 2021

Python and Lisp have the same truncating behavior as we have now, so this seems like it can't be that bad.

FYI: Python considers it a mistake a common bug, while "There is nothing "wrong" with the default behavior of zip" (and will add optional strict):
https://www.python.org/dev/peps/pep-0618/

These bugs are not only difficult to diagnose, but difficult to even detect at all.

I saw in other issue: "Triage decided not to throw an error when zip is passed arguments with different lengths", I guess referring to Jeff removing triage, and Stefan closing. It's not too late to do as Python, optional, but is too late to error different lengths?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:collections Data structures holding multiple items, e.g. sets needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests

8 participants