Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change for i=1:length(A) to for i in eachindex(A) #10858

Merged
merged 2 commits into from
Apr 20, 2015
Merged

Conversation

timholy
Copy link
Sponsor Member

@timholy timholy commented Apr 17, 2015

PSA: when writing algorithms with potentially-multidimensional AbstractArrays, please write for i in eachindex(A) rather than for i = 1:length(A) whenever you don't have some reason to insist that i be an integer.

This fixes performance problems for many operations (see #1168 (comment)).

There are a few outstanding issues, some of which arise when trying to use one index for two different arrays in circumstances in which we have not previously required the shapes to match:

and others of which pertain to returning an index to the user (should we return CartesianIndexes?)

and finally others where we may want to change the algorithm:

@pao
Copy link
Member

pao commented Apr 17, 2015

This sounds like it should be a mailing list post, and an addition to the MATLAB differences list (not sure what other languages this is relevant for)? Should use of eachindex() in user code targeting v0.4 be considered the default?

@lindahua
Copy link
Contributor

+1

A benchmark that shows the speed improvement + documentation of underlying functions (e.g. linear indexing) would be great!

@timholy
Copy link
Sponsor Member Author

timholy commented Apr 17, 2015

eachindex is old news, and you can find benchmarks in various issues (#8432, #8501, #9329, #10507). This PR was just tackling the overdue issue of making more algorithms use it.

@lindahua, to catch you up: eachindex does linear indexing when it is fast (if the trait linearindexing(A) == LinearFast()), and cartesian indexing when not. So you can get the best of both worlds that way.

Yes, I agree that more needs to be done to advertise this.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 17, 2015

Documentation in #10859.

@pao
Copy link
Member

pao commented Apr 17, 2015

eachindex is old news

Yes, I agree that more needs to be done to advertise this.

The team as a whole spends more time being awesome than I have time to keep up with the awesomeness, and I suspect I'm trying harder than the average Julia user. Thanks for highlighting the relevant issues for me.

@IainNZ
Copy link
Member

IainNZ commented Apr 17, 2015

Documentation commit: 9b4f212

I also didn't know this was a function for Mortal Men to use in their code, assumed it was just for the Elvish-kings under the Base.sky and Dwarf-lords in their Base.halls of Base.stone.

@milktrader
Copy link
Contributor

Thanks for the PSA!

@pao
Copy link
Member

pao commented Apr 17, 2015

I also didn't know this was a function for Mortal Men to use in their code, assumed it was just for the Elvish-kings under the Base.sky and Dwarf-lords in their Base.halls of Base.stone.

You're from New Zealand. Doesn't that count? (Also, it's morning here, and you've added substantially to my enjoyment of it. Thank you.)

Would it make sense for something along the lines of "you probably want to use this everywhere you're currently using the range 1:length(A) for iteration over an array" to be made explicit in that documentation? Otherwise it's not obvious that you'd want to use it.

@timholy
Copy link
Sponsor Member Author

timholy commented Apr 17, 2015

It seems we should probably add something to the manual chapter on arrays.

@timholy
Copy link
Sponsor Member Author

timholy commented Apr 17, 2015

This already needs rebasing. While I'm at it, what do people think about some of the TODO items above?
For the three categories:

  • Haven't previously been insisting on matching sizes: we could try adding those constraints and see what breaks, unless there are objections
  • Returning an index to the user: I think the big decision is whether we should have a sub2ind that works on CartesianIndex (and hence return an integer), or if we should return an array of CartesianIndexes.
  • Changing the algorithm: that can be done next time someone complains about their performance, I don't see any reason to tackle that now 😄

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 17, 2015

Man, those are all tough decisions. The choice to make eachindex return either an integer or a CartesianIndex is a step towards putting Cartesian indices into user's hands more often (whether they realize it or not). But in that case, it seems more obvious that the returned element is specifically for indexing. There are lots of use-cases for find… and I can only see that causing trouble. Especially since it's generally an anti-pattern to index directly from the results of find; you should just be using the logical index instead.

Logical indexing feels like we're still doing it wrong. But I'm not sure how to make it better. See also #10065 (comment).

But I agree on punting the algorithm changes to another day, perhaps with a comment in the code about how it could be written better.

@timholy
Copy link
Sponsor Member Author

timholy commented Apr 17, 2015

I agree that changing find semantics seems to be asking for trouble. One option would be find{T}(::Type{T}, A) and have T default to Int; people who want the cartesian version could use find(CartesianIndex, A).

I can rebase this and merge it, and leave hard decisions for another day 😄.

@kmsquire
Copy link
Member

One option would be find{T}(::Type{T}, A) and have T default to Int; people who want the cartesian version could use find(CartesianIndex, A).

+1 -- this is really nice... although I'd love to have a shorter name for CartesianIndex.

This fixes performance problems for many SubArray operations
Since `done` is called on each iteration but `start` is called
only once, it makes more sense to put this logic in `start`.
@timholy
Copy link
Sponsor Member Author

timholy commented Apr 19, 2015

I realized that my proposal for find(CartesianIndex, A) is basically what findn does, except the latter returns a tuple of Vector{Int} indexes. So we should probably have one or the other but not both.

timholy added a commit that referenced this pull request Apr 20, 2015
Change `for i=1:length(A)` to `for i in eachindex(A)`
@timholy timholy merged commit 3dbc828 into master Apr 20, 2015
@timholy timholy deleted the teh/eachindex branch April 20, 2015 00:16
@timholy
Copy link
Sponsor Member Author

timholy commented Apr 20, 2015

As requested by @pao, mailing-list post is here.

@tkelman
Copy link
Contributor

tkelman commented Apr 20, 2015

That bitarray segfault on appveyor is worrying... https://ci.appveyor.com/project/StefanKarpinski/julia/build/1.0.3946/job/ius9kwerpcskfbi8

More recent builds have passed though, so maybe it's just a case of #9176?

@timholy
Copy link
Sponsor Member Author

timholy commented Apr 20, 2015

The earlier version of this passed on all 3 platforms, and all I did was fix a tiny merge conflict that had nothing to do with BitArrays. So I wasn't worried about this being the cause.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet