Support for 0-indexed and arbitrary-indexed arrays #16260

timholy · 2016-05-08T12:34:56Z

This is the one part of ArrayIteration.jl that I think it's reasonable to move to Base for the julia-0.5 release. The core change is the addition of the indices function (originally named inds), which should be thought of as a sister function to size, but which returns a tuple containing the in-bounds indexes of an array. The fallback is

indices(A::AbstractArray, d) = 1:size(A,d)
indices{T,N}(A::AbstractArray{T,N}) = ntuple(d->indices(A, d), Val{N})
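For readers on current Julia, here is a minimal runnable sketch of the same fallback. It assumes Julia 1.x syntax (`where` clauses, `Val(N)`) and uses the hypothetical name `myindices` so that nothing in Base is touched. In later Julia releases this role is played by `axes`.

```julia
# Fallback: the in-bounds indices along dimension d are 1:size(A, d).
myindices(A::AbstractArray, d::Integer) = 1:size(A, d)

# All dimensions at once, as an N-tuple of ranges (Val(N) lets ntuple know N at compile time).
myindices(A::AbstractArray{T,N}) where {T,N} = ntuple(d -> myindices(A, d), Val(N))

A = zeros(2, 3)
myindices(A)      # (1:2, 1:3)
```

For an ordinary `Array` this agrees element-for-element with `axes(A)`.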

Specific array types can override indices, however. For example:

A = OffsetArray([1 3; 2 4], (-1,2))
@test A[0,3] == 1
@test A[1,3] == 2
@test A[0,4] == 3
@test A[1,4] == 4
julia> A[1,1]
ERROR: BoundsError: attempt to access OAs.OffsetArray{Int64,2,Array{Int64,2}} with inds (0:1,3:4):
 #undef  #undef
 #undef  #undef
  at index [1,1]
 in throw_boundserror(::OAs.OffsetArray{Int64,2,Array{Int64,2}}, ::Tuple{Int64,Int64}) at ./abstractarray.jl:134
 [inlined code] from ./abstractarray.jl:103
 in getindex(::OAs.OffsetArray{Int64,2,Array{Int64,2}}, ::Int64, ::Int64) at /home/tim/src/julia-0.5/test/abstractarray.jl:542
 in eval(::Module, ::Any) at ./boot.jl:230
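The override can be sketched in current Julia as follows. `MyOffsetArray` is a hypothetical stand-in for illustration, not the OffsetArrays.jl implementation, and the sketch assumes Julia 1.x, where the function introduced here as `indices` is called `axes`. Because `checkbounds` consults `axes`, overriding that one method is enough for bounds checks to honor the offsets.

```julia
# Hypothetical minimal offset array (illustration only, not OffsetArrays.jl).
struct MyOffsetArray{T,N} <: AbstractArray{T,N}
    parent::Array{T,N}
    offsets::NTuple{N,Int}
end

Base.size(A::MyOffsetArray) = size(A.parent)

# Override the in-bounds indices: shift each parent axis by its offset.
Base.axes(A::MyOffsetArray) =
    map((r, o) -> (first(r) + o):(last(r) + o), axes(A.parent), A.offsets)

function Base.getindex(A::MyOffsetArray{T,N}, I::Vararg{Int,N}) where {T,N}
    checkbounds(A, I...)                      # consults axes(A), so offsets are respected
    return A.parent[map(-, I, A.offsets)...]  # translate to the parent's 1-based indices
end

A = MyOffsetArray([1 3; 2 4], (-1, 2))
axes(A)   # (0:1, 3:4)
A[0, 3]   # 1
```

As in the transcript above, `A[1, 1]` throws a `BoundsError`, because 1 is not in `3:4`.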

This adds the definition and uses it in checkbounds. The bigger job will be wiring it into the rest of Base. So I thought I'd post at this stage and see what folks think.

carnaval · 2016-05-08T13:02:19Z

👎 we have to leave something for the hackernews threads to talk about :-)

tkelman · 2016-05-08T13:07:14Z

base/abstractarray.jl

-checkbounds(::Type{Bool}, sz::Integer, i::Real) = 1 <= i <= sz
-checkbounds(::Type{Bool}, sz::Integer, ::Colon) = true
-function checkbounds(::Type{Bool}, sz::Integer, r::Range)
+checkbounds(::Type{Bool}, inds::UnitRange, i) = throw(ArgumentError("unable to check bounds for indices of type $(typeof(i))"))


not a fan of having an argument with the same name as a closely-related function

kmsquire · 2016-05-08T13:39:12Z

👍 from me for the idea.

I'm impressed that it takes so little code to implement this!

Have you measured the performance impact?

timholy · 2016-05-08T13:40:28Z

> we have to leave something for the hackernews threads to talk about :-)

Yeah, that was my biggest concern too 😄.

More seriously, are you genuinely down on this? There are cases where it's nicer to write an algorithm where indexing is not based on 1. For example, if I compute a quantity as a function of displacement, it's much prettier to write my code in terms of quantity[Δx] even if Δx is negative.
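To make the displacement argument concrete, here is a small sketch (the function name `autocorr` and the lag range `-L:L` are illustrative) of the bookkeeping that 1-based storage forces on code naturally indexed by Δx: every write has to translate the displacement into a storage slot by hand. With storage whose indices start at `-L`, the last assignment could simply read `c[Δx] = s`.

```julia
# Autocorrelation of x at displacements Δx ∈ -L:L, stored in an ordinary 1-based Vector.
function autocorr(x::AbstractVector, L::Integer)
    c = zeros(float(eltype(x)), 2L + 1)   # slots for Δx = -L, ..., L
    n = length(x)
    for Δx in -L:L
        s = zero(eltype(c))
        for i in max(1, 1 - Δx):min(n, n - Δx)   # keep both i and i + Δx in bounds
            s += x[i] * x[i + Δx]
        end
        c[Δx + L + 1] = s                 # manual offset: displacement -> storage index
    end
    return c
end

autocorr([1, 2, 3], 1)   # [8.0, 14.0, 8.0]
```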

timholy · 2016-05-08T13:52:04Z

> Have you measured the performance impact?

Not yet, but AFAICT the addition of inds itself shouldn't have any impact. Of course checkbounds is enormously performance-sensitive, so there's every chance that I've messed something up in this implementation. I thought I'd first see what folks think.

Actually using an offset seems likely to have a very small performance impact, but it should be essentially negligible compared to cache misses.

vtjnash · 2016-05-08T14:41:23Z

I would prefer spelling out the name, it's not that much saved to abbreviate it. That said, is this actually distinct from "keys"?

timholy · 2016-05-08T14:48:52Z

A secret reason I made it inds was to sidestep the indexes/indices debate. That kind of cowardice may not be healthy in the long run.

I also thought about keys. We could define keys(A, d), but for consistency with Dict, keys(A) should probably return an iterator (e.g., a CartesianRange) rather than an NTuple{N,UnitRange{Int}}. The default iterator for Tuple is to just iterate over the elements, so we can't just replace CartesianRange with NTuple{N,UnitRange{Int}}.
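The iteration mismatch described above is easy to see in current Julia. This sketch assumes Julia 1.x, where `CartesianIndices` is the successor of `CartesianRange`.

```julia
inds = (0:1, 3:4)    # NTuple{2,UnitRange{Int}}: the tuple-of-ranges form

# Iterating the tuple visits its elements, i.e. the ranges themselves...
collect(inds)        # [0:1, 3:4]

# ...whereas a Cartesian iterator visits every index pair (first dimension fastest).
visited = Tuple{Int,Int}[]
for I in CartesianIndices(inds)
    push!(visited, Tuple(I))
end
visited              # [(0, 3), (1, 3), (0, 4), (1, 4)]
```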

mlubin · 2016-05-08T18:01:39Z

Having this infrastructure in Base will certainly smooth over some issues with JuMP containers. CC @joehuchette @IainNZ

eschnett · 2016-05-08T18:12:33Z

In https://github.com/eschnett/FlexibleArrays.jl , I used lbnd and ubnd to obtain the lower and upper array bounds, respectively. I like having a single function that returns both. This also allows strided arrays if one returns a StepRange. It even generalizes to sparse arrays, where inds can return a list (or iterator) of the indices.

timholy · 2016-05-08T18:52:58Z

@eschnett, thanks for the enthusiasm. However, I don't think this can return sparse ind***s:

- it's valid to address a sparse vector or matrix at non-stored entries, so you need a way of indicating whether you want all entries or just the stored ones;
- for a sparse matrix, the stored entries vary by column, so this API would be insufficient;
- part 2 of this PR is to change our dimension checks to inds(A, 1) == inds(B, 1), and for that reason they have to return the full range.

Over at ArrayIteration I use stored to indicate that you want to access just the stored elements of an array, so the change you want is definitely on its way.
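Why the dimension checks must compare full ranges rather than lengths: two axes can have the same length yet cover different indices. A minimal sketch, with the hypothetical helper name `check_same_indices` and Julia 1.x `axes` standing in for `inds`:

```julia
# Hypothetical helper: compare the in-bounds indices along dimension d, not just the sizes.
function check_same_indices(indsA::Tuple, indsB::Tuple, d::Integer)
    ra, rb = indsA[d], indsB[d]
    ra == rb || throw(DimensionMismatch("dimension $d: indices $ra vs $rb"))
    return true
end

check_same_indices(axes(zeros(2, 3)), axes(ones(2, 5)), 1)   # true: both are 1:2
length(0:1) == length(1:2)                                    # true, yet...
# check_same_indices((0:1,), (1:2,), 1) throws a DimensionMismatch
```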

eschnett · 2016-05-08T19:26:22Z

@timholy My main concern is about arrays with a lower bound different from 1. A secondary concern would be for arrays with a non-unit stride. (This should work fine with an inds function; whether this support should go into Base is a different question.)

Irregular sparse arrays are not relevant for me at the moment. However, block-sparse arrays are, where the upper bound on some dimension j depends on the value of an earlier dimension i. I'm just bringing this up to raise general awareness; there are many open questions regarding how these should be handled, or efficiently stored, or traversed, or how much of this needs to be exposed to the user anyway since efficient algorithms will depend on (or require) such a sparsity structure.

StefanKarpinski · 2016-05-09T10:26:41Z

I like the idea of being able to write Array(T, -a:a, 0:b, -c:0) and get an array with "strange" indices.

timholy · 2016-05-09T11:12:09Z

Without an enormous amount of surgery it can't be Array(T, inds...), but it could be OffsetArray(T, inds...).

StefanKarpinski · 2016-05-09T12:56:09Z

Yes, of course, that would be epic surgery. Just saying that it would be a nice Fortran feature graft.
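Whatever it ends up being called, a constructor like `OffsetArray(T, inds...)` has to turn the requested ranges into a parent size plus per-dimension offsets. A sketch of that arithmetic, using the hypothetical name `size_and_offsets`:

```julia
# Hypothetical: derive the parent size and per-dimension offsets from requested index ranges.
function size_and_offsets(ranges::AbstractUnitRange{<:Integer}...)
    sz   = map(length, ranges)              # size of the 1-based parent array
    offs = map(r -> first(r) - 1, ranges)   # index i is stored at parent index i - offset
    return sz, offs
end

a, b, c = 2, 3, 4
sz, offs = size_and_offsets(-a:a, 0:b, -c:0)   # ((5, 4, 5), (-3, -1, -5))
storage = zeros(Float64, sz...)                # the plain Array that holds the data
```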

timholy · 2016-05-09T15:50:07Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-05-09T18:24:24Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

Edit by jrevels: Ignore the pending status - like this comment says, the job is actually complete. One of the nodes isn't sending out final statuses for some reason, looking into this now.

timholy · 2016-05-11T11:21:07Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2016-05-11T13:55:31Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

jrevels · 2016-05-11T15:12:47Z

See here - I just realized ":master" ended up pointing to your fork's master by default instead of JuliaLang/julia:master. Just fixed this, sorry for the bug. Let's try it now:

@nanosoldier runbenchmarks(ALL, vs = ":master")

Edit: Oh, this PR isn't from a fork, so it should've been fine in the first place. Never mind then, sorry for the noise.

timholy · 2016-05-11T16:02:57Z

I suspect there's still one corner case to go, anyway. I assumed this would get them all, but I should check more thoroughly locally.

nanosoldier · 2016-05-11T17:47:02Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

timholy · 2016-05-13T03:20:20Z

@nanosoldier runbenchmarks(ALL, vs = ":master")

timholy · 2016-05-13T03:23:04Z

@mbauman, this contains yet a different implementation of bounds-checking. Now that the valid range of indices is represented by a UnitRange rather than an Integer, checkbounds(Bool, inds, i) has the unfortunate possibility of being confused for checkbounds(Bool, A, i). So I renamed it checkindex. See what you think.
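The range-based check can be sketched as a small family of methods. The name `mycheckindex` is hypothetical, chosen so it does not clash with the `Base.checkindex` discussed here; the untyped fallback mirrors the `ArgumentError` in the diff quoted earlier. Checking only `first(r)` and `last(r)` of a range is enough because a range's elements are monotonic.

```julia
# Integer: is i inside the valid range of indices?
mycheckindex(inds::AbstractUnitRange, i::Integer) = first(inds) <= i <= last(inds)

# Colon selects everything along the dimension, so it is always in bounds.
mycheckindex(inds::AbstractUnitRange, ::Colon) = true

# Ranges are monotonic, so checking the endpoints covers every element.
function mycheckindex(inds::AbstractUnitRange, r::AbstractRange)
    isempty(r) && return true
    return mycheckindex(inds, first(r)) && mycheckindex(inds, last(r))
end

# Anything else: refuse rather than guess.
mycheckindex(inds::AbstractUnitRange, i) =
    throw(ArgumentError("unable to check bounds for indices of type $(typeof(i))"))

mycheckindex(0:1, 0)         # true: 0-based indices are fine
mycheckindex(0:10, 5:-1:-1)  # false: -1 is out of bounds
```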

timholy · 2016-06-14T05:48:06Z

As you review it, remember JuliaLang/www_old.julialang.org#386 (might make it easier to understand if you read that first).

StefanKarpinski · 2016-06-14T14:10:42Z

> I was getting worried about making too many "status update" messages

I wouldn't worry about this, @timholy – your SNR is excellent, I wouldn't mind more status updates.

JeffBezanson · 2016-06-14T14:38:49Z

base/exports.jl

@@ -484,13 +484,15 @@ export
 zeta,

 # arrays
+ allocate_for,


:( Maybe this can be combined with similar? If not, at least needs a better name.

Yes, I guess it could simply be another "branch" of similar. Will fix.


andreasnoack · 2016-07-05T13:25:37Z

@timholy The new meaning of Colon broke DistributedArrays.jl here and here, so I'm trying to figure out how to think about the new Colon.

You have added a deprecation for first, so although it's a little weird to require the unexported _first function instead of the exported first, it works. However, I need to figure out how to handle intersect(Colon,UnitRange). Do you think it would be safe to define

intersect(a::Colon,b::UnitRange) = b

?

timholy · 2016-07-05T14:36:11Z

I do think that's a reasonable definition to have.

With regards to _first, I was wondering if/when this would come up, so I'm very glad you brought this up. I'm sure you understand the issue: without having the array as one of the arguments, there's no longer any way to evaluate first(:) meaningfully. (In retrospect, maybe we never should have defined this method at all, since last(:) never made sense.) The current strategy is to force folks to supply both the array and the dimension. Without discussion, I was nervous about calling it first(A, d, :), so I made it a new non-exported function. But if anyone has better ideas, I'm all ears. I wonder if this should actually be a method of indices, i.e.,

indices(A, d, inds) = inds
indices(A, d, ::Colon) = indices(A, d)

Thoughts?
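In current Julia, where `indices` became `axes`, the proposal amounts to two methods. The name `resolve_indices` is hypothetical, used so the sketch does not add methods to `Base.axes`.

```julia
# Mirrors the proposed indices(A, d, inds) methods.
resolve_indices(A::AbstractArray, d::Integer, inds) = inds            # explicit indices pass through
resolve_indices(A::AbstractArray, d::Integer, ::Colon) = axes(A, d)  # `:` means "all indices along d"

A = zeros(2, 3)
resolve_indices(A, 2, 2:3)          # 2:3
first(resolve_indices(A, 2, :))     # 1, taken from the array rather than from `:` alone
```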

timholy · 2016-07-05T14:56:31Z

I guess one problem with that definition is that we have size(A, 2, 4), which returns a Tuple{Int,Int}; it would make sense to keep indices parallel. Given your definition above, it could be intersect(indices(A,d), inds), and that would be fast for Colon, but not necessarily fast for other choices.
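The `intersect` route can be sketched the same way. The name `myintersect` is hypothetical; it keeps the sketch from adding methods to `Base.intersect` for `Colon`. The first two methods return their range argument unchanged, which is the "fast for Colon" case; the last one does a genuine intersection.

```julia
myintersect(::Colon, b::AbstractUnitRange) = b   # the definition proposed above
myintersect(a::AbstractUnitRange, ::Colon) = a   # same rule, in the argument order of intersect(indices(A,d), inds)
myintersect(a::AbstractUnitRange, b::AbstractUnitRange) = intersect(a, b)

A = zeros(4, 6)
myintersect(axes(A, 2), :)     # axes(A, 2) itself
myintersect(axes(A, 2), 2:8)   # 2:6
```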

andreasnoack · 2016-07-06T03:00:27Z

Right now, I can barely figure out how to fix setindex!(Array, SubDArray, indx). Too many indices. Source, destination, global, and local. Figuring out how to handle a case with custom index ranges on top of this is extreme sport for the brain but probably something that suits your taste. If I acquire any insights during the process I'll share them here.

timholy · 2016-07-06T10:53:55Z

I feel your pain: index gymnastics are never fun, though it's better in Julia since we've developed composable elementary operations. I find that the main trick is "look no deeper than you absolutely have to," although perhaps with distributed arrays there's no escaping looking deep. Sounds like a case where developing easy-to-think-about elementary operations might make life easier.

But this particular issue is "just" an API question. One thing to note is that this would become trivial with something like #15750: we'd have first(::Colon) = ifirst (or whatever we'd decide to call it). As it is, I've scoured through our array functions and can't find anything relevant. So my next question is this: what do people use first(:) for? AFAICT, it's really to compute an "offset". So perhaps we should define

indexoffset(indx) = first(indx)-1
indexoffset(::Colon) = 0

Even more generally (but less simply), we could have

reindex(indx, refindx) = indx
reindex(::Colon, refindx) = refindx

so what used to be first(:) would become first(reindex(:, indices(A,d))).
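Both proposals run unchanged in current Julia, and a quick check shows why the more general form matters: `indexoffset(:)` hard-codes the 1-based answer, while `reindex` takes it from the reference indices. The `0:1` axis below is just an example of an unconventional range.

```julia
indexoffset(indx) = first(indx) - 1
indexoffset(::Colon) = 0

reindex(indx, refindx) = indx
reindex(::Colon, refindx) = refindx

# With 1-based indices the two agree for `:`...
indexoffset(:) == first(reindex(:, 1:3)) - 1   # true (both are 0)
# ...but only reindex sees an unconventional axis such as 0:1.
first(reindex(:, 0:1)) - 1                      # -1, whereas indexoffset(:) is still 0
```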


tkelman added the needs docs Documentation for this change is required label May 8, 2016

tkelman reviewed May 8, 2016

timholy mentioned this pull request May 8, 2016

indices or indexes? #12902

Closed

nalimilan mentioned this pull request May 8, 2016

RFC: Remove find type assertion to allow other iterables #16110

Merged

timholy force-pushed the teh/inds branch from 44bb49c to bb67ee7 May 9, 2016 13:17

timholy removed the needs docs Documentation for this change is required label May 9, 2016

timholy force-pushed the teh/inds branch from bb67ee7 to 7f3981f May 11, 2016 11:15

timholy force-pushed the teh/inds branch from fe47a00 to 0d88892 May 13, 2016 03:18

timholy mentioned this pull request Jun 14, 2016

New implementation based on julia-0.5 infrastructure JuliaArrays/OffsetArrays.jl#2

Merged

timholy mentioned this pull request Jun 14, 2016

Histogram Equalisation JuliaImages/Images.jl#458

Merged

JeffBezanson reviewed Jun 14, 2016

timholy mentioned this pull request Jun 14, 2016

misc. benchmark regressions since 0.4 #16128

Closed

andreasnoack mentioned this pull request Jun 15, 2016

DataArrays broken on master JuliaStats/DataArrays.jl#200

Closed

timholy referenced this pull request Jun 15, 2016

Merge pull request #16934 from JuliaLang/jn/old-cleanup

20d376e

cleanup some old code

andreasnoack mentioned this pull request Jun 15, 2016

Tag a new release? JuliaStats/GLM.jl#144

Closed

This was referenced Jun 16, 2016

Safe non-traditional array indexing #16973

Closed

scan! and scan functions #14730

Closed

This was referenced Jun 26, 2016

Tuple-dialect has performance consequences (anonymous functions?) #17126

Closed

similar should accept a type as first parameter #17124

Closed

Refactor API for unconventionally-indexed arrays #17137

Merged

simonster mentioned this pull request Jun 27, 2016

median changes original array #17153

Closed

simonbyrne added a commit that referenced this pull request Jun 27, 2016

Make median non-mutating on arrays.

aaaf8fa

Due to a change in the behaviour of `mapslices` (#16260), `median(X,k)` would mutate the underlying array. Fixes #17153.

simonbyrne mentioned this pull request Jun 27, 2016

Make median non-mutating on arrays. #17154

Merged

timholy mentioned this pull request Jul 1, 2016

Make unvetted size throw an error for arrays with non-1 indexing #17228

Merged

This was referenced Jul 8, 2016

Display broken for SimpleVector (svec) #17338

Closed

Fix broadcast_shape when Base.OneTo is defined JuliaLang/Compat.jl#250

Closed

mlubin mentioned this pull request Aug 7, 2016

Use staged functions in @variable jump-dev/JuMP.jl#346

Closed

mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016

Make median non-mutating on arrays.

bc733d8

Due to a change in the behaviour of `mapslices` (JuliaLang#16260), `median(X,k)` would mutate the underlying array. Fixes JuliaLang#17153.

stevengj mentioned this pull request Aug 31, 2020

Zero-dimensional views of AbstractVectors assume that the parent is 1-indexed #37274

Closed

timholy mentioned this pull request Nov 3, 2020

Support matrix multiplication (Continue #93) JuliaArrays/OffsetArrays.jl#146

Open
