Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Rename indices to axes #25057

Merged
merged 1 commit into from
Dec 15, 2017
Merged

RFC: Rename indices to axes #25057

merged 1 commit into from
Dec 15, 2017

Conversation

andyferris
Copy link
Member

@andyferris andyferris commented Dec 13, 2017

I think this has been discussed in a few places - the most obvious reference is #23434. I figured it would be worth giving this a shot before feature freeze.

To me, indices is not the ideal name for this function. Currently, indices is defined on n-dimensional arrays to be a n-tuple of ranges for each dimension of the array.

I feel that the indices of an indexable container would probably be the collection of things you can index with, which is currently what the keys function does. In #23434 several candidate replacements were suggested, including axes, indexvectors, ... I'm proposing indexes here (an index can be a collection of individual entries, as in the index of a book; this function returns a collection of indexes) - but happy to discuss. (There's lots of other options... dimranges?) This PR now implements axes.

Going further, I feel there is a bit of discord between "keys" and "indices", since we have keys and haskey as doing discovery of things we can do getindex and setindex! with. I don't see the advantage on forcing users to learn and use multiple words for the same concept. We could consider harmonizing like so:

Current "Index" terminology "Key" terminology
getindex getindex getkey
setindex! setindex! setkey!
keys indices or index keys
haskey hasindex haskey

Personally, I'd prefer the "index" version.

Of course this goes well beyond this PR, but it is something that may motivate (and explain my motivation for) merging this PR's deprecation of indices for v0.7.

@andyferris andyferris added domain:arrays [a, r, r, a, y, s] needs decision A decision on this change is needed kind:deprecation This change introduces or involves a deprecation labels Dec 13, 2017
@nalimilan
Copy link
Member

indexes doesn't sound like a good idea given the outcome of #12902. Also, why not discuss this at #23434, where the discussion started?

@andyferris
Copy link
Member Author

Right - indexes might not be the greatest choice, but now with this PR it's merely a few seconds work for me to rename it :)

why not discuss this at #23434, where the discussion started?

Sometimes, there's nothing like some code to turn discussions into action (and feature freeze is upcoming, so it seemed worth making a start, or forgetting about it entirely).

@andyferris andyferris mentioned this pull request Dec 13, 2017
@andyferris andyferris changed the title RFC: Rename indices to indexes RFC: Rename indices to axes Dec 13, 2017
@andyferris
Copy link
Member Author

The PR now replaces indices with axes.

@andyferris andyferris added the status:triage This should be discussed on a triage call label Dec 14, 2017
@mschauer
Copy link
Contributor

Completely subjective, of course, but I got stuck more than once assuming indices is something you can iterate over.

@JeffBezanson
Copy link
Sponsor Member

+1

@StefanKarpinski
Copy link
Sponsor Member

@JeffBezanson: it's a bit hard to tell what your +1 is in reference to.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Dec 14, 2017

Triage generally agrees with this rename from indices to axes; there's some concern about AxisArrays, but I personally don't find that name terribly fitting, the package description says:

This package for the Julia language provides an array type (the AxisArray) that knows about its dimension names and axis values.

It seems like the package is really about giving names and units to dimensions, I'm not sure what the "and axis values" at the end of the sentence actually means and kind of feel like it's just a post-hoc justification of the package name. Sorry, I don't mean to come off as snarky (it's a great package), but I think this proposed usage of axes seems more fitting.

@StefanKarpinski
Copy link
Sponsor Member

I think that we should also move all of our key terminology to index and use getindex, setindex! and hasindex for both arrays and dicts.

@StefanKarpinski StefanKarpinski removed the status:triage This should be discussed on a triage call label Dec 14, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Dec 14, 2017
@andyferris
Copy link
Member Author

andyferris commented Dec 14, 2017

I think that we should also move all of our key terminology to index and use getindex, setindex! and hasindex for both arrays and dicts.

Is this possible for v1.0? If that's desired, then we can choose one of:

  • Make a v1.0 (not v0.7) patch to deprecate keys and haskey to indices and hasindex.
  • Make a breaking change of the meaning of indices in v0.7

@StefanKarpinski
Copy link
Sponsor Member

Ah, you mean the collision because indices currently means "give me the tuple of axes of this collection" whereas the new meaning we would want would be what we now call keys?

@andyferris
Copy link
Member Author

Yes, exactly.

@StefanKarpinski
Copy link
Sponsor Member

Hmm. That is a bit tough, but might be workable...

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 15, 2017

@andyferris why did you merge this? In general you should be letting someone else do it or at least approves it. If you don’t get a review, posting “will merge in X days” is also sometimes necessary.

Also, it looks like CI was not yet finished running.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 15, 2017

Actually, it looks like CI hasn’t even started yet here. I think we should quick revert this and reopen the PR.

@andyferris
Copy link
Member Author

why did you merge this? In general you should be letting someone else do it or at least approves it.

Triage seemed pretty definitive, and I assumed we're trying to clear the slate of v1.0 milestone items relatively quickly.

Is this a general thing that we won't merge PRs we open ourselves? Sorry if I was being too hasty.

it looks like CI hasn’t even started yet here.

CI did a full run, then I did a quick rebase and my local and the Linux CI tests passed before I merged. Sorry again if I'm being too hasty - but I did put in some effort to make sure that the build wouldn't break from merging this.

@StefanKarpinski
Copy link
Sponsor Member

Did this break the build or no?

@StefanKarpinski
Copy link
Sponsor Member

Doesn't seem like it. I think the new name is better, so I'm cool with this.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Dec 15, 2017

Did this break the build or no?

If CI had been permitted to run (most of them were unable to run), then we wouldn't have to be wondering this. I think the new name is fine too. I'm frustrated only in that Andy didn't push a revert commit, reopen the PR to run CI, and let someone else review or merge it.

@StefanKarpinski
Copy link
Sponsor Member

It's done now and and it didn't break the build, so let's carry on.

@fredrikekre
Copy link
Member

fredrikekre commented Dec 21, 2017

Should indices1 change to axes1?

@timholy
Copy link
Sponsor Member

timholy commented Dec 21, 2017

It's also possible we don't need indices1 now because we have better constant propagation. The only purpose of that function was a performance optimization (a necessary one, as indicated by BaseBenchmarks) to avoid the branch in

axes(A, d) = d > ndims(A) ? Base.OneTo(1) : axes(A)[d]

But now that branch may disappear all by itself.

@andyferris
Copy link
Member Author

Interprocedural constant prop is the Christmas gift from @vtjnash that keeps on giving! ❤️

@tknopp
Copy link
Contributor

tknopp commented Dec 29, 2017

The axes function is central part of the AxisArrays package and in turn of the entire Images infrastructure which is broken since this was merged and brought into Compat.

Different discussions around this issue

What is the advise here? Should axes from AxisArrays be renamed?
@timholy @mbauman

@tknopp
Copy link
Contributor

tknopp commented Dec 30, 2017

After reading through the subject (before I just wanted to track down the error) my vote is to rename Base.axes into Base.indices again. @timholy made an important comment here #23434 (comment)

The point is that we always have to distinguish between axes that are represented by their integer and axes that are represented by the physical units. I actually would call the later an axis since the former has a more special meaning (e.g. indexing into an array).

It is not possible to get both concepts under the same hood. Since AxisArrays is a very fundamental package and provides a very good concept for handling axes, I think it should remain the owner of this terminology.

@yurivish
Copy link
Contributor

yurivish commented Dec 30, 2017

Axes in AxisArrays feel like a much more natural use of the word -- it's a unit and an interval, which is enough information to e.g. create an axis visualization for a plot.

Echoing @timholy's comment, I wonder if there's another term that could be used. He suggested indexvectors; maybe it could be indexiterators? A bit verbose, but it would capture the semantics and match terminology with the new iteration protocol.

@nalimilan
Copy link
Member

I'm not a fan of the axes name either, but how about renaming AxisArrays? I know that's a radical proposal, but I've hoped we would be able to merge AxisArrays and NamedArrays for a long time so that can make them a central part of the stats ecosystem, and I must say the term "axis" doesn't help that goal. Indeed it's really not natural for people who just want to add categorical dimensions names (typically as strings) to their arrays (like R arrays). Actually I wonder whether it's very natural for categorical axes even in AxisArrays itself (e.g. is of set of two audio channels really an "axis"?). Could we find a more neutral term which satisfies both approaches, e.g. LabeledArrays?

See also a similar comment by @StefanKarpinski above.

@tknopp
Copy link
Contributor

tknopp commented Dec 30, 2017

@nalimilan: I don't see the issue with merging AxisArrays and NamedArrays. Is there something in NamedArrays that is not provided by AxisArrays? I would say that AxisArrays already is a pretty central part of the Julia ecosystem (through the use in Images.jl) and although this is certainly a little offtopic, AxisArrays would be a pretty good candidate for shipping as part of stdlib (depends how the later evolves).

Regarding the axis terminology I still find it pretty natural and after reading wikipedia: https://en.wikipedia.org/wiki/Cartesian_coordinate_system it still fits from my understanding.

Whatever action is done here (changing Base.axes or rename AxisArrays.axes or even rename AxisArrays) this needs to be done prior to 0.7 release. Changes in AxisArrays (and all(!) depending packages) is a major change that requires bandwidth of several package authors in particular of Tim Holy. This does not mean that it should not be done but it should be kept in mind.

@nalimilan
Copy link
Member

@nalimilan: I don't see the issue with merging AxisArrays and NamedArrays. Is there something in NamedArrays that is not provided by AxisArrays?

I haven't followed the development progress very closely recently, but off the top of my head AxisArrays don't support broadcasting yet, and the printing of NamedArrays is much nicer for categorical axes (see examples here). But these can be improved. My point here was just that axes is certainly not the most intuitive term for many applications (e.g. frequency tables), precisely because it refers to numerical coordinates.

@mbauman
Copy link
Sponsor Member

mbauman commented Jan 2, 2018

I'm working on getting myself back up to speed here. AxisArrays is long overdue for some love and a stronger direction. It's been torn between two schools of thought: is the axis information simply annotated metadata, with integer offsets still serving as the canonical indexing method? Or are the values in each axis the canonical (and only) set of indices available in the A[i] syntax?

When I tried to merge AxisArrays.axes into Base.indices in the past, I was still trying to support the former school of thought. I'm no longer convinced that's the best path forward. It's a big change, but moving towards the second school of thought should make these two names compatible. At that point, though, it's no longer clear that they're an AbstractArray at all… and then we get into the associative/array rabbit hole rather quickly.

The rename here is definitely an improvement — it makes way more sense that this function returns a collection of (axis) vectors if it's called axes. With the indices name, I'd expect a collection of things, where each thing is a valid index.

@tknopp
Copy link
Contributor

tknopp commented Jan 2, 2018

@mbauman: Will this be a more experimental development or do you have a clear plan? Just asking since we need a solution for 0.7 if Base.axes remains.

@andyferris
Copy link
Member Author

I tend to agree with @mbauman. The more I think about how I a (possibly mulitidimensional) AbstractDict might work, the more I come back to thinking about AxisArrays et al.

I see the integer offsets (AND linear indexing of standard, multidimensional arrays) as tokens that let you get values (and maybe keys) quickly. I'd tend to think about introducing some kind of gettoken interface to handle this (and use a trait like IndexStyle to indicate how the tokens of a given type work, or even if fast tokens exist at all).

@mbauman
Copy link
Sponsor Member

mbauman commented Jan 3, 2018

No, I don't have any concrete plans at the moment, and it's such a big change I'd like to put together a more cohesive proposal and seek consensus before enacting it.

@ajkeller34
Copy link
Contributor

ajkeller34 commented Jan 18, 2018

I'm not sure if this is the appropriate place to continue this conversation—shall I split this off into an AxisArrays issue?—but in any case I wanted to offer some perspective that touches on issues more general than AxisArrays.

For AxisArrays, I still think integer offset indexing is useful, and would find it troublesome to have to type out gettoken(A, 1) or similar to do that. I also agree with sentiments above that the axis values are really keys: the axis values can be categorical, which is a pretty obvious clue, but also numerical axis values need not be equally spaced, implying that even numerical values are keys. That being said, I think it's a pretty radical change to propose that an AxisArray is no longer an AbstractArray, but rather an AbstractDict.

Main message: I feel like a lot of confusion could be avoided were it not for the fact that getindex is used for both integer offset indexing and also for getting keys. If you subscribe to the view that integer offset indexing and getting keys are different operations, this then seems like an abuse of method overloading. So rather than introducing gettoken to supplement getindex, I'd suggest clarifying get to be getkey and encouraging its usage.

Curly brackets are open syntax (#8470) and could lower to getkey. That way you could provide some Dict-like functionality for AxisArrays while acknowledging that the underlying data structure is built around an Array. Of course, in the interest of not causing tremendous pain to everyone, you could have getkey fall back to getindex if there's no ambiguity in meaning, such that people can continue blithely getindex-ing dictionaries with square brackets. Seems I may have misunderstood which use of curly braces is available syntax. In any case, should suitable syntax be available, I'm not proposing new syntax simply to make using AxisArrays easier: I'm saying that it would be helpful in any situation where you could conceivably getindex or getkey the same object and have both operations be meaningful and useful.

Aside: I'm using AxisArrays to store the result of measurements. AxisArrays is good for this because you typically have some set of knobs you want to turn in a measurement and a concrete type for the measurement result. In this case, abstracting away the underlying storage order of the axes is actually undesirable. The first axis is the "fast axis," i.e. the measurement knob that is turned most frequently, and each successive axis is a measurement knob that is turned less frequently. That's potentially a physically meaningful distinction. Doing things in a different order could give different measurement results, therefore I want an Array with named axes, not a Dict with named axes.

@tknopp
Copy link
Contributor

tknopp commented Jan 18, 2018

I also use AxisArrays as a simple Array (with storage order) where I have the opportunity to encode the physical meaning into the array. Integer indexing is absolutely essential here. Indexing with boating point values (and/or units) might be nice in some circumstances but it is certainly not the only use case.

@andyferris
Copy link
Member Author

I think it's a pretty radical change to propose that an AxisArray is no longer an AbstractArray, but rather an AbstractDict.

I had a more radical (crazy?) proposal which was to make AbstractArray a subtype of AbstractDict, and just have a single indexing (and token) interface. Sorry for dragging this off topic - but I really couldn't resist biting to this one. :)

I also have sometimes wondered if get is simpler (better) than getindex.

(BTW, I can totally see that AxisArrays are useful as-is! But what occurs if my meaningful axis values really are Ints? The indexing method signatures couldn't tell if the user was wanting to use the array interface or the axis array interface)

@ajkeller34
Copy link
Contributor

In my most recent and often changing view, indexing method signatures are only ambiguous because people should be using get instead of getindex if they want to use the "keys" (i.e. axis values). I won't pretend this is a practical view.

I proposed using a macro to index by value in JuliaArrays/AxisArrays.jl#118. I can understand maybe that's not appealing and also had some reservations at the time. I guess alternatively one could have a macro for integer indexing and default to indexing by axis value. That could work for how I use AxisArrays with some adjustment period. I don't know though, I guess ultimately I feel like an AxisArray should be respected as an array and not a dict. Just to throw another random thought out there, perhaps another type could be defined that shares a lot of AxisArrays machinery but for which indexing is by axis value.

@mbauman
Copy link
Sponsor Member

mbauman commented Jan 18, 2018

I think this discussion would be best done as an issue on AxisArrays — issue JuliaArrays/AxisArrays.jl#84 is probably a great place for it to live.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s] kind:deprecation This change introduces or involves a deprecation needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.