Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cumsum, accumulate should not have default dim=1 #19451

Closed
StefanKarpinski opened this issue Nov 29, 2016 · 4 comments
Closed

cumsum, accumulate should not have default dim=1 #19451

StefanKarpinski opened this issue Nov 29, 2016 · 4 comments
Assignees
Labels
domain:arrays [a, r, r, a, y, s]
Milestone

Comments

@StefanKarpinski
Copy link
Sponsor Member

This is inconsistent with how non-cumulative functions work in Julia, e.g.:

julia> A = rand(3,4)
3×4 Array{Float64,2}:
 0.239183  0.420538  0.760069  0.349576
 0.768802  0.924563  0.934626  0.446843
 0.767848  0.761547  0.518525  0.758667

julia> sum(A)
7.650786585136748

julia> cumsum(A)
3×4 Array{Float64,2}:
 0.239183  0.420538  0.760069  0.349576
 1.00798   1.3451    1.6947    0.796419
 1.77583   2.10665   2.21322   1.55509

Correspondingly, cumsum(A) should, rather than asymmetrically defaulting to the first dimension, either be an error, or return an accumulation of the elements of A such that cumsum(A)[end] == sum(A) and such that cumsum(v')' == cumsum(v) for any vector v. One possibility is performing a cumulative sum in column-major order. Another is that each output value is the sum of values to the values to the left or above that slot in the intput. I.e.:

julia> [sum(A[1:i,1:j]) for i=1:size(A,1), j=1:size(A,2)]
3×4 Array{Float64,2}:
 0.239183  0.65972  1.41979  1.76937
 1.00798   2.35308  4.04778  4.8442
 1.77583   3.88248  6.0957   7.65079

This answer has the above properties, can be computed efficiently, and is useful.

@andreasnoack
Copy link
Member

andreasnoack commented Nov 29, 2016

I agree that cumsum shouldn't default to dim=1 but I think it is better to throw an error if no dimension is specified and ndims>1. I don't like that storage order becomes significant for anything but performance and floating point errors. Short term, there would also be a deprecation issue.

@StefanKarpinski
Copy link
Sponsor Member Author

I agree that it's not desirable for storage order to become semantically visible. That's why I proposed the [sum(A[1:i,1:j]) for i=1:size(A,1), j=1:size(A,2)] definition as well. Wanting cumsum(v')' == cumsum(v) may tie into #4774 depending on how that pans out.

@StefanKarpinski
Copy link
Sponsor Member Author

See also #23542, #20041.

@StefanKarpinski
Copy link
Sponsor Member Author

The minimal plan of action here is:

  1. Make not supplying a dimension an error unless all dimensions but the first are singleton.
  2. Make accumulate on a row vector work as accumulate(f, r.').'.

In the future, this could be generalized to the n-dimensional prefix summation behavior I proposed above, which is generally useful and ties the desired behavior for vectors and row vectors together.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s]
Projects
None yet
Development

No branches or pull requests

3 participants