Simpler syntax for creating uninitialized arrays #34775

Open
jkrumbiegel opened this issue Feb 16, 2020 · 18 comments
Labels
arrays, feature (new feature / enhancement requests)

Comments

@jkrumbiegel
Contributor

I find the syntax for creating uninitialized arrays a bit verbose, while there are nice and short options for almost all other common cases of creating arrays:

# compare to
Float64[1, 2, 3, 4, 5]
zeros(Float64, 10)
ones(Float64, 10)
fill(1.0, 10)

But for uninitialized arrays you always have to use curly bracket syntax if I'm correct:

v = Vector{Float64}(undef, 10)

How about one of these two alternatives? Neither syntax seems to be taken yet:

v = Float64[undef, 10]
arr = Int32[undef, 3, 4, 5]

# or

v = undef(Float64, 10)
arr = undef(Int32, 3, 4, 5)

The second one is actually easy to get via:

(::UndefInitializer)(T::Type, dims::Vararg{Int}) = Array{T}(undef, dims...)
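
A quick way to try the second alternative in a REPL session might look like the sketch below. Adding a call method to `UndefInitializer` from outside Base is type piracy, so this is meant for experimentation only and does not reflect how Base behaves today.

```julia
# Experiment only: this adds a method to a type owned by Base (type piracy).
(::UndefInitializer)(::Type{T}, dims::Vararg{Int}) where {T} = Array{T}(undef, dims...)

v = undef(Float64, 10)         # same as Vector{Float64}(undef, 10)
arr = undef(Int32, 3, 4, 5)    # same as Array{Int32}(undef, 3, 4, 5)

# The contents are arbitrary, so only the type and shape can be checked.
@assert v isa Vector{Float64} && length(v) == 10
@assert arr isa Array{Int32,3} && size(arr) == (3, 4, 5)
```
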
@jakobnissen
Contributor

See this issue and this Discourse discussion.

These posts are both very long and mostly discuss something other than your concrete proposal. However, there is at least one relevant point, namely that undef(Int, 10) returns an Array, when there are so many other AbstractArrays that could be usable.

Not sure I agree, though. The same could be said for zeros, ones and fill. Array still is by far the most used AbstractArray.

So I think your proposed undef(T, dims...) syntax would be nice. It's short and explicit, and would probably be used quite often.

@martinholters
Member

would probably be used quite often.

...which might be a reason not to do it...

@yuyichao
Contributor

...which might be a reason not to do it...

It may be important to make the "uninitialized" part explicit, but I don't think it's necessary to make the syntax harder to use.

@JeffBezanson
Sponsor Member

This is indeed somewhat intentional, to discourage uninitialized arrays. But we also wanted to move towards more general and regular syntax instead of all the special cases like zeros(...). The syntax undef(T, dims) is ok, but I question whether having more ways to write it is actually easier to use.

@JeffBezanson added the arrays label Feb 17, 2020
@timholy
Sponsor Member

timholy commented Feb 29, 2020

If we want to increase uniformity, one thought for 2.0: deprecate zeros, ones in favor of fill(v, axes), and consider allowing fill(Undef{T}, axes) for an uninitialized array with eltype T. (fill(T, axes) won't work because what if you want to create an array of types?)
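
To make the idea concrete, here is a minimal sketch. Both `Undef{T}` and `fill_sketch` are hypothetical names introduced only for illustration (neither exists in Base), and the sketch takes integer dimensions rather than axes to keep it short.

```julia
# Hypothetical marker type; `Undef` does not exist in Base.
struct Undef{T} end

# `fill_sketch` stands in for `fill` so that no Base method is redefined.
fill_sketch(::Type{Undef{T}}, dims::Integer...) where {T} = Array{T}(undef, dims...)
fill_sketch(v, dims::Integer...) = fill(v, dims...)

a = fill_sketch(Undef{Float64}, 2, 3)   # uninitialized Matrix{Float64}
b = fill_sketch(Float64, 2)             # an array whose elements are the type Float64
c = fill_sketch(0.0, 2, 3)              # ordinary fill

@assert a isa Matrix{Float64} && size(a) == (2, 3)
@assert b == [Float64, Float64]
@assert c == zeros(2, 3)
```

The second call shows why a bare `fill(T, axes)` cannot mean "uninitialized array of `T`": it already means "array filled with the type `T` itself".
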

@KristofferC
Sponsor Member

deprecate zeros, ones

Wasn't this already discussed, cf #24444?

@timholy
Sponsor Member

timholy commented Feb 29, 2020

I guess I'm consistent!

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Feb 29, 2020

I’ve actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this:

Array{T}(undef, m, n)
Array{T}(zeros, m, n)
Array{T}(ones, m, n)

Why? It makes it easier to swap out any of the properties of what’s being done: it cleanly separates the container type, the element type, what to initialize it with and the dimensions.
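
A rough sketch of that separation, without touching the `Array` constructors, could use a helper function. The name `construct` is hypothetical and chosen only for this illustration; the proposal itself is about the constructors.

```julia
# `construct` is a hypothetical helper, not a Base function.
# Arguments: container type (with eltype), initializer, dimensions.
construct(::Type{A}, ::UndefInitializer, dims::Integer...) where {A<:AbstractArray} =
    A(undef, dims...)
construct(::Type{A}, ::typeof(zeros), dims::Integer...) where {A<:AbstractArray} =
    fill!(A(undef, dims...), zero(eltype(A)))
construct(::Type{A}, ::typeof(ones), dims::Integer...) where {A<:AbstractArray} =
    fill!(A(undef, dims...), one(eltype(A)))

m, n = 2, 3
u = construct(Array{Float64}, undef, m, n)   # contents arbitrary
z = construct(Array{Float64}, zeros, m, n)
o = construct(Array{Int}, ones, m, n)

@assert size(u) == (m, n)
@assert z == zeros(Float64, m, n)
@assert o == ones(Int, m, n)
```

Swapping the initializer, the element type, or the container type each changes exactly one argument, which is the property being asked for.
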

@tpapp
Contributor

tpapp commented Feb 29, 2020

Note also that while ones, fill, etc. make sense for most <: AbstractArray types, undef is the odd one out in the sense that it is only practical for mutable arrays.

@KristofferC
Sponsor Member

undef is the odd one out in the sense that it is only practical for mutable arrays.

Not really, because in reality undef means uninitialized (which is what it was originally called). I made a PR to rename it to undef (shame on me), but in hindsight, uninit would probably have been better.

@StefanKarpinski
Sponsor Member

I think the point is that making an uninitialized immutable array isn't very useful.

@KristofferC
Sponsor Member

KristofferC commented Feb 29, 2020

Oh, yeah, I misread that.

@Sacha0
Member

Sacha0 commented Feb 29, 2020

Comet topic! :)

I’ve actually often wanted a way to go in the other direction: factor out the initializer concept so that I can do things uniformly like this: [...]

For interested newcomers to this discussion, #24595 (comment) discusses this direction at length as 'the second proposal':

The more general extension of this model is MyArray[{...}](contentspec[, modifierspec...]). Roughly, contentspec defines the result's contents, while modifierspec... (if given) provides qualifications, e.g. shape.

@StefanKarpinski
Sponsor Member

One thing we could do is:

  • make Array{T}(zeros, dims...) etc. work
  • make undef(T, dims...) and undef(dims...) work

That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.

@johnnychen94
Sponsor Member

johnnychen94 commented Mar 1, 2020

make Array{T}(zeros, dims...) etc. work

What I see from this syntax is that whatever initializer is put here should be as fast as undef. Since zeros is way sloweeer than undef, I think it's perhaps time to get some updates on #130

julia> @btime zeros(Float64, 1000, 1000);
  443.589 μs (2 allocations: 7.63 MiB)

julia> @btime Array{Float64}(undef, 1000, 1000);
  37.140 μs (2 allocations: 7.63 MiB)

@tpapp
Contributor

tpapp commented Mar 1, 2020

I am trying to think about the implications of these proposals for generic code. It is not clear to me if

  1. these methods (zeros, ones, fill) were meant to be convenience constructors for Array{T,N}, or more generic (and if yes, how generic? should there be a unified API for various collections of homogeneous items? does that even make sense?)
  2. if the motivation for zeros and ones is syntactic convenience (shorter than fill(one(T), dims...)), or something more abstract (as zero and one are, for additive and multiplicative identities), or speed (we can do zeros faster for some types?)

As for (1) a lot of packages define Base.zeros etc for their own types, which are not even necessarily <:AbstractArray. Should they do the same for the proposed undef(...) (if applicable)?

Regarding (2), it would be nice for custom types to be able to rely on a default like

function zeros(S::Type{SomeCustomType{T}}, shape...) where T
    fill(S, zero(eltype(S)), shape...)
end

and define only this fill method, adding a zeros method only when that confers an extra advantage. Then we could unify syntax with the fallback

# makes undef(SomeCustomType{T}, shape...) callable
function (::UndefInitializer)(::Type{SomeCustomType{T}}, shape...) where T
    SomeCustomType{T}(undef, shape...)
end
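
As a self-contained illustration of this division of labor, the sketch below uses a toy one-dimensional container. The names `filled`, `zeroed`, and `uninit` are hypothetical stand-ins for `fill`, `zeros`, and the proposed `undef(...)`, chosen so that nothing in Base is redefined.

```julia
# Toy container; every name in this sketch is hypothetical.
struct SomeCustomType{T}
    data::Vector{T}
end

# What a package author would supply: an undef constructor, eltype, and a fill.
SomeCustomType{T}(::UndefInitializer, n::Integer) where {T} =
    SomeCustomType{T}(Vector{T}(undef, n))
Base.eltype(::Type{SomeCustomType{T}}) where {T} = T
filled(::Type{SomeCustomType{T}}, value, n::Integer) where {T} =
    SomeCustomType{T}(fill(convert(T, value), n))

# Generic fallbacks that work for any type following the same convention.
uninit(::Type{S}, shape::Integer...) where {S} = S(undef, shape...)
zeroed(::Type{S}, shape::Integer...) where {S} = filled(S, zero(eltype(S)), shape...)

z = zeroed(SomeCustomType{Float64}, 4)
u = uninit(SomeCustomType{Int}, 3)      # contents arbitrary

@assert z.data == zeros(Float64, 4)
@assert u.data isa Vector{Int} && length(u.data) == 3
```

Here the custom type never defines `zeroed` or `uninit` itself; both come from the fallbacks.
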

@StefanKarpinski
Sponsor Member

What I see from this syntax is that whatever initializer is put here should be as fast as undef.

I don't understand why that should be the case. Yes, we want initializers to be as fast as we can make them, but some require more work than others. Why would we require that they all be as fast as doing nothing?

@KristofferC
Sponsor Member

Since zeros is way sloweeer than undef

It is quite tricky to measure this since the OS can sometimes give out uninitialized memory "for free" and only commit to the actual allocation when the memory is used.
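
One way to make the comparison less sensitive to lazy allocation is to time allocation together with a pass that touches every element. The sketch below is only an illustration using Base's `@elapsed`; the function names are made up for this example, and the timings it prints will vary with the OS, the allocator, and the Julia version.

```julia
# Allocate without initializing, then write every element.
function undef_then_fill(n)
    a = Array{Float64}(undef, n, n)
    fill!(a, 0.0)
    return a
end

# Allocate with zeros, then read every element so the memory is really touched.
function zeros_then_read(n)
    a = zeros(Float64, n, n)
    s = sum(a)
    return a, s
end

n = 1000
undef_then_fill(n); zeros_then_read(n)   # warm-up runs to exclude compilation time

t_undef = @elapsed undef_then_fill(n)
t_zeros = @elapsed zeros_then_read(n)
println("undef + fill!: ", t_undef, " s")
println("zeros + read:  ", t_zeros, " s")
```
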

@brenhinkeller added the feature label Nov 20, 2022