Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add copyto! and similar for Datasets #937

Merged
merged 9 commits into from
Jun 7, 2022
Merged

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented May 13, 2022

Closes #580, #409

Here we implement read!, copyto!, and similar to allow data to be read into existing array buffers rather than allocating new buffers.

src/views.jl Outdated
Comment on lines 11 to 12
parent::Dataset
indices
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to store the dataspace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is worthwhile eventually, but I think we should defer this to a larger overhaul of Dataset.

In particular, we should consider parameterizing Dataset and making it an AbstractArray or AbstractDiskArray for version 0.17.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #930

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards removing the view feature from this pull request.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea, but I would suggest making it actually a functional part: i.e. calling read!(dataset, array, I...) would internally call read!(view(dataset, I...), array)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I just think the view component is much more complicated than the other parts.

src/HDF5.jl Show resolved Hide resolved
src/HDF5.jl Outdated
Read [part of] a dataset or attribute into a preallocated output buffer.
The output buffer must be convertible to a pointer and have a contiguous layout.
"""
function Base.read!(obj::DatasetOrAttribute, buf::AbstractArray{T}, I...) where T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why do we need this method? Seems like copyto! alone would be sufficient as per the AbstractArray interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We currently implement Base.read, so why not Base.read!?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should define this method, because we are moving more and more having the HDF5.Dataset type satisfying the AbstractArray interface. Users would expect copyto! as that is the standard name for this function, thus additionally adding Base.read! seems unnecessary.

I understand that we are defining methods on Base.read throughout the HDF5 module, but I don't see that we should extend that and add Base.read!

The definition in IO/Networking, also suggests that this wouldn't be the best overload, since it's used by default for reading from a filename or stream.
https://docs.julialang.org/en/v1/base/io-network/#Base.read!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base.read! has been removed.

src/HDF5.jl Outdated

Return a `Array{T}` or `Matrix{UInt8}` to that can contain [part of] the dataset.
"""
function Base.similar(obj::DatasetOrAttribute, I::Integer...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, why do we need to pass the indices here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base.similar accepts an optional size value. These are not indices. I suppose we should rename the argument.
https://docs.julialang.org/en/v1/base/arrays/#Base.similar

Copy link
Member

@musm musm Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more specifically looking at the AbstractArray interface https://docs.julialang.org/en/v1/manual/interfaces/#man-interface-array, where there are 4 optional similar methods that we could define:

Optional methods Default definition Brief description
similar(A) similar(A, eltype(A), size(A)) Return a mutable array with the same shape and element type
similar(A, ::Type{S}) similar(A, S, size(A)) Return a mutable array with the same shape and the specified element type
similar(A, dims::Dims) similar(A, eltype(A), dims) Return a mutable array with the same element type and size dims
similar(A, ::Type{S}, dims::Dims) Array{S}(undef, dims) Return a mutable array with the specified element type and size

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we do implement AbstractArray in 0.17, then we do not need this signature. We could just implement the version with Dims. However, since we are not implementing AbstractArray in 0.16, we probably do need to retain a method with the above signature since we do not have a similar(a::AbstractArray{T}, dims::Union{Integer, AbstractUnitRange}...) to cover that case for us.

I will rename this to Base.similar(obj::DatasetOrAttribute, dims::Dims). We probably sill need Base.similar, dims::Integer...) as well, until we actually do implement AbstractArray.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I think read! makes more sense as long as we're using read.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonbyrne did you intend to comment on the other thread? Is there a specific part of #937 (comment) you disagree with?
The goal is not to remove copyto! and replace it with read!. Originally, @mkitti had read! as an alias for copyto!, which we require in order to satisfy part of the AbstractArray interface. My comment on read! was that this was both redundant with copyto! and unexpected since read! is used in base Julia for reading from an IO stream/filename, which is very different than what copyto! accomplishes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes, sorry...

@mkitti mkitti changed the title Add read!, copyto! and similar for Datasets Add copyto! and similar for Datasets Jun 6, 2022
src/HDF5.jl Outdated
function Base.similar(
obj::DatasetOrAttribute,
::Type{T},
dims::Integer...;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not have dims::Dims here and below?

src/HDF5.jl Outdated
buf::Union{AbstractMatrix{UInt8}, AbstractArray{T}, Nothing}, I...) where T

sz, scalar, dspace = _size_of_buffer(obj, I)
memtype = _memtype(filetype, T)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this below the buffer size checks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rearranged this and added more try blocks so we can make sure to close out the handles.

@musm
Copy link
Member

musm commented Jun 6, 2022

Overall LGTM sans minor comments. Latest changes are much improved.

@mkitti
Copy link
Member Author

mkitti commented Jun 7, 2022

I'm not entirely sure about this normalize keyword for similar. For many circumstances it may not matter.

@mkitti
Copy link
Member Author

mkitti commented Jun 7, 2022

My plan after this is to revisit views of Dataset in #946

@musm
Copy link
Member

musm commented Jun 7, 2022

Awesome, thanks! The improvements and clean up to the generic read function are defintely welcome.

@musm musm merged commit 976ec2a into master Jun 7, 2022
@musm musm deleted the janelia/non_allocating_reads branch June 7, 2022 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permit reading values into a preexisting array buffer
3 participants