Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetCDF3 files (64-bit offset NetCDF,...) and unlimited dimensions (issue #231) #232

Merged
merged 5 commits into from
Oct 26, 2023

Conversation

Alexander-Barth
Copy link
Owner

@Alexander-Barth Alexander-Barth commented Oct 10, 2023

Try to address issue #231.

src/variable.jl Outdated
@@ -401,7 +401,7 @@ end
getchunksize(v::Variable) = getchunksize(haschunks(v),v)
getchunksize(::DiskArrays.Chunked, v::Variable) = chunking(v)[2]
# getchunksize(::DiskArrays.Unchunked, v::Variable) = DiskArrays.estimate_chunksize(v)
getchunksize(::DiskArrays.Unchunked, v::Variable) = size(v)
getchunksize(::DiskArrays.Unchunked, v::Variable) = max.(1,size(v))
Copy link
Contributor

@rafaqz rafaqz Oct 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to comment the reason for this, its a bit puzzling otherwise

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we should add a comment, but while we are at this line. @tcarion , is there any particular reason why the DiskArrays.estimate_chunksize is commented-out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's because I had this error:

NCDatasets: Error During Test at /home/tcarion/.julia/dev/NCDatasets/test/runtests.jl:10
  Got exception outside of a @test
  LoadError: MethodError: no method matching DiskArrays.RegularChunks(::Tuple{UnitRange{Int64}, UnitRange{Int64}}, ::Int64, ::Int64)
  
  Closest candidates are:
    DiskArrays.RegularChunks(::Integer, ::Integer, ::Integer)
     @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:25
  
  Stacktrace:
    [1] (::DiskArrays.var"#24#27")(s::Int64, cs::Tuple{UnitRange{Int64}, UnitRange{Int64}}, of::Int64)
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:133
    [2] map
      @ ./tuple.jl:318 [inlined]
    [3] DiskArrays.GridChunks(a::Tuple{Int64, Int64}, chunksize::Tuple{Tuple{UnitRange{Int64}, UnitRange{Int64}}}; offset::Tuple{Int64, Int64})
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:132
    [4] DiskArrays.GridChunks(a::NCDatasets.Variable{Float64, 2, NCDataset{Nothing}}, chunksize::Tuple{Tuple{UnitRange{Int64}, UnitRange{Int64}}}; offset::Tuple{Int64, Int64})
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130
    [5] DiskArrays.GridChunks(a::NCDatasets.Variable{Float64, 2, NCDataset{Nothing}}, chunksize::Tuple{Tuple{UnitRange{Int64}, UnitRange{Int64}}})
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/chunks.jl:130
    [6] eachchunk(v::NCDatasets.Variable{Float64, 2, NCDataset{Nothing}})
      @ NCDatasets ~/.julia/dev/NCDatasets/src/variable.jl:408
    [7] _readblock!(::NCDatasets.Variable{Float64, 2, NCDataset{Nothing}}, ::Matrix{Float64}, ::UnitRange{Int64}, ::Vararg{AbstractVector})
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:168
    [8] readblock!
      @ ~/.julia/packages/DiskArrays/QlfRF/src/batchgetindex.jl:209 [inlined]
    [9] getindex_disk(::NCDatasets.Variable{Float64, 2, NCDataset{Nothing}}, ::UnitRange{Int64}, ::Vararg{Any})
      @ DiskArrays ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:40
   [10] getindex
      @ ~/.julia/packages/DiskArrays/QlfRF/src/diskarray.jl:211 [inlined]
   [11] getindex(::CommonDataModel.CFVariable{Float64, 2, NCDatasets.Variable{Float64, 2, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Nothing, Tuple{}, Vararg{Nothing, 5}}}}, ::UnitRange{Int64}, ::StepRange{Int64, Int64})
      @ CommonDataModel ~/.julia/packages/CommonDataModel/pO4st/src/cfvariable.jl:390
   [12] top-level scope
      @ ~/.julia/dev/NCDatasets/test/test_simple.jl:28
[...]
  in expression starting at /home/tcarion/.julia/dev/NCDatasets/test/test_simple.jl:28

I found an easy fix by using size instead, but I didn't think it through and kind of forgot it, that's my bad... I must admit I don't fully know how the chunking should work. But I think the issue comes from the fact getchunksize(::DiskArrays.Unchunked, v::Variable) should return a Tuple, while DiskArrays.estimate_chunksize(v) returns a GridChunks.

Copy link
Contributor

@rafaqz rafaqz Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~RegularChunks is for a single axis, and it accepts three Integers. ~

GridChunks wraps a Tuple of RegularChunks and IrregularChunks, one for each axis - selected depending on the underlying chunking of the disk array.

Ah sorry misunderstood the confusion here. Normally estimate_chunksize is called from eachchunk. You could refactor getchunks to return GridChunks instead of the size.

You may want to call estimate_chunksize when there are no unlimited dimensions, and otherwise put a 1 in the unlimited dimension and use the size for all the other dimensions.

@@ -401,7 +401,7 @@ end
getchunksize(v::Variable) = getchunksize(haschunks(v),v)
getchunksize(::DiskArrays.Chunked, v::Variable) = chunking(v)[2]
# getchunksize(::DiskArrays.Unchunked, v::Variable) = DiskArrays.estimate_chunksize(v)
getchunksize(::DiskArrays.Unchunked, v::Variable) = size(v)
getchunksize(::DiskArrays.Unchunked, v::Variable) = max.(1,size(v))
eachchunk(v::CFVariable) = eachchunk(v.var)
haschunks(v::CFVariable) = haschunks(v.var)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify this declaration on CFVariable doesn't really do much, as it doesn't define the necessary broadcast or iteration methods to actually call this in real use.

@Alexander-Barth
Copy link
Owner Author

I changed the PR quite a bit. It tests now explicitly for NetCDF3 files to determine the chunksize (Unidata/netcdf-c#2224) also we use DiskArrays.estimate_chunksize for unchunked variables. Does this look fine to you?

@Alexander-Barth Alexander-Barth changed the title 64-bit offset NetCDF and unlimited dimensions (issue #231) NetCDF3 files (64-bit offset NetCDF,...) and unlimited dimensions (issue #231) Oct 17, 2023
@Alexander-Barth Alexander-Barth merged commit a0ca027 into master Oct 26, 2023
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants