
fix common data model chunk propagation, source type propagation, and other things #592

Merged: 8 commits merged on Mar 10, 2024


@rafaqz (Owner) commented Jan 19, 2024

This PR fixes a bunch of minor issues with the CommonDataModel.jl conversion.

  • Chunking is not propagated out through CommonDataModel.jl, so we hack around it by fetching the chunk pattern from the inner arrays.
  • The source type was not propagated through the shared FileArray object, so online files without extensions were falling back to GDAL. This went untested because of reliability problems in those tests.
  • Some functions in the commondatamodel.jl file that still had the _ncd prefix were renamed.

I'll need to write a few more tests to lock this down.
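A minimal sketch of the source-type bug from the second bullet, assuming a simplified model rather than the real Rasters.jl FileArray internals. `Source`, `GDALsource`, `NCDsource`, `SharedFile`, `guess_source`, `rebuild_buggy` and `rebuild_fixed` are all hypothetical stand-ins, and the URL is a placeholder. The point it illustrates: if a shared object re-derives its backend from the filename instead of carrying the source type along, a file without an extension silently falls back to GDAL.

```julia
# Hypothetical source tags standing in for Rasters.jl's real source types.
abstract type Source end
struct GDALsource <: Source end
struct NCDsource <: Source end

# Guess a backend from the file extension; with no recognised extension, fall back to GDAL.
guess_source(filename::AbstractString) = endswith(filename, ".nc") ? NCDsource() : GDALsource()

# Hypothetical stand-in for a shared file object that remembers which backend to use.
struct SharedFile{S<:Source}
    filename::String
    source::S
end

# Convenience constructor: infer the source from the filename.
SharedFile(filename::AbstractString) = SharedFile(String(filename), guess_source(filename))

# Buggy: rebuilding from the filename alone re-guesses and loses an explicitly chosen source.
rebuild_buggy(f::SharedFile) = SharedFile(f.filename)

# Fixed: propagate the source type explicitly.
rebuild_fixed(f::SharedFile) = SharedFile(f.filename, f.source)

# An online file with no extension, explicitly opened as NetCDF.
f = SharedFile("https://example.com/data", NCDsource())
rebuild_buggy(f).source  # GDALsource(): the explicit choice was lost
rebuild_fixed(f).source  # NCDsource(): the explicit choice survives
```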

@@ -14,3 +12,7 @@ function RA._open(f, ::Type{GRIBsource}, filename::AbstractString; write=false,
```julia
    ds = GRIBDatasets.GRIBDataset(filename)
    RA._open(f, GRIBsource, ds; kw...)
end
```
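The two-stage open in this hunk (a filename method that opens the dataset, then forwards it to a dataset method with the same source type) can be sketched in isolation. This is a simplified model, not the real extension code: `open_source`, `MockDataset` and the local `GRIBsource` struct are hypothetical stand-ins for `RA._open`, `GRIBDatasets.GRIBDataset` and Rasters' source type, and the `write` keyword is omitted.

```julia
# Hypothetical stand-ins; the real code uses GRIBDatasets.GRIBDataset and Rasters' RA._open.
struct GRIBsource end

struct MockDataset
    filename::String
end

# Filename method: open the dataset once, then forward it to the dataset method.
# The source type is passed along explicitly, so nothing downstream has to guess it.
function open_source(f, ::Type{GRIBsource}, filename::AbstractString; kw...)
    ds = MockDataset(filename)  # stands in for GRIBDatasets.GRIBDataset(filename)
    return open_source(f, GRIBsource, ds; kw...)
end

# Dataset method: apply the caller's function to the already-open dataset.
open_source(f, ::Type{GRIBsource}, ds::MockDataset; kw...) = f(ds)

# do-block usage: the block becomes the function argument `f`.
name = open_source(GRIBsource, "forecast.grib") do ds
    ds.filename
end
# name == "forecast.grib"
```

Dispatching on `::Type{GRIBsource}` in both methods means the backend choice is fixed by the caller rather than inferred from the filename, which is the property the source-type fix above relies on.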

Owner Author


@tcarion @Alexander-Barth this is the hack that is currently required to make DiskArrays chunking propagate out from the internal Variable or DataValues objects.

Note that with your current implementation, anyone using CFVariable cannot do chunked reads of large datasets: the whole array is read at once.
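A toy model of why this matters, assuming hypothetical types (`InnerVar`, `OpaqueWrapper`, `ForwardingWrapper`, `chunk_size`, `nreads`, `elements_per_read`) in place of the real CommonDataModel.jl and DiskArrays.jl types. In this sketch a wrapper that cannot see its inner variable's chunking treats the whole array as a single chunk, mirroring the "whole array is read at once" behaviour described above; the exact fallback in DiskArrays.jl may differ.

```julia
# Hypothetical stand-ins; not the real CommonDataModel.jl or DiskArrays.jl types.
struct InnerVar
    dims::Tuple{Int,Int}
    chunksize::Tuple{Int,Int}  # the real on-disk chunk pattern
end

# A wrapper that cannot see the inner chunking.
struct OpaqueWrapper
    var::InnerVar
end

# A wrapper that forwards chunk queries to the inner variable.
struct ForwardingWrapper
    var::InnerVar
end

# With no chunk information, this toy treats the whole array as one chunk.
chunk_size(w::OpaqueWrapper) = w.var.dims
chunk_size(w::ForwardingWrapper) = w.var.chunksize

# Number of reads needed to stream the whole array, and elements loaded per read.
nreads(w) = prod(cld.(w.var.dims, chunk_size(w)))
elements_per_read(w) = prod(chunk_size(w))

inner = InnerVar((10_000, 10_000), (1_000, 1_000))
nreads(OpaqueWrapper(inner))      # 1 read of 100_000_000 elements
nreads(ForwardingWrapper(inner))  # 100 reads of 1_000_000 elements each
```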

```julia
DiskArrays.haschunks(var::CFVariable) = _get_haschunks(var.var.var)

function _get_eachchunk end
function _get_haschunks end
```
Owner Author


@tcarion @Alexander-Barth this is where the hack is called to dig into the inner array that knows the actual chunk pattern. With the current design of NCDatasets.jl, GRIBDatasets.jl and CommonDataModel.jl, anyone wanting chunking to work with CFVariable will need a hack like this.

(Note: this is Rasters.CFVariable, so this is not type piracy.)
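The pattern in the snippet (empty `function _get_haschunks end` declarations whose methods are supplied for the inner backend type, called by a wrapper that reaches through `var.var.var`) can be sketched as follows. `BackendVar`, `Inner`, `Middle`, `Outer`, `wrapper_haschunks` and `wrapper_eachchunk` are hypothetical, and a plain `Bool` and tuple stand in for DiskArrays.jl's chunk types; this is an illustration of the technique, not the Rasters.jl code.

```julia
# Core package: declare generic functions with no methods; backends add them later.
function _get_haschunks end
function _get_eachchunk end

# Hypothetical backend variable type that knows its on-disk chunking.
struct BackendVar
    chunks::Tuple{Int,Int}
end

# What a backend extension would add for its own inner type.
_get_haschunks(::BackendVar) = true
_get_eachchunk(v::BackendVar) = v.chunks

# Three wrapper layers, mimicking the var.var.var access in the hack.
struct Inner{V}
    var::V
end
struct Middle{V}
    var::V
end
struct Outer{V}
    var::V
end

# The outer wrapper digs through every layer to reach the array that knows the pattern.
wrapper_haschunks(x::Outer) = _get_haschunks(x.var.var.var)
wrapper_eachchunk(x::Outer) = _get_eachchunk(x.var.var.var)

x = Outer(Middle(Inner(BackendVar((256, 256)))))
wrapper_eachchunk(x)  # (256, 256)
```

Declaring the function with no methods lets the core package own the name while each backend adds methods for its own types. The cost, and why it reads as a hack, is that the wrapper hard-codes how deeply the chunked array is nested; a type without a method raises a `MethodError`.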


codecov bot commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 92.80576%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 81.97%. Comparing base (164973d) to head (9dd6d0e).

| Files | Patch % | Lines |
|---|---|---|
| src/sources/commondatamodel.jl | 92.72% | 4 Missing ⚠️ |
| ext/RastersHDF5Ext/smap_source.jl | 0.00% | 3 Missing ⚠️ |
| src/lookup.jl | 91.66% | 1 Missing ⚠️ |
| src/methods/reproject.jl | 0.00% | 1 Missing ⚠️ |
| src/show.jl | 96.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #592      +/-   ##
==========================================
- Coverage   82.24%   81.97%   -0.27%     
==========================================
  Files          61       61              
  Lines        4257     4245      -12     
==========================================
- Hits         3501     3480      -21     
- Misses        756      765       +9     
```


@rafaqz merged commit dacc30e into main on Mar 10, 2024
10 of 11 checks passed
@rafaqz deleted the cdm_fixes branch March 10, 2024 20:36
@rafaqz restored the cdm_fixes branch March 17, 2024 13:33
@rafaqz deleted the cdm_fixes branch July 12, 2024 11:31