Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem detecting a source file type #356

Closed
yvikhlya opened this issue Dec 28, 2022 · 6 comments
Closed

Problem detecting a source file type #356

yvikhlya opened this issue Dec 28, 2022 · 6 comments

Comments

@yvikhlya
Copy link
Contributor

I am working with a general circulation model which uses netCDF files with different weird file name extensions as inputs and outputs (e.g. .nc3 or .nc4 to distinguish between netCDF3/4). These files can't be loaded into Raster/RasterStack, since source type is derived by a _sourcetype(...) function in src/convenience.jl from file name extension, which is .nc for netCDF files, and it defaults to GDAL type. When I use a source=Rasters.NCDfile keyword, this does not work either, because this keyword does not propagate down into sub-calls of RasterStack(), and _open(...) in src/convenience.jl again tries to derive source type from _sourcetype(filename) and calls GDAL loader instead of netCDF loader. Here is a complete traceback:

julia> st=RasterStack("/home/yvikhlia/aogcm/yv_s2s4mom5_1/geosgcm_ocn2d/yv_s2s4mom5_1.geosgcm_ocn2d.monthly.198202.nc4", source=Rasters.NCDfile)
ERROR: MethodError: reducing over an empty collection is not allowed; consider supplying `init` to the reducer
Stacktrace:
  [1] mapreduce_empty(#unused#::typeof(identity), op::Function, T::Type)
    @ Base ./reduce.jl:367
  [2] reduce_empty(op::Base.MappingRF{typeof(identity), typeof(promote_type)}, #unused#::Type{DataType})
    @ Base ./reduce.jl:356
  [3] reduce_empty_iter
    @ ./reduce.jl:379 [inlined]
  [4] mapreduce_empty_iter(f::Function, op::Function, itr::Vector{DataType}, ItrEltype::Base.HasEltype)
    @ Base ./reduce.jl:375
  [5] _mapreduce(f::typeof(identity), op::typeof(promote_type), #unused#::IndexLinear, A::Vector{DataType})
    @ Base ./reduce.jl:427
  [6] _mapreduce_dim
    @ ./reducedim.jl:365 [inlined]
  [7] #mapreduce#765
    @ ./reducedim.jl:357 [inlined]
  [8] mapreduce
    @ ./reducedim.jl:357 [inlined]
  [9] #reduce#767
    @ ./reducedim.jl:406 [inlined]
 [10] reduce
    @ ./reducedim.jl:406 [inlined]
 [11] pixeltype(ds::ArchGDAL.Dataset)
    @ ArchGDAL /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/dataset.jl:997
 [12] ArchGDAL.RasterDataset(ds::ArchGDAL.Dataset)
    @ ArchGDAL /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/raster/array.jl:32
 [13] #unsafe_readraster#113
    @ /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/raster/array.jl:126 [inlined]
 [14] unsafe_readraster
    @ /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/raster/array.jl:126 [inlined]
 [15] readraster(f::ComposedFunction{typeof(Rasters.cleanreturn), Rasters.var"#53#54"{Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:lazy,), Tuple{Bool}}}, String}}, args::String; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ ArchGDAL /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/context.jl:265
 [16] readraster(f::Function, args::String)
    @ ArchGDAL /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/ArchGDAL/AZONk/src/context.jl:264
 [17] _open(f::Function, ::Type{Rasters.GDALfile}, filename::String; write::Bool, kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/sources/gdal.jl:133
 [18] _open(f::Function, ::Type{Rasters.GDALfile}, filename::String)
    @ Rasters /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/sources/gdal.jl:119
 [19] _open(f::Function, filename::String; kw::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Rasters /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/convenience.jl:20
 [20] _open
    @ /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/convenience.jl:19 [inlined]
 [21] #Raster#52
    @ /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/array.jl:266 [inlined]
 [22] RasterStack(filename::String; dims::Nothing, refdims::Tuple{}, metadata::Nothing, crs::Nothing, mappedcrs::Nothing, layerdims::Nothing, layermetadata::Nothing, missingval::Nothing, source::Type, name::Nothing, keys::Nothing, layersfrom::Nothing, resize::Nothing, lazy::Bool, ext::Nothing)
    @ Rasters /discover/nobackup/projects/gmao/ssd/aogcm/yvikhlia/julia/packages/Rasters/gV0im/src/stack.jl:277
 [23] top-level scope
    @ REPL[11]:1

If I change a file name extension of source file to .nc, it is loaded without problems. A quick solution to this issue would be to add more extensions to EXT and REV_EXT structures in src/convenience.jl, but this would be incomplete solution. A better solution would be to modify Raster(...), RasterStack(...), _open(...) and other functions which rely on source type such that it is derived from user input if the input is not empty or file name extension in the other case.

@rafaqz
Copy link
Owner

rafaqz commented Dec 28, 2022

Yeah, we should be able to choose the backend manually. We could also look at detecting file types from the file header, but that's a big job.

If you want to fix the keyword propagation a PR would be great.

@yvikhlya
Copy link
Contributor Author

If you want to fix the keyword propagation a PR would be great.

I have almost no experience with julia, besides reading a bunch of tutorials and following examples, but I can give it a try. This should not be too difficult, I guess. My immediate suggestion would be to change the keyword argument source=_sourcetype(filename) to default to source=nothing and define sourcetype= isnothing(source) ? _sourcetype(filename) : source in every function which calls _sourcetype(filename). Does this sound reasonable?

@rafaqz
Copy link
Owner

rafaqz commented Dec 28, 2022

I think the problem is smaller than that - what you are describing should already propagate source, that assignment source=_sourcetype(filename) only happens if source is not passed in.

So somewhere in the that stack trace between line 23 and 18 the chain of source keyword arguments is being broken, you just have to find it! It's only three or four possible places.

Then make a test in test/ncdatasets.jl where you rename a file to a weird extension and then set the source to NCDfile manually, so it can't break again.

I will help tidy up in review, just see if you can get your files loading.

@yvikhlya
Copy link
Contributor Author

yvikhlya commented Dec 28, 2022

I see some inconsistencies here. For example, in src/stack.jl line 261:

st = if haslayers(_sourcetype(filename))
....
else
....
end

So, even if to provide source=NCDfile as argument to RasterStack(...), else part is executed, instead of if part. On the other hand RasterStack(...) function at line 181 is implemented the way I suggested above. I'll look at _sourcetype usage in different cases more carefully and play with it before submitting a PR.

@rafaqz
Copy link
Owner

rafaqz commented Dec 28, 2022

Right, yes that looks like a bug. Fix the propagation of source there and it should work.

@rafaqz
Copy link
Owner

rafaqz commented Apr 16, 2023

you can now use source=:netcdf

@rafaqz rafaqz closed this as completed Apr 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants