-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Where selection for Netcdf raster slower than for in Memory raster with same dimensions #594
Comments
This might be a Rasters issue in the end. The above timings are from a Raster wrapping a YAXArray and this might contribute to the long timings. But when I do the selection on a Raster compared to a YAXArray the selection is a factor four roughly slower. And the YAXArrays selection is also a factor two slower than the selection on an in memory based Raster. urls = Dict(y=> "https://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_$(y)01-$(y)12.nc" for y in 1995:2018)
filenames = Dict(y=> joinpath("data", split(url, "/")[end]) for (y,url) in urls)
mkpath("data/")
#pathdict = Dict(y => download(url, filenames[y]) for (y, url) in urls)
# After download this should be this:
pathdict = [glob("tas_EUR*$y*.nc", "/home/fcremer/Documents/NFDI4Earth/deliverables/data")[1] for y in 1995:2018]
rlist = Raster.(pathdict, lazy=true, key="tas");
r = cat(rlist..., dims=Ti)
julia> dslist = open_dataset.(pathdict)
taslist = getproperty.(dslist, :tas)
c = cat(taslist..., dims=Ti)
julia> @btime c[Ti=Where(x->year(x)==1995)];
66.370 ms (26354 allocations: 1.98 MiB)
julia> @btime r[Ti=Where(x->year(x)==1995)];
232.956 ms (7357 allocations: 232.82 MiB) |
This looks like the problem:
Which is probably because youre using |
Is this still true? the files dont seem to be there anymore to test it. |
The data was moved. I try to test it this week. |
The location of the files have been moved.
I will try it again in the next days. |
I tested it again and these are the new timings: julia> @btime c[Ti=Where(x->year(x)==1995)]; # YAXArrays with NetCDF backend
71.301 ms (26373 allocations: 1.98 MiB)
julia> @btime r[Ti=Where(x->year(x)==1995)]; # Rasters with NCDatasets backend
252.776 ms (6246 allocations: 232.31 MiB) This might also be some difference between the backends. This is with these package versions: (Rasters) pkg> st
Project Rasters v0.11.1
Status `~/.julia/dev/Rasters/Project.toml`
[79e6a3ab] Adapt v4.0.4
[3da002f7] ColorTypes v0.11.5
[1fbeeb36] CommonDataModel v0.3.6
[187b0558] ConstructionBase v1.5.5
⌃ [0703355e] DimensionalData v0.27.2
⌅ [3c3547ce] DiskArrays v0.3.23
[411431e0] Extents v0.1.2
[1a297f60] FillArrays v1.11.0
[4c728ea3] Flatten v0.4.3
[68eda718] GeoFormatTypes v0.4.2
[cf35fbd7] GeoInterface v1.3.4
[e1d29d7a] Missings v1.2.0
[30363a11] NetCDF v0.11.8
[6fe1bfb0] OffsetArrays v1.14.0
[92933f4c] ProgressMeter v1.10.0
[3cdcf5f2] RecipesBase v1.3.4
[189a3867] Reexport v1.2.2
[efcf1570] Setfield v1.1.1
[c21b50f5] YAXArrays v0.5.6
[ade2ca70] Dates
[a63ad114] Mmap |
Ok this not just the time to load the data?
|
Ah yeah right. I forgot that. So this is a useless comparison. I will test the view but most likely this can be closed |
It is kind of annoying that the lazy syntax is harder to write in julia generally. |
When I am doing a Where Selector it seems to be quite slow compared to other selection methods.
I expect that the Where selector is slower because it has to check the function for every element of the axis.
But in my case the selection takes 2 seconds while the selection function takes 30 nanoseconds and when I multiply it with the number of elements I get 10 microseconds.
This selection is much faster when I do it with random data that I generate compared to the selection on a netcdf based Raster.
When I construct a Raster with the same dimensions as
ras
but a random array the timing is similar to the fully random based Raster.The text was updated successfully, but these errors were encountered: