-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reading cropped array with Rasters is 4x slower than with GeoArrays #618
Comments
Is this on And is that the first or second run? Rasters.jl will be slower first run as the object is more complicated to compile. That many allocations really looks like compilation, there is usually only a hundred or so on the second run.
Another thing may be that Rasters.jl is reading chunked from disk through a lazy You will need to share that file for me to benchmark it - if you can just add a |
This is using Rasters v0.10.1 This was benched after several runs, so it's not a compilation issue. I suspect it's related to how Rasters is reading chunked from disk The file I'm using is a |
Ok with a huge lazy file you should use dem = open(raster) do A
dem = Raster(pathlocal.cop30_v2; lazy=true);
read(Rasters.crop(dem, to=ext));
end I need to document it better that But this can actually be fixed inside |
Ok I will fix this in |
foo = Raster(open(fn); lazy=true);
ERROR: MethodError: no method matching Raster(::IOStream; lazy::Bool)
Closest candidates are:
Raster(::Any, ::Tuple; name, kw...)
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:231
Raster(::Any, ::AbstractString; ...)
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:257
Raster(::Any, ::AbstractString, ::Any; crs, mappedcrs, dims, refdims, name, metadata, missingval, source, write, lazy, dropband)
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:257
...
Stacktrace:
[1] top-level scope
@ REPL[68]:1 |
No you call (I'm not recommending you things that don't work at all here ;) But I did mess up the example above. This should work: dem = open(Raster(pathlocal.cop30_v2; lazy=true)) do A
read(Rasters.crop(dem, to=ext));
end But I'm afraid it might not help with I will really need some kind of MWE to work this out... |
Got it, I will work on a MWE |
It seems I'm unable to replicate without a large Testing on a large geotiff, Raster.jl outperforms GeoArrays.jl: using Downloads
using Extents
using BenchmarkTools
using GeoArrays
using Rasters
url = "https://its-live-data.s3.amazonaws.com/velocity_mosaic/landsat/v00.0/static/cog/ANT_G0120_0000_v.tif"
fn = Downloads.download(url, "ANT_G0120_0000_v.tif")
ext = Extent(X = (0, 1000000), Y = (0, 1000000));
nt = (min_x = 0, min_y = 0, max_x = 10000000, max_y = 10000000);
@time ga = GeoArrays.read(fn; masked=false);
0.004583 seconds (685 allocations: 73.367 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
7.615868 seconds (27 allocations: 1.647 GiB, 0.52% gc time)
@time dem = Raster(fn; lazy=true);
0.002417 seconds (167 allocations: 10.484 KiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
1.551789 seconds (19.70 k allocations: 530.634 MiB) |
comparing that with a large vrt: fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
ext = Extent(X = (166.0, 168.0), Y = (-80.0, -78.0));
nt = (min_x = 166.0, min_y = -80.0, max_x = 168.0, max_y = -78.0);
@time ga = GeoArrays.read(fn; masked=false);
1.315752 seconds (83 allocations: 6.484 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
1.198129 seconds (24 allocations: 80.075 MiB)
@time dem = Raster(fn; lazy=true);
1.398906 seconds (53.04 k allocations: 9.666 MiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
4.080412 seconds (24.72 k allocations: 161.149 MiB) |
Can you get the chunk patterns of both the tif and vrt for comparison ? You can also post some code that creates a large |
Code to build a large .vrt: # build COP30 vrts
using ArchGDAL
# info page
# https://spacedata.copernicus.eu/web/guest/collections/copernicus-digital-elevation-model?p_l_back_url=%2Fweb%2Fguest%2Fsearch%3Fq%3DDEM
# 2021 release available on aws:
# https://registry.opendata.aws/copernicus-dem/
# download all dems using AWS CLI:
# aws s3 cp s3:https://copernicus-dem-30m/ /mnt/devon-r2/shared_data/copernicus-dem-30m/ --recursive --no-sign-request
path2folder = "/mnt/devon-r2/shared_data/copernicus-dem-30m/"
# 2019 release was downloaded with ancillary files: /mnt/devon-r2/shared_data/COP-DEM_GLO-30-DGED_PUBLIC/
# path2folder = "/mnt/devon-r2/shared_data/COP-DEM_GLO-30-DGED_PUBLIC/";
# Coordinate Reference System: Horizontal WGS84-G1150 (EPSG 4326) (DGED & DTED format), Vertical EGM2008 (EPSG 3855)
suffix = ["WBM.tif", "DEM.tif", "DEM_int16.tif", "EDM.tif", "FLM.tif", "HEM.tif"];
# EDM.tif = Editing Mask: 8 Bit unsigned integer, GeoTIFF
# FLM.tif = Filling Mask: 8 Bit unsigned integer, GeoTIFF
# HEM.tif = Height Error Mask: 32 Bit floating point, GeoTIFF
# WBM.tif = Water Body Mask: 8 Bit unsigned integer, GeoTIFF
# this takes a long time as it maps 440972 files
items = [item for item in walkdir(path2folder)]
files = [[joinpath(root, file) for file in files] for (root, dirs, files) in items]
files = reduce(vcat,files)
# build single .vrt for all .tif files with suffix
for suff in suffix
out_vrt = joinpath(path2folder, replace(suff, ".tif" => ".vrt"))
ind = endswith.(files,suff)
in_tifs = files[ind]
in_tifs = ArchGDAL.read.(in_tifs)
ArchGDAL.gdalbuildvrt(in_tifs; dest=out_vrt) do vrt
ArchGDAL.write(vrt, out_vrt)
end
end |
Chunk pattern of fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
dem = Raster(fn; lazy=true);
DiskArrays.eachchunk(dem)
4099×4894 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
(1:128, 1:128) (1:128, 129:256) (1:128, 257:384) (1:128, 385:512) (1:128, 513:640) … (1:128, 625921:626048) (1:128, 626049:626176) (1:128, 626177:626304) (1:128, 626305:626400)
(129:256, 1:128) (129:256, 129:256) (129:256, 257:384) (129:256, 385:512) (129:256, 513:640) (129:256, 625921:626048) (129:256, 626049:626176) (129:256, 626177:626304) (129:256, 626305:626400)
(257:384, 1:128) (257:384, 129:256) (257:384, 257:384) (257:384, 385:512) (257:384, 513:640) (257:384, 625921:626048) (257:384, 626049:626176) (257:384, 626177:626304) (257:384, 626305:626400)
(385:512, 1:128) (385:512, 129:256) (385:512, 257:384) (385:512, 385:512) (385:512, 513:640) (385:512, 625921:626048) (385:512, 626049:626176) (385:512, 626177:626304) (385:512, 626305:626400)
(513:640, 1:128) (513:640, 129:256) (513:640, 257:384) (513:640, 385:512) (513:640, 513:640) (513:640, 625921:626048) (513:640, 626049:626176) (513:640, 626177:626304) (513:640, 626305:626400)
(641:768, 1:128) (641:768, 129:256) (641:768, 257:384) (641:768, 385:512) (641:768, 513:640) … (641:768, 625921:626048) (641:768, 626049:626176) (641:768, 626177:626304) (641:768, 626305:626400)
(769:896, 1:128) (769:896, 129:256) (769:896, 257:384) (769:896, 385:512) (769:896, 513:640) (769:896, 625921:626048) (769:896, 626049:626176) (769:896, 626177:626304) (769:896, 626305:626400)
(897:1024, 1:128) (897:1024, 129:256) (897:1024, 257:384) (897:1024, 385:512) (897:1024, 513:640) (897:1024, 625921:626048) (897:1024, 626049:626176) (897:1024, 626177:626304) (897:1024, 626305:626400)
(1025:1152, 1:128) (1025:1152, 129:256) (1025:1152, 257:384) (1025:1152, 385:512) (1025:1152, 513:640) (1025:1152, 625921:626048) (1025:1152, 626049:626176) (1025:1152, 626177:626304) (1025:1152, 626305:626400)
(1153:1280, 1:128) (1153:1280, 129:256) (1153:1280, 257:384) (1153:1280, 385:512) (1153:1280, 513:640) (1153:1280, 625921:626048) (1153:1280, 626049:626176) (1153:1280, 626177:626304) (1153:1280, 626305:626400)
(1281:1408, 1:128) (1281:1408, 129:256) (1281:1408, 257:384) (1281:1408, 385:512) (1281:1408, 513:640) … (1281:1408, 625921:626048) (1281:1408, 626049:626176) (1281:1408, 626177:626304) (1281:1408, 626305:626400)
(1409:1536, 1:128) (1409:1536, 129:256) (1409:1536, 257:384) (1409:1536, 385:512) (1409:1536, 513:640) (1409:1536, 625921:626048) (1409:1536, 626049:626176) (1409:1536, 626177:626304) (1409:1536, 626305:626400)
(1537:1664, 1:128) (1537:1664, 129:256) (1537:1664, 257:384) (1537:1664, 385:512) (1537:1664, 513:640) (1537:1664, 625921:626048) (1537:1664, 626049:626176) (1537:1664, 626177:626304) (1537:1664, 626305:626400)
(1665:1792, 1:128) (1665:1792, 129:256) (1665:1792, 257:384) (1665:1792, 385:512) (1665:1792, 513:640) (1665:1792, 625921:626048) (1665:1792, 626049:626176) (1665:1792, 626177:626304) (1665:1792, 626305:626400)
(1793:1920, 1:128) (1793:1920, 129:256) (1793:1920, 257:384) (1793:1920, 385:512) (1793:1920, 513:640) (1793:1920, 625921:626048) (1793:1920, 626049:626176) (1793:1920, 626177:626304) (1793:1920, 626305:626400)
(1921:2048, 1:128) (1921:2048, 129:256) (1921:2048, 257:384) (1921:2048, 385:512) (1921:2048, 513:640) … (1921:2048, 625921:626048) (1921:2048, 626049:626176) (1921:2048, 626177:626304) (1921:2048, 626305:626400)
(2049:2176, 1:128) (2049:2176, 129:256) (2049:2176, 257:384) (2049:2176, 385:512) (2049:2176, 513:640) (2049:2176, 625921:626048) (2049:2176, 626049:626176) (2049:2176, 626177:626304) (2049:2176, 626305:626400)
⋮ ⋱ ⋮
(522625:522752, 1:128) (522625:522752, 129:256) (522625:522752, 257:384) (522625:522752, 385:512) (522625:522752, 513:640) (522625:522752, 625921:626048) (522625:522752, 626049:626176) (522625:522752, 626177:626304) (522625:522752, 626305:626400)
(522753:522880, 1:128) (522753:522880, 129:256) (522753:522880, 257:384) (522753:522880, 385:512) (522753:522880, 513:640) (522753:522880, 625921:626048) (522753:522880, 626049:626176) (522753:522880, 626177:626304) (522753:522880, 626305:626400)
(522881:523008, 1:128) (522881:523008, 129:256) (522881:523008, 257:384) (522881:523008, 385:512) (522881:523008, 513:640) … (522881:523008, 625921:626048) (522881:523008, 626049:626176) (522881:523008, 626177:626304) (522881:523008, 626305:626400)
(523009:523136, 1:128) (523009:523136, 129:256) (523009:523136, 257:384) (523009:523136, 385:512) (523009:523136, 513:640) (523009:523136, 625921:626048) (523009:523136, 626049:626176) (523009:523136, 626177:626304) (523009:523136, 626305:626400)
(523137:523264, 1:128) (523137:523264, 129:256) (523137:523264, 257:384) (523137:523264, 385:512) (523137:523264, 513:640) (523137:523264, 625921:626048) (523137:523264, 626049:626176) (523137:523264, 626177:626304) (523137:523264, 626305:626400)
(523265:523392, 1:128) (523265:523392, 129:256) (523265:523392, 257:384) (523265:523392, 385:512) (523265:523392, 513:640) (523265:523392, 625921:626048) (523265:523392, 626049:626176) (523265:523392, 626177:626304) (523265:523392, 626305:626400)
(523393:523520, 1:128) (523393:523520, 129:256) (523393:523520, 257:384) (523393:523520, 385:512) (523393:523520, 513:640) (523393:523520, 625921:626048) (523393:523520, 626049:626176) (523393:523520, 626177:626304) (523393:523520, 626305:626400)
(523521:523648, 1:128) (523521:523648, 129:256) (523521:523648, 257:384) (523521:523648, 385:512) (523521:523648, 513:640) … (523521:523648, 625921:626048) (523521:523648, 626049:626176) (523521:523648, 626177:626304) (523521:523648, 626305:626400)
(523649:523776, 1:128) (523649:523776, 129:256) (523649:523776, 257:384) (523649:523776, 385:512) (523649:523776, 513:640) (523649:523776, 625921:626048) (523649:523776, 626049:626176) (523649:523776, 626177:626304) (523649:523776, 626305:626400)
(523777:523904, 1:128) (523777:523904, 129:256) (523777:523904, 257:384) (523777:523904, 385:512) (523777:523904, 513:640) (523777:523904, 625921:626048) (523777:523904, 626049:626176) (523777:523904, 626177:626304) (523777:523904, 626305:626400)
(523905:524032, 1:128) (523905:524032, 129:256) (523905:524032, 257:384) (523905:524032, 385:512) (523905:524032, 513:640) (523905:524032, 625921:626048) (523905:524032, 626049:626176) (523905:524032, 626177:626304) (523905:524032, 626305:626400)
(524033:524160, 1:128) (524033:524160, 129:256) (524033:524160, 257:384) (524033:524160, 385:512) (524033:524160, 513:640) (524033:524160, 625921:626048) (524033:524160, 626049:626176) (524033:524160, 626177:626304) (524033:524160, 626305:626400)
(524161:524288, 1:128) (524161:524288, 129:256) (524161:524288, 257:384) (524161:524288, 385:512) (524161:524288, 513:640) … (524161:524288, 625921:626048) (524161:524288, 626049:626176) (524161:524288, 626177:626304) (524161:524288, 626305:626400)
(524289:524416, 1:128) (524289:524416, 129:256) (524289:524416, 257:384) (524289:524416, 385:512) (524289:524416, 513:640) (524289:524416, 625921:626048) (524289:524416, 626049:626176) (524289:524416, 626177:626304) (524289:524416, 626305:626400)
(524417:524544, 1:128) (524417:524544, 129:256) (524417:524544, 257:384) (524417:524544, 385:512) (524417:524544, 513:640) (524417:524544, 625921:626048) (524417:524544, 626049:626176) (524417:524544, 626177:626304) (524417:524544, 626305:626400)
(524545:524554, 1:128) (524545:524554, 129:256) (524545:524554, 257:384) (524545:524554, 385:512) (524545:524554, 513:640) (524545:524554, 625921:626048) (524545:524554, 626049:626176) (524545:524554, 626177:626304) (524545:524554, 626305:626400) |
Chunk pattern of fn = "ANT_G0120_0000_v.tif"
dem = Raster(fn; lazy=true);
DiskArrays.eachchunk(dem)
179×144 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
(1:256, 1:256) (1:256, 257:512) (1:256, 513:768) (1:256, 769:1024) (1:256, 1025:1280) (1:256, 1281:1536) … (1:256, 35841:36096) (1:256, 36097:36352) (1:256, 36353:36608) (1:256, 36609:36784)
(257:512, 1:256) (257:512, 257:512) (257:512, 513:768) (257:512, 769:1024) (257:512, 1025:1280) (257:512, 1281:1536) (257:512, 35841:36096) (257:512, 36097:36352) (257:512, 36353:36608) (257:512, 36609:36784)
(513:768, 1:256) (513:768, 257:512) (513:768, 513:768) (513:768, 769:1024) (513:768, 1025:1280) (513:768, 1281:1536) (513:768, 35841:36096) (513:768, 36097:36352) (513:768, 36353:36608) (513:768, 36609:36784)
(769:1024, 1:256) (769:1024, 257:512) (769:1024, 513:768) (769:1024, 769:1024) (769:1024, 1025:1280) (769:1024, 1281:1536) (769:1024, 35841:36096) (769:1024, 36097:36352) (769:1024, 36353:36608) (769:1024, 36609:36784)
(1025:1280, 1:256) (1025:1280, 257:512) (1025:1280, 513:768) (1025:1280, 769:1024) (1025:1280, 1025:1280) (1025:1280, 1281:1536) (1025:1280, 35841:36096) (1025:1280, 36097:36352) (1025:1280, 36353:36608) (1025:1280, 36609:36784)
(1281:1536, 1:256) (1281:1536, 257:512) (1281:1536, 513:768) (1281:1536, 769:1024) (1281:1536, 1025:1280) (1281:1536, 1281:1536) … (1281:1536, 35841:36096) (1281:1536, 36097:36352) (1281:1536, 36353:36608) (1281:1536, 36609:36784)
(1537:1792, 1:256) (1537:1792, 257:512) (1537:1792, 513:768) (1537:1792, 769:1024) (1537:1792, 1025:1280) (1537:1792, 1281:1536) (1537:1792, 35841:36096) (1537:1792, 36097:36352) (1537:1792, 36353:36608) (1537:1792, 36609:36784)
(1793:2048, 1:256) (1793:2048, 257:512) (1793:2048, 513:768) (1793:2048, 769:1024) (1793:2048, 1025:1280) (1793:2048, 1281:1536) (1793:2048, 35841:36096) (1793:2048, 36097:36352) (1793:2048, 36353:36608) (1793:2048, 36609:36784)
(2049:2304, 1:256) (2049:2304, 257:512) (2049:2304, 513:768) (2049:2304, 769:1024) (2049:2304, 1025:1280) (2049:2304, 1281:1536) (2049:2304, 35841:36096) (2049:2304, 36097:36352) (2049:2304, 36353:36608) (2049:2304, 36609:36784)
(2305:2560, 1:256) (2305:2560, 257:512) (2305:2560, 513:768) (2305:2560, 769:1024) (2305:2560, 1025:1280) (2305:2560, 1281:1536) (2305:2560, 35841:36096) (2305:2560, 36097:36352) (2305:2560, 36353:36608) (2305:2560, 36609:36784)
(2561:2816, 1:256) (2561:2816, 257:512) (2561:2816, 513:768) (2561:2816, 769:1024) (2561:2816, 1025:1280) (2561:2816, 1281:1536) … (2561:2816, 35841:36096) (2561:2816, 36097:36352) (2561:2816, 36353:36608) (2561:2816, 36609:36784)
(2817:3072, 1:256) (2817:3072, 257:512) (2817:3072, 513:768) (2817:3072, 769:1024) (2817:3072, 1025:1280) (2817:3072, 1281:1536) (2817:3072, 35841:36096) (2817:3072, 36097:36352) (2817:3072, 36353:36608) (2817:3072, 36609:36784)
(3073:3328, 1:256) (3073:3328, 257:512) (3073:3328, 513:768) (3073:3328, 769:1024) (3073:3328, 1025:1280) (3073:3328, 1281:1536) (3073:3328, 35841:36096) (3073:3328, 36097:36352) (3073:3328, 36353:36608) (3073:3328, 36609:36784)
(3329:3584, 1:256) (3329:3584, 257:512) (3329:3584, 513:768) (3329:3584, 769:1024) (3329:3584, 1025:1280) (3329:3584, 1281:1536) (3329:3584, 35841:36096) (3329:3584, 36097:36352) (3329:3584, 36353:36608) (3329:3584, 36609:36784)
(3585:3840, 1:256) (3585:3840, 257:512) (3585:3840, 513:768) (3585:3840, 769:1024) (3585:3840, 1025:1280) (3585:3840, 1281:1536) (3585:3840, 35841:36096) (3585:3840, 36097:36352) (3585:3840, 36353:36608) (3585:3840, 36609:36784)
(3841:4096, 1:256) (3841:4096, 257:512) (3841:4096, 513:768) (3841:4096, 769:1024) (3841:4096, 1025:1280) (3841:4096, 1281:1536) … (3841:4096, 35841:36096) (3841:4096, 36097:36352) (3841:4096, 36353:36608) (3841:4096, 36609:36784)
(4097:4352, 1:256) (4097:4352, 257:512) (4097:4352, 513:768) (4097:4352, 769:1024) (4097:4352, 1025:1280) (4097:4352, 1281:1536) (4097:4352, 35841:36096) (4097:4352, 36097:36352) (4097:4352, 36353:36608) (4097:4352, 36609:36784)
⋮ ⋮ ⋱ ⋮
(41729:41984, 1:256) (41729:41984, 257:512) (41729:41984, 513:768) (41729:41984, 769:1024) (41729:41984, 1025:1280) (41729:41984, 1281:1536) (41729:41984, 35841:36096) (41729:41984, 36097:36352) (41729:41984, 36353:36608) (41729:41984, 36609:36784)
(41985:42240, 1:256) (41985:42240, 257:512) (41985:42240, 513:768) (41985:42240, 769:1024) (41985:42240, 1025:1280) (41985:42240, 1281:1536) (41985:42240, 35841:36096) (41985:42240, 36097:36352) (41985:42240, 36353:36608) (41985:42240, 36609:36784)
(42241:42496, 1:256) (42241:42496, 257:512) (42241:42496, 513:768) (42241:42496, 769:1024) (42241:42496, 1025:1280) (42241:42496, 1281:1536) … (42241:42496, 35841:36096) (42241:42496, 36097:36352) (42241:42496, 36353:36608) (42241:42496, 36609:36784)
(42497:42752, 1:256) (42497:42752, 257:512) (42497:42752, 513:768) (42497:42752, 769:1024) (42497:42752, 1025:1280) (42497:42752, 1281:1536) (42497:42752, 35841:36096) (42497:42752, 36097:36352) (42497:42752, 36353:36608) (42497:42752, 36609:36784)
(42753:43008, 1:256) (42753:43008, 257:512) (42753:43008, 513:768) (42753:43008, 769:1024) (42753:43008, 1025:1280) (42753:43008, 1281:1536) (42753:43008, 35841:36096) (42753:43008, 36097:36352) (42753:43008, 36353:36608) (42753:43008, 36609:36784)
(43009:43264, 1:256) (43009:43264, 257:512) (43009:43264, 513:768) (43009:43264, 769:1024) (43009:43264, 1025:1280) (43009:43264, 1281:1536) (43009:43264, 35841:36096) (43009:43264, 36097:36352) (43009:43264, 36353:36608) (43009:43264, 36609:36784)
(43265:43520, 1:256) (43265:43520, 257:512) (43265:43520, 513:768) (43265:43520, 769:1024) (43265:43520, 1025:1280) (43265:43520, 1281:1536) (43265:43520, 35841:36096) (43265:43520, 36097:36352) (43265:43520, 36353:36608) (43265:43520, 36609:36784)
(43521:43776, 1:256) (43521:43776, 257:512) (43521:43776, 513:768) (43521:43776, 769:1024) (43521:43776, 1025:1280) (43521:43776, 1281:1536) … (43521:43776, 35841:36096) (43521:43776, 36097:36352) (43521:43776, 36353:36608) (43521:43776, 36609:36784)
(43777:44032, 1:256) (43777:44032, 257:512) (43777:44032, 513:768) (43777:44032, 769:1024) (43777:44032, 1025:1280) (43777:44032, 1281:1536) (43777:44032, 35841:36096) (43777:44032, 36097:36352) (43777:44032, 36353:36608) (43777:44032, 36609:36784)
(44033:44288, 1:256) (44033:44288, 257:512) (44033:44288, 513:768) (44033:44288, 769:1024) (44033:44288, 1025:1280) (44033:44288, 1281:1536) (44033:44288, 35841:36096) (44033:44288, 36097:36352) (44033:44288, 36353:36608) (44033:44288, 36609:36784)
(44289:44544, 1:256) (44289:44544, 257:512) (44289:44544, 513:768) (44289:44544, 769:1024) (44289:44544, 1025:1280) (44289:44544, 1281:1536) (44289:44544, 35841:36096) (44289:44544, 36097:36352) (44289:44544, 36353:36608) (44289:44544, 36609:36784)
(44545:44800, 1:256) (44545:44800, 257:512) (44545:44800, 513:768) (44545:44800, 769:1024) (44545:44800, 1025:1280) (44545:44800, 1281:1536) (44545:44800, 35841:36096) (44545:44800, 36097:36352) (44545:44800, 36353:36608) (44545:44800, 36609:36784)
(44801:45056, 1:256) (44801:45056, 257:512) (44801:45056, 513:768) (44801:45056, 769:1024) (44801:45056, 1025:1280) (44801:45056, 1281:1536) … (44801:45056, 35841:36096) (44801:45056, 36097:36352) (44801:45056, 36353:36608) (44801:45056, 36609:36784)
(45057:45312, 1:256) (45057:45312, 257:512) (45057:45312, 513:768) (45057:45312, 769:1024) (45057:45312, 1025:1280) (45057:45312, 1281:1536) (45057:45312, 35841:36096) (45057:45312, 36097:36352) (45057:45312, 36353:36608) (45057:45312, 36609:36784)
(45313:45568, 1:256) (45313:45568, 257:512) (45313:45568, 513:768) (45313:45568, 769:1024) (45313:45568, 1025:1280) (45313:45568, 1281:1536) (45313:45568, 35841:36096) (45313:45568, 36097:36352) (45313:45568, 36353:36608) (45313:45568, 36609:36784)
(45569:45792, 1:256) (45569:45792, 257:512) (45569:45792, 513:768) (45569:45792, 769:1024) (45569:45792, 1025:1280) (45569:45792, 1281:1536) (45569:45792, 35841:36096) (45569:45792, 36097:36352) (45569:45792, 36353:36608) (45569:45792, 36609:36784) |
The VRT has small chunks and like 100x more of them all up. 128x128 is probably a design mistake... What'ss the eltype? But I imagine this is mostly just DiskArrays.jl driving whether Rasters is 8x faster or 4x slower than GeoArrays. It might be good to manually force these small chunks to be larger to reduce the number of reads. We could let users scale the chunk aggregation to tune performance. There is heaps left to do with this. I want to standardise chunk keywords across all the backends now we have DiskArrays.jll everywhere. You can actually try that using rechunked = modify(lazyraster) do A
DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (256, 256)))
end or even larger chunks like 512*512 |
eltype = Float32 |
Thanks. Also your code is still not a MWE its using files I don't have - the idea is I just copy past and run it then spend my time fixing the bug ;) |
Unfortunately RechunkedDiskArray fails on read, I would assume that rechunking would require re-writing the data: fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
lazyraster = Raster(fn; lazy=true);
rechunked = modify(lazyraster) do A
DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end
524554×626400 Raster{Float32,2} with dimensions:
X Projected{Float64} LinRange{Float64}(-180.001, 179.999, 524554) ForwardOrdered Regular Intervals{Start} crs: WellKnownText,
Y Projected{Float64} LinRange{Float64}(83.9999, -89.9999, 626400) ReverseOrdered Regular Intervals{Start} crs: WellKnownText
and reference dimensions:
Band Categorical{Int64} 1:1 ForwardOrdered
extent: Extent(X = (-180.0013888888889, 180.00009843514712), Y = (-89.99986111113607, 84.00013888888888))
crs: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
lazycrop = Rasters.crop(rechunked, to=ext)
2913×7199 Raster{Float32,2} with dimensions:
X Projected{Float64} LinRange{Float64}(166.0, 167.999, 2913) ForwardOrdered Regular Intervals{Start} crs: WellKnownText,
Y Projected{Float64} LinRange{Float64}(-78.0004, -79.9999, 7199) ReverseOrdered Regular Intervals{Start} crs: WellKnownText
and reference dimensions:
Band Categorical{Int64} 1:1 ForwardOrdered
extent: Extent(X = (166.00026173592644, 167.9994540642185), Y = (-79.99986111113463, -78.00013888891213))
crs: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]
read(lazycrop)
ERROR: GDALError (CE_Failure, code 10):
Pointer 'hDS' is NULL in 'GDALDatasetRasterIOEx'.
Stacktrace:
[1] maybe_throw
@ ~/.julia/packages/GDAL/hNyuz/src/error.jl:42 [inlined]
[2] aftercare
@ ~/.julia/packages/GDAL/hNyuz/src/error.jl:59 [inlined]
[3] gdaldatasetrasterioex(hDS::ArchGDAL.Dataset, eRWFlag::ArchGDAL.GDALRWFlag, nDSXOff::Int64, nDSYOff::Int64, nDSXSize::Int64, nDSYSize::Int64, pBuffer::Ptr{…}, nBXSize::Int64, nBYSize::Int64, eBDataType::ArchGDAL.GDALDataType, nBandCount::Int64, panBandCount::Ptr{…}, nPixelSpace::Int64, nLineSpace::Int64, nBandSpace::Int64, psExtraArg::Ptr{…})
@ GDAL ~/.julia/packages/GDAL/hNyuz/src/libgdal.jl:7533
[4] rasterio!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:149 [inlined]
[5] rasterio!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:144 [inlined]
[6] read!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:380 [inlined]
[7] readblock!(dataset::ArchGDAL.RasterDataset{…}, buffer::Array{…}, x::UnitRange{…}, y::UnitRange{…}, z::UnitRange{…})
@ ArchGDAL ~/.julia/packages/ArchGDAL/lgE4A/src/raster/array.jl:212
[8] readblock!
@ ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20 [inlined]
[9] _readblock_rechunked
@ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:24 [inlined]
[10] readblock!
@ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:15 [inlined]
[11] readblock!(::DiskArrays.SubDiskArray{…}, ::Matrix{…}, ::UnitRange{…}, ::UnitRange{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20
[12] getindex_disk(::DiskArrays.SubDiskArray{…}, ::UnitRange{…}, ::Vararg{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:40
[13] getindex
@ ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:211 [inlined]
[14] subsetarg
@ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:147 [inlined]
[15] #62
@ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41 [inlined]
[16] map
@ ./tuple.jl:291 [inlined]
[17] (::DiskArrays.var"#61#63"{Matrix{…}, Base.Broadcast.Broadcasted{…}})(cnow::Tuple{UnitRange{…}, UnitRange{…}})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41
[18] foreach(f::DiskArrays.var"#61#63"{Matrix{…}, Base.Broadcast.Broadcasted{…}}, itr::DiskArrays.GridChunks{2, Tuple{…}})
@ Base ./abstractarray.jl:3097
[19] copyto!(dest::Matrix{…}, bc::Base.Broadcast.Broadcasted{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:38
[20] materialize!
@ ./broadcast.jl:914 [inlined]
[21] materialize!
@ ./broadcast.jl:911 [inlined]
[22] _Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:51
[23] Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:5
[24] (::Rasters.var"#25#26"{…})(O::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:77
[25] open(f::Rasters.var"#25#26"{…}, A::Raster{…}; kw::@Kwargs{})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:137
[26] open(f::Function, A::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:133
[27] modify(f::Type{…}, A::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:76
[28] read(x::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/read.jl:9
[29] top-level scope
@ REPL[14]:1
Some type information was truncated. Use `show(err)` to see complete types.
|
As for a MWE, I'm working with a huge .vrt. The code that I provided above will download the HUGE dataset if you have aws CLI installed: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html Once AWS CLI is installed, at the terminal you then do:
then you can replace |
The rechunkng doesn't require rewriting, I also wouldn't suggest something that needed that, I wrote Likely the file does no exist from this error, or something similar:
I also didn't see the aws code in comments. The minimum fuss is to do: run(`aws s3 cp s3:https://copernicus-dem-30m/ /path/to/your/folder --recursive --no-sign-request`) So you know the script just works without headache or finding things, I'm very low on dev time these days. |
MWE: # build COP30 vrts
using ArchGDAL
# download all dems using AWS CLI:
path2folder = "."
run(`aws s3 cp s3:https://copernicus-dem-30m/ $path2folder --recursive --no-sign-request`)
# Coordinate Reference System: Horizontal WGS84-G1150 (EPSG 4326) (DGED & DTED format), Vertical EGM2008 (EPSG 3855)
suffix = ["WBM.tif", "DEM.tif", "DEM_int16.tif", "EDM.tif", "FLM.tif", "HEM.tif"];
# EDM.tif = Editing Mask: 8 Bit unsigned integer, GeoTIFF
# FLM.tif = Filling Mask: 8 Bit unsigned integer, GeoTIFF
# HEM.tif = Height Error Mask: 32 Bit floating point, GeoTIFF
# WBM.tif = Water Body Mask: 8 Bit unsigned integer, GeoTIFF
# this takes a long time as it maps 440972 files
items = [item for item in walkdir(path2folder)]
files = [[joinpath(root, file) for file in files] for (root, dirs, files) in items]
files = reduce(vcat,files)
# build single .vrt for all .tif files with suffix
for suff in suffix
out_vrt = joinpath(path2folder, replace(suff, ".tif" => ".vrt"))
ind = endswith.(files,suff)
in_tifs = files[ind]
in_tifs = ArchGDAL.read.(in_tifs)
ArchGDAL.gdalbuildvrt(in_tifs; dest=out_vrt) do vrt
ArchGDAL.write(vrt, out_vrt)
end
end |
Did you try rechunking the arrays as suggested above? It will mean less read calls. If it works I'll consider doing this automatically when chunks are too small. rechunked = modify(lazyraster) do A
DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end The download was pretty huge for my home connection and I havent got around to doing this from work. |
trying to read rechunked dataset throws an error : using GeoArrays
using Rasters
using Extents
using DiskArrays
fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
ext = Extent(X = (166.0, 168.0), Y = (-80.0, -78.0));
nt = (min_x = 166.0, min_y = -80.0, max_x = 168.0, max_y = -78.0);
@time ga = GeoArrays.read(fn; masked=false);
# 1.251426 seconds (83 allocations: 6.484 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
# 1.156682 seconds (24 allocations: 80.075 MiB)
@time dem = Raster(fn; lazy=true);
# 1.346145 seconds (53.04 k allocations: 9.666 MiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
# 3.945627 seconds (24.89 k allocations: 161.155 MiB, 2.31% gc time, 0.22% compilation time)
rechunked = modify(dem) do A
DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end
@time dem0 = read(Rasters.crop(rechunked, to=ext));
ERROR: GDALError (CE_Failure, code 10):
Pointer 'hDS' is NULL in 'GDALDatasetRasterIOEx'.
Stacktrace:
[1] maybe_throw
@ ~/.julia/packages/GDAL/hNyuz/src/error.jl:42 [inlined]
[2] aftercare
@ ~/.julia/packages/GDAL/hNyuz/src/error.jl:59 [inlined]
[3] gdaldatasetrasterioex(hDS::ArchGDAL.Dataset, eRWFlag::ArchGDAL.GDALRWFlag, nDSXOff::Int64, nDSYOff::Int64, nDSXSize::Int64, nDSYSize::Int64, pBuffer::Ptr{…}, nBXSize::Int64, nBYSize::Int64, eBDataType::ArchGDAL.GDALDataType, nBandCount::Int64, panBandCount::Ptr{…}, nPixelSpace::Int64, nLineSpace::Int64, nBandSpace::Int64, psExtraArg::Ptr{…})
@ GDAL ~/.julia/packages/GDAL/hNyuz/src/libgdal.jl:7533
[4] rasterio!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:149 [inlined]
[5] rasterio!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:144 [inlined]
[6] read!
@ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:380 [inlined]
[7] readblock!(dataset::ArchGDAL.RasterDataset{…}, buffer::Array{…}, x::UnitRange{…}, y::UnitRange{…}, z::UnitRange{…})
@ ArchGDAL ~/.julia/packages/ArchGDAL/lgE4A/src/raster/array.jl:212
[8] readblock!
@ ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20 [inlined]
[9] _readblock_rechunked
@ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:24 [inlined]
[10] readblock!
@ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:15 [inlined]
[11] readblock!(::DiskArrays.SubDiskArray{…}, ::Matrix{…}, ::UnitRange{…}, ::UnitRange{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20
[12] getindex_disk(::DiskArrays.SubDiskArray{…}, ::UnitRange{…}, ::Vararg{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:40
[13] getindex
@ ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:211 [inlined]
[14] subsetarg
@ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:147 [inlined]
[15] #62
@ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41 [inlined]
[16] map
@ ./tuple.jl:291 [inlined]
[17] (::DiskArrays.var"#61#63"{Matrix{…}, Base.Broadcast.Broadcasted{…}})(cnow::Tuple{UnitRange{…}, UnitRange{…}})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41
[18] foreach(f::DiskArrays.var"#61#63"{Matrix{…}, Base.Broadcast.Broadcasted{…}}, itr::DiskArrays.GridChunks{2, Tuple{…}})
@ Base ./abstractarray.jl:3097
[19] copyto!(dest::Matrix{…}, bc::Base.Broadcast.Broadcasted{…})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:38
[20] materialize!
@ ./broadcast.jl:914 [inlined]
[21] materialize!
@ ./broadcast.jl:911 [inlined]
[22] _Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:51
[23] Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
@ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:5
[24] (::Rasters.var"#25#26"{…})(O::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:77
[25] open(f::Rasters.var"#25#26"{…}, A::Raster{…}; kw::@Kwargs{})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:137
[26] open(f::Function, A::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:133
[27] modify(f::Type{…}, A::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:76
[28] read(x::Raster{…})
@ Rasters ~/.julia/packages/Rasters/nC6xb/src/read.jl:9
[29] macro expansion
@ ./timing.jl:279 [inlined]
[30] top-level scope
@ ~/Documents/GitHub/Altim/src/junk.jl:24
Some type information was truncated. Use `show(err)` to see complete types. |
FYI you should be able to stop the S3 download at any point and built the |
You need to use @time dem0 = read(Rasters.crop(rechunked, to=ext)); ( |
🤦 please see edit |
Ok weird that doesn't work. Normally that error means the file doesnt exist... |
We don't need vrt files and stuff, not even a download. Here's a mwe that shows a significant speed difference: using BenchmarkTools
using ArchGDAL
using Rasters
using GeoArrays
fn = "deleteme.tif"
x = X(1:4096)
y = Y(1:4096)
bs = 256 # block size
rs = 512 # read size
write(fn, Raster(rand(UInt8, x,y)); options=Dict("TILED"=>"YES", "BLOCKXSIZE"=>bs, "BLOCKYSIZE"=>bs), force=true)
ext = Rasters.Extents.Extent(X=(1, rs), Y=(1, rs))
nt = (min_x=1, min_y=1, max_x=rs, max_y=rs)
r = Raster(fn; lazy=true);
ga = GeoArrays.read(fn; masked=false);
@btime Raster($fn; lazy=true); # 267us
@btime read(Rasters.crop($r; to=$ext)); # 497us
@btime read($r[X(Between(ext.X...)), Y(Between(ext.Y...))]); # 189us
@btime GeoArrays.read($fn; masked=false); # 241us
@btime GeoArrays.crop($ga, $nt); # 25us
@profview for _ in 1:1000 read(Rasters.crop(r; to=ext)) end
@profview for _ in 1:10000 GeoArrays.crop(ga, nt) end All timings on a SSD. GeoArrays is a quite thin wrapper. Oh and keep in mind, that benchmarking reads is often unstable, as the OS caches files in RAM etc. |
Thanks for the really nice MWE :) One point here is that it is sometimes useful that c = Rasters.crop(r; to=ext);
@btime Rasters.crop($r; to=$ext); # 3.2us And the next is that this seems to be a problem with @btime read(c); # 485us
@btime c[:, :]; # 127us Somehow its 3x more time than just indexing! Some dispatch somewhere needs to be fixed, either in Rasters or more likely in DiskArrays.jl for In this case switching chunks to 512 seems to be 10% faster again. But there must still be something else going on. Next, the fact that rasters returns an updated Raster rather than a raw Array as GeoArrays does mean there will always be an overhead, but it shouldn't be very large. It also looks to be slightly unstable, and fixing that could improve athings nother 10%. Oh, and one final key thing is that Rasters has to open and close the file !! GeoArrays holds an open gdal C object. Rasters doesn't do this unless you use I have thought about adding a |
Ok this is 100% a DiskArrays.jl bug, will fix over there. |
Reading a subset of a .vrt is consistently 4x slower using Rasters than using GeoArrays
The text was updated successfully, but these errors were encountered: