Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading cropped array with Rasters is 4x slower than with GeoArrays #618

Closed
alex-s-gardner opened this issue Apr 1, 2024 · 30 comments
Closed
Labels
bug Something isn't working

Comments

@alex-s-gardner
Copy link
Contributor

alex-s-gardner commented Apr 1, 2024

Reading a subset of a .vrt is consistently 4x slower using Rasters than using GeoArrays

using GeoArrays
fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
ext = Extent(X = (166.0, 168.0), Y = (-80.0, -78.0));
nt = (min_x = 166.0, min_y = -80.0, max_x = 168.0, max_y = -78.0);

@time ga = GeoArrays.read(fn; masked=false);
1.389790 seconds (79 allocations: 6.312 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
1.043905 seconds (27 allocations: 80.075 MiB)

using Rasters
@time dem = Raster(fn; lazy=true);
1.459176 seconds (53.04 k allocations: 9.666 MiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
3.888810 seconds (24.72 k allocations: 161.149 MiB)
@rafaqz
Copy link
Owner

rafaqz commented Apr 1, 2024

Is this on main ?

And is that the first or second run? Rasters.jl will be slower first run as the object is more complicated to compile. That many allocations really looks like compilation, there is usually only a hundred or so on the second run.

Raster holds an unopened FileArray and closes the file to avoid a whole bunch of issues when you have hundreds of rasters. Then opens it when you need it or use open. GeoArray just wraps an open ArchGDAL.RasterDataset. That's a lot easier to compile. Rasters.jl could do some more precompilation again if its annoying.

Another thing may be that Rasters.jl is reading chunked from disk through a lazy view that crop makes, while GeoArrays is just reading directly in crop. It shouldn't make any difference but it could hit a slower path in DiskArrays.jl, or something.

You will need to share that file for me to benchmark it - if you can just add a Downloads.download(url, fn) line below fn that would help a lot.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 1, 2024

This is using Rasters v0.10.1

This was benched after several runs, so it's not a compilation issue.

I suspect it's related to how Rasters is reading chunked from disk

The file I'm using is a .vrt that points to local tiffs... and is a HUGE dataset so downloading is not an option. I will see if I can reproduce with a downloadable file.

@rafaqz
Copy link
Owner

rafaqz commented Apr 1, 2024

Ok with a huge lazy file you should use open:

dem = open(raster) do A
    dem = Raster(pathlocal.cop30_v2; lazy=true);
    read(Rasters.crop(dem, to=ext));
end

I need to document it better that open is the best syntax with lazy. A is an open RasterDataset object just like in GeoArrays, i but it closes itself safely for you when the closure is done.

But this can actually be fixed inside read to do the open for you... not sure why it isn't already.

@rafaqz rafaqz added the bug Something isn't working label Apr 1, 2024
@rafaqz
Copy link
Owner

rafaqz commented Apr 1, 2024

Ok I will fix this in read asap. I recently fixed read for stack with exactly the same problem... no idea why I forgot to do it for Raster

@alex-s-gardner
Copy link
Contributor Author

Raster doesn't seem to work on the IOStream from open:

foo = Raster(open(fn); lazy=true);
ERROR: MethodError: no method matching Raster(::IOStream; lazy::Bool)

Closest candidates are:
  Raster(::Any, ::Tuple; name, kw...)
   @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:231
  Raster(::Any, ::AbstractString; ...)
   @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:257
  Raster(::Any, ::AbstractString, ::Any; crs, mappedcrs, dims, refdims, name, metadata, missingval, source, write, lazy, dropband)
   @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:257
  ...

Stacktrace:
 [1] top-level scope
   @ REPL[68]:1

@rafaqz
Copy link
Owner

rafaqz commented Apr 1, 2024

No you call open on the raster! It has to open the file via a C binary like gdal, not a raw IO stream...

(I'm not recommending you things that don't work at all here ;)

But I did mess up the example above. This should work:

dem = open(Raster(pathlocal.cop30_v2; lazy=true)) do A
    read(Rasters.crop(dem, to=ext));
end

But I'm afraid it might not help with crop because read should open internally anyway already.

I will really need some kind of MWE to work this out...

@alex-s-gardner
Copy link
Contributor Author

Got it, I will work on a MWE

@alex-s-gardner
Copy link
Contributor Author

It seems I'm unable to replicate without a large .vrt file which means I am unable to create a MWE.

Testing on a large geotiff, Raster.jl outperforms GeoArrays.jl:

using Downloads
using Extents
using BenchmarkTools
using GeoArrays
using Rasters

url = "https://its-live-data.s3.amazonaws.com/velocity_mosaic/landsat/v00.0/static/cog/ANT_G0120_0000_v.tif"
fn = Downloads.download(url, "ANT_G0120_0000_v.tif")

ext = Extent(X = (0, 1000000), Y = (0, 1000000));
nt = (min_x = 0, min_y = 0, max_x = 10000000, max_y = 10000000);

@time ga = GeoArrays.read(fn; masked=false);
0.004583 seconds (685 allocations: 73.367 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
7.615868 seconds (27 allocations: 1.647 GiB, 0.52% gc time)



@time dem = Raster(fn; lazy=true);
0.002417 seconds (167 allocations: 10.484 KiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
1.551789 seconds (19.70 k allocations: 530.634 MiB)

@alex-s-gardner
Copy link
Contributor Author

comparing that with a large vrt:

fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";

ext = Extent(X = (166.0, 168.0), Y = (-80.0, -78.0));
nt = (min_x = 166.0, min_y = -80.0, max_x = 168.0, max_y = -78.0);

@time ga = GeoArrays.read(fn; masked=false);
1.315752 seconds (83 allocations: 6.484 KiB)
@time ga0 = GeoArrays.crop(ga, nt);
1.198129 seconds (24 allocations: 80.075 MiB)



@time dem = Raster(fn; lazy=true);
1.398906 seconds (53.04 k allocations: 9.666 MiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
4.080412 seconds (24.72 k allocations: 161.149 MiB)

@rafaqz
Copy link
Owner

rafaqz commented Apr 2, 2024

Can you get the chunk patterns of both the tif and vrt for comparison ? DiskArrays.eachchunk(raster).

You can also post some code that creates a large .vrt.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 2, 2024

Code to build a large .vrt:

# build COP30 vrts
using ArchGDAL

# info page
# https://spacedata.copernicus.eu/web/guest/collections/copernicus-digital-elevation-model?p_l_back_url=%2Fweb%2Fguest%2Fsearch%3Fq%3DDEM

# 2021 release available on aws:
# https://registry.opendata.aws/copernicus-dem/
# download all dems using AWS CLI:
# aws s3 cp s3:https://copernicus-dem-30m/ /mnt/devon-r2/shared_data/copernicus-dem-30m/ --recursive --no-sign-request
path2folder = "/mnt/devon-r2/shared_data/copernicus-dem-30m/"

# 2019 release was downloaded with ancillary files: /mnt/devon-r2/shared_data/COP-DEM_GLO-30-DGED_PUBLIC/
# path2folder = "/mnt/devon-r2/shared_data/COP-DEM_GLO-30-DGED_PUBLIC/";

# Coordinate Reference System: Horizontal WGS84-G1150 (EPSG 4326) (DGED & DTED format), Vertical EGM2008 (EPSG 3855)
suffix = ["WBM.tif", "DEM.tif", "DEM_int16.tif", "EDM.tif", "FLM.tif", "HEM.tif"];

# EDM.tif = Editing Mask: 8 Bit unsigned integer, GeoTIFF
# FLM.tif = Filling Mask: 8 Bit unsigned integer, GeoTIFF
# HEM.tif = Height Error Mask: 32 Bit floating point, GeoTIFF
# WBM.tif = Water Body Mask: 8 Bit unsigned integer, GeoTIFF

# this takes a long time as it maps 440972 files
items = [item for item in walkdir(path2folder)]
files = [[joinpath(root, file) for file in files] for (root, dirs, files) in items]
files = reduce(vcat,files)

# build single .vrt for all .tif files with suffix
for suff in suffix 
    out_vrt = joinpath(path2folder, replace(suff, ".tif" => ".vrt"))
    ind = endswith.(files,suff)
    in_tifs = files[ind]
    in_tifs = ArchGDAL.read.(in_tifs)
    ArchGDAL.gdalbuildvrt(in_tifs; dest=out_vrt) do vrt
        ArchGDAL.write(vrt, out_vrt)
    end
end

@alex-s-gardner
Copy link
Contributor Author

Chunk pattern of .vrt:

fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
dem = Raster(fn; lazy=true);
DiskArrays.eachchunk(dem)

4099×4894 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
 (1:128, 1:128)          (1:128, 129:256)          (1:128, 257:384)          (1:128, 385:512)          (1:128, 513:640)            (1:128, 625921:626048)          (1:128, 626049:626176)          (1:128, 626177:626304)          (1:128, 626305:626400)
 (129:256, 1:128)        (129:256, 129:256)        (129:256, 257:384)        (129:256, 385:512)        (129:256, 513:640)           (129:256, 625921:626048)        (129:256, 626049:626176)        (129:256, 626177:626304)        (129:256, 626305:626400)
 (257:384, 1:128)        (257:384, 129:256)        (257:384, 257:384)        (257:384, 385:512)        (257:384, 513:640)           (257:384, 625921:626048)        (257:384, 626049:626176)        (257:384, 626177:626304)        (257:384, 626305:626400)
 (385:512, 1:128)        (385:512, 129:256)        (385:512, 257:384)        (385:512, 385:512)        (385:512, 513:640)           (385:512, 625921:626048)        (385:512, 626049:626176)        (385:512, 626177:626304)        (385:512, 626305:626400)
 (513:640, 1:128)        (513:640, 129:256)        (513:640, 257:384)        (513:640, 385:512)        (513:640, 513:640)           (513:640, 625921:626048)        (513:640, 626049:626176)        (513:640, 626177:626304)        (513:640, 626305:626400)
 (641:768, 1:128)        (641:768, 129:256)        (641:768, 257:384)        (641:768, 385:512)        (641:768, 513:640)          (641:768, 625921:626048)        (641:768, 626049:626176)        (641:768, 626177:626304)        (641:768, 626305:626400)
 (769:896, 1:128)        (769:896, 129:256)        (769:896, 257:384)        (769:896, 385:512)        (769:896, 513:640)           (769:896, 625921:626048)        (769:896, 626049:626176)        (769:896, 626177:626304)        (769:896, 626305:626400)
 (897:1024, 1:128)       (897:1024, 129:256)       (897:1024, 257:384)       (897:1024, 385:512)       (897:1024, 513:640)          (897:1024, 625921:626048)       (897:1024, 626049:626176)       (897:1024, 626177:626304)       (897:1024, 626305:626400)
 (1025:1152, 1:128)      (1025:1152, 129:256)      (1025:1152, 257:384)      (1025:1152, 385:512)      (1025:1152, 513:640)         (1025:1152, 625921:626048)      (1025:1152, 626049:626176)      (1025:1152, 626177:626304)      (1025:1152, 626305:626400)
 (1153:1280, 1:128)      (1153:1280, 129:256)      (1153:1280, 257:384)      (1153:1280, 385:512)      (1153:1280, 513:640)         (1153:1280, 625921:626048)      (1153:1280, 626049:626176)      (1153:1280, 626177:626304)      (1153:1280, 626305:626400)
 (1281:1408, 1:128)      (1281:1408, 129:256)      (1281:1408, 257:384)      (1281:1408, 385:512)      (1281:1408, 513:640)        (1281:1408, 625921:626048)      (1281:1408, 626049:626176)      (1281:1408, 626177:626304)      (1281:1408, 626305:626400)
 (1409:1536, 1:128)      (1409:1536, 129:256)      (1409:1536, 257:384)      (1409:1536, 385:512)      (1409:1536, 513:640)         (1409:1536, 625921:626048)      (1409:1536, 626049:626176)      (1409:1536, 626177:626304)      (1409:1536, 626305:626400)
 (1537:1664, 1:128)      (1537:1664, 129:256)      (1537:1664, 257:384)      (1537:1664, 385:512)      (1537:1664, 513:640)         (1537:1664, 625921:626048)      (1537:1664, 626049:626176)      (1537:1664, 626177:626304)      (1537:1664, 626305:626400)
 (1665:1792, 1:128)      (1665:1792, 129:256)      (1665:1792, 257:384)      (1665:1792, 385:512)      (1665:1792, 513:640)         (1665:1792, 625921:626048)      (1665:1792, 626049:626176)      (1665:1792, 626177:626304)      (1665:1792, 626305:626400)
 (1793:1920, 1:128)      (1793:1920, 129:256)      (1793:1920, 257:384)      (1793:1920, 385:512)      (1793:1920, 513:640)         (1793:1920, 625921:626048)      (1793:1920, 626049:626176)      (1793:1920, 626177:626304)      (1793:1920, 626305:626400)
 (1921:2048, 1:128)      (1921:2048, 129:256)      (1921:2048, 257:384)      (1921:2048, 385:512)      (1921:2048, 513:640)        (1921:2048, 625921:626048)      (1921:2048, 626049:626176)      (1921:2048, 626177:626304)      (1921:2048, 626305:626400)
 (2049:2176, 1:128)      (2049:2176, 129:256)      (2049:2176, 257:384)      (2049:2176, 385:512)      (2049:2176, 513:640)         (2049:2176, 625921:626048)      (2049:2176, 626049:626176)      (2049:2176, 626177:626304)      (2049:2176, 626305:626400)
                                                                                                                                                                                                                                 
 (522625:522752, 1:128)  (522625:522752, 129:256)  (522625:522752, 257:384)  (522625:522752, 385:512)  (522625:522752, 513:640)     (522625:522752, 625921:626048)  (522625:522752, 626049:626176)  (522625:522752, 626177:626304)  (522625:522752, 626305:626400)
 (522753:522880, 1:128)  (522753:522880, 129:256)  (522753:522880, 257:384)  (522753:522880, 385:512)  (522753:522880, 513:640)     (522753:522880, 625921:626048)  (522753:522880, 626049:626176)  (522753:522880, 626177:626304)  (522753:522880, 626305:626400)
 (522881:523008, 1:128)  (522881:523008, 129:256)  (522881:523008, 257:384)  (522881:523008, 385:512)  (522881:523008, 513:640)    (522881:523008, 625921:626048)  (522881:523008, 626049:626176)  (522881:523008, 626177:626304)  (522881:523008, 626305:626400)
 (523009:523136, 1:128)  (523009:523136, 129:256)  (523009:523136, 257:384)  (523009:523136, 385:512)  (523009:523136, 513:640)     (523009:523136, 625921:626048)  (523009:523136, 626049:626176)  (523009:523136, 626177:626304)  (523009:523136, 626305:626400)
 (523137:523264, 1:128)  (523137:523264, 129:256)  (523137:523264, 257:384)  (523137:523264, 385:512)  (523137:523264, 513:640)     (523137:523264, 625921:626048)  (523137:523264, 626049:626176)  (523137:523264, 626177:626304)  (523137:523264, 626305:626400)
 (523265:523392, 1:128)  (523265:523392, 129:256)  (523265:523392, 257:384)  (523265:523392, 385:512)  (523265:523392, 513:640)     (523265:523392, 625921:626048)  (523265:523392, 626049:626176)  (523265:523392, 626177:626304)  (523265:523392, 626305:626400)
 (523393:523520, 1:128)  (523393:523520, 129:256)  (523393:523520, 257:384)  (523393:523520, 385:512)  (523393:523520, 513:640)     (523393:523520, 625921:626048)  (523393:523520, 626049:626176)  (523393:523520, 626177:626304)  (523393:523520, 626305:626400)
 (523521:523648, 1:128)  (523521:523648, 129:256)  (523521:523648, 257:384)  (523521:523648, 385:512)  (523521:523648, 513:640)    (523521:523648, 625921:626048)  (523521:523648, 626049:626176)  (523521:523648, 626177:626304)  (523521:523648, 626305:626400)
 (523649:523776, 1:128)  (523649:523776, 129:256)  (523649:523776, 257:384)  (523649:523776, 385:512)  (523649:523776, 513:640)     (523649:523776, 625921:626048)  (523649:523776, 626049:626176)  (523649:523776, 626177:626304)  (523649:523776, 626305:626400)
 (523777:523904, 1:128)  (523777:523904, 129:256)  (523777:523904, 257:384)  (523777:523904, 385:512)  (523777:523904, 513:640)     (523777:523904, 625921:626048)  (523777:523904, 626049:626176)  (523777:523904, 626177:626304)  (523777:523904, 626305:626400)
 (523905:524032, 1:128)  (523905:524032, 129:256)  (523905:524032, 257:384)  (523905:524032, 385:512)  (523905:524032, 513:640)     (523905:524032, 625921:626048)  (523905:524032, 626049:626176)  (523905:524032, 626177:626304)  (523905:524032, 626305:626400)
 (524033:524160, 1:128)  (524033:524160, 129:256)  (524033:524160, 257:384)  (524033:524160, 385:512)  (524033:524160, 513:640)     (524033:524160, 625921:626048)  (524033:524160, 626049:626176)  (524033:524160, 626177:626304)  (524033:524160, 626305:626400)
 (524161:524288, 1:128)  (524161:524288, 129:256)  (524161:524288, 257:384)  (524161:524288, 385:512)  (524161:524288, 513:640)    (524161:524288, 625921:626048)  (524161:524288, 626049:626176)  (524161:524288, 626177:626304)  (524161:524288, 626305:626400)
 (524289:524416, 1:128)  (524289:524416, 129:256)  (524289:524416, 257:384)  (524289:524416, 385:512)  (524289:524416, 513:640)     (524289:524416, 625921:626048)  (524289:524416, 626049:626176)  (524289:524416, 626177:626304)  (524289:524416, 626305:626400)
 (524417:524544, 1:128)  (524417:524544, 129:256)  (524417:524544, 257:384)  (524417:524544, 385:512)  (524417:524544, 513:640)     (524417:524544, 625921:626048)  (524417:524544, 626049:626176)  (524417:524544, 626177:626304)  (524417:524544, 626305:626400)
 (524545:524554, 1:128)  (524545:524554, 129:256)  (524545:524554, 257:384)  (524545:524554, 385:512)  (524545:524554, 513:640)     (524545:524554, 625921:626048)  (524545:524554, 626049:626176)  (524545:524554, 626177:626304)  (524545:524554, 626305:626400)

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 2, 2024

Chunk pattern of .tiff:

fn = "ANT_G0120_0000_v.tif"
dem = Raster(fn; lazy=true);
DiskArrays.eachchunk(dem)

179×144 DiskArrays.GridChunks{2, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
 (1:256, 1:256)        (1:256, 257:512)        (1:256, 513:768)        (1:256, 769:1024)        (1:256, 1025:1280)        (1:256, 1281:1536)          (1:256, 35841:36096)        (1:256, 36097:36352)        (1:256, 36353:36608)        (1:256, 36609:36784)
 (257:512, 1:256)      (257:512, 257:512)      (257:512, 513:768)      (257:512, 769:1024)      (257:512, 1025:1280)      (257:512, 1281:1536)         (257:512, 35841:36096)      (257:512, 36097:36352)      (257:512, 36353:36608)      (257:512, 36609:36784)
 (513:768, 1:256)      (513:768, 257:512)      (513:768, 513:768)      (513:768, 769:1024)      (513:768, 1025:1280)      (513:768, 1281:1536)         (513:768, 35841:36096)      (513:768, 36097:36352)      (513:768, 36353:36608)      (513:768, 36609:36784)
 (769:1024, 1:256)     (769:1024, 257:512)     (769:1024, 513:768)     (769:1024, 769:1024)     (769:1024, 1025:1280)     (769:1024, 1281:1536)        (769:1024, 35841:36096)     (769:1024, 36097:36352)     (769:1024, 36353:36608)     (769:1024, 36609:36784)
 (1025:1280, 1:256)    (1025:1280, 257:512)    (1025:1280, 513:768)    (1025:1280, 769:1024)    (1025:1280, 1025:1280)    (1025:1280, 1281:1536)       (1025:1280, 35841:36096)    (1025:1280, 36097:36352)    (1025:1280, 36353:36608)    (1025:1280, 36609:36784)
 (1281:1536, 1:256)    (1281:1536, 257:512)    (1281:1536, 513:768)    (1281:1536, 769:1024)    (1281:1536, 1025:1280)    (1281:1536, 1281:1536)      (1281:1536, 35841:36096)    (1281:1536, 36097:36352)    (1281:1536, 36353:36608)    (1281:1536, 36609:36784)
 (1537:1792, 1:256)    (1537:1792, 257:512)    (1537:1792, 513:768)    (1537:1792, 769:1024)    (1537:1792, 1025:1280)    (1537:1792, 1281:1536)       (1537:1792, 35841:36096)    (1537:1792, 36097:36352)    (1537:1792, 36353:36608)    (1537:1792, 36609:36784)
 (1793:2048, 1:256)    (1793:2048, 257:512)    (1793:2048, 513:768)    (1793:2048, 769:1024)    (1793:2048, 1025:1280)    (1793:2048, 1281:1536)       (1793:2048, 35841:36096)    (1793:2048, 36097:36352)    (1793:2048, 36353:36608)    (1793:2048, 36609:36784)
 (2049:2304, 1:256)    (2049:2304, 257:512)    (2049:2304, 513:768)    (2049:2304, 769:1024)    (2049:2304, 1025:1280)    (2049:2304, 1281:1536)       (2049:2304, 35841:36096)    (2049:2304, 36097:36352)    (2049:2304, 36353:36608)    (2049:2304, 36609:36784)
 (2305:2560, 1:256)    (2305:2560, 257:512)    (2305:2560, 513:768)    (2305:2560, 769:1024)    (2305:2560, 1025:1280)    (2305:2560, 1281:1536)       (2305:2560, 35841:36096)    (2305:2560, 36097:36352)    (2305:2560, 36353:36608)    (2305:2560, 36609:36784)
 (2561:2816, 1:256)    (2561:2816, 257:512)    (2561:2816, 513:768)    (2561:2816, 769:1024)    (2561:2816, 1025:1280)    (2561:2816, 1281:1536)      (2561:2816, 35841:36096)    (2561:2816, 36097:36352)    (2561:2816, 36353:36608)    (2561:2816, 36609:36784)
 (2817:3072, 1:256)    (2817:3072, 257:512)    (2817:3072, 513:768)    (2817:3072, 769:1024)    (2817:3072, 1025:1280)    (2817:3072, 1281:1536)       (2817:3072, 35841:36096)    (2817:3072, 36097:36352)    (2817:3072, 36353:36608)    (2817:3072, 36609:36784)
 (3073:3328, 1:256)    (3073:3328, 257:512)    (3073:3328, 513:768)    (3073:3328, 769:1024)    (3073:3328, 1025:1280)    (3073:3328, 1281:1536)       (3073:3328, 35841:36096)    (3073:3328, 36097:36352)    (3073:3328, 36353:36608)    (3073:3328, 36609:36784)
 (3329:3584, 1:256)    (3329:3584, 257:512)    (3329:3584, 513:768)    (3329:3584, 769:1024)    (3329:3584, 1025:1280)    (3329:3584, 1281:1536)       (3329:3584, 35841:36096)    (3329:3584, 36097:36352)    (3329:3584, 36353:36608)    (3329:3584, 36609:36784)
 (3585:3840, 1:256)    (3585:3840, 257:512)    (3585:3840, 513:768)    (3585:3840, 769:1024)    (3585:3840, 1025:1280)    (3585:3840, 1281:1536)       (3585:3840, 35841:36096)    (3585:3840, 36097:36352)    (3585:3840, 36353:36608)    (3585:3840, 36609:36784)
 (3841:4096, 1:256)    (3841:4096, 257:512)    (3841:4096, 513:768)    (3841:4096, 769:1024)    (3841:4096, 1025:1280)    (3841:4096, 1281:1536)      (3841:4096, 35841:36096)    (3841:4096, 36097:36352)    (3841:4096, 36353:36608)    (3841:4096, 36609:36784)
 (4097:4352, 1:256)    (4097:4352, 257:512)    (4097:4352, 513:768)    (4097:4352, 769:1024)    (4097:4352, 1025:1280)    (4097:4352, 1281:1536)       (4097:4352, 35841:36096)    (4097:4352, 36097:36352)    (4097:4352, 36353:36608)    (4097:4352, 36609:36784)
                                                                                                                                                                                                                                       
 (41729:41984, 1:256)  (41729:41984, 257:512)  (41729:41984, 513:768)  (41729:41984, 769:1024)  (41729:41984, 1025:1280)  (41729:41984, 1281:1536)     (41729:41984, 35841:36096)  (41729:41984, 36097:36352)  (41729:41984, 36353:36608)  (41729:41984, 36609:36784)
 (41985:42240, 1:256)  (41985:42240, 257:512)  (41985:42240, 513:768)  (41985:42240, 769:1024)  (41985:42240, 1025:1280)  (41985:42240, 1281:1536)     (41985:42240, 35841:36096)  (41985:42240, 36097:36352)  (41985:42240, 36353:36608)  (41985:42240, 36609:36784)
 (42241:42496, 1:256)  (42241:42496, 257:512)  (42241:42496, 513:768)  (42241:42496, 769:1024)  (42241:42496, 1025:1280)  (42241:42496, 1281:1536)    (42241:42496, 35841:36096)  (42241:42496, 36097:36352)  (42241:42496, 36353:36608)  (42241:42496, 36609:36784)
 (42497:42752, 1:256)  (42497:42752, 257:512)  (42497:42752, 513:768)  (42497:42752, 769:1024)  (42497:42752, 1025:1280)  (42497:42752, 1281:1536)     (42497:42752, 35841:36096)  (42497:42752, 36097:36352)  (42497:42752, 36353:36608)  (42497:42752, 36609:36784)
 (42753:43008, 1:256)  (42753:43008, 257:512)  (42753:43008, 513:768)  (42753:43008, 769:1024)  (42753:43008, 1025:1280)  (42753:43008, 1281:1536)     (42753:43008, 35841:36096)  (42753:43008, 36097:36352)  (42753:43008, 36353:36608)  (42753:43008, 36609:36784)
 (43009:43264, 1:256)  (43009:43264, 257:512)  (43009:43264, 513:768)  (43009:43264, 769:1024)  (43009:43264, 1025:1280)  (43009:43264, 1281:1536)     (43009:43264, 35841:36096)  (43009:43264, 36097:36352)  (43009:43264, 36353:36608)  (43009:43264, 36609:36784)
 (43265:43520, 1:256)  (43265:43520, 257:512)  (43265:43520, 513:768)  (43265:43520, 769:1024)  (43265:43520, 1025:1280)  (43265:43520, 1281:1536)     (43265:43520, 35841:36096)  (43265:43520, 36097:36352)  (43265:43520, 36353:36608)  (43265:43520, 36609:36784)
 (43521:43776, 1:256)  (43521:43776, 257:512)  (43521:43776, 513:768)  (43521:43776, 769:1024)  (43521:43776, 1025:1280)  (43521:43776, 1281:1536)    (43521:43776, 35841:36096)  (43521:43776, 36097:36352)  (43521:43776, 36353:36608)  (43521:43776, 36609:36784)
 (43777:44032, 1:256)  (43777:44032, 257:512)  (43777:44032, 513:768)  (43777:44032, 769:1024)  (43777:44032, 1025:1280)  (43777:44032, 1281:1536)     (43777:44032, 35841:36096)  (43777:44032, 36097:36352)  (43777:44032, 36353:36608)  (43777:44032, 36609:36784)
 (44033:44288, 1:256)  (44033:44288, 257:512)  (44033:44288, 513:768)  (44033:44288, 769:1024)  (44033:44288, 1025:1280)  (44033:44288, 1281:1536)     (44033:44288, 35841:36096)  (44033:44288, 36097:36352)  (44033:44288, 36353:36608)  (44033:44288, 36609:36784)
 (44289:44544, 1:256)  (44289:44544, 257:512)  (44289:44544, 513:768)  (44289:44544, 769:1024)  (44289:44544, 1025:1280)  (44289:44544, 1281:1536)     (44289:44544, 35841:36096)  (44289:44544, 36097:36352)  (44289:44544, 36353:36608)  (44289:44544, 36609:36784)
 (44545:44800, 1:256)  (44545:44800, 257:512)  (44545:44800, 513:768)  (44545:44800, 769:1024)  (44545:44800, 1025:1280)  (44545:44800, 1281:1536)     (44545:44800, 35841:36096)  (44545:44800, 36097:36352)  (44545:44800, 36353:36608)  (44545:44800, 36609:36784)
 (44801:45056, 1:256)  (44801:45056, 257:512)  (44801:45056, 513:768)  (44801:45056, 769:1024)  (44801:45056, 1025:1280)  (44801:45056, 1281:1536)    (44801:45056, 35841:36096)  (44801:45056, 36097:36352)  (44801:45056, 36353:36608)  (44801:45056, 36609:36784)
 (45057:45312, 1:256)  (45057:45312, 257:512)  (45057:45312, 513:768)  (45057:45312, 769:1024)  (45057:45312, 1025:1280)  (45057:45312, 1281:1536)     (45057:45312, 35841:36096)  (45057:45312, 36097:36352)  (45057:45312, 36353:36608)  (45057:45312, 36609:36784)
 (45313:45568, 1:256)  (45313:45568, 257:512)  (45313:45568, 513:768)  (45313:45568, 769:1024)  (45313:45568, 1025:1280)  (45313:45568, 1281:1536)     (45313:45568, 35841:36096)  (45313:45568, 36097:36352)  (45313:45568, 36353:36608)  (45313:45568, 36609:36784)
 (45569:45792, 1:256)  (45569:45792, 257:512)  (45569:45792, 513:768)  (45569:45792, 769:1024)  (45569:45792, 1025:1280)  (45569:45792, 1281:1536)     (45569:45792, 35841:36096)  (45569:45792, 36097:36352)  (45569:45792, 36353:36608)  (45569:45792, 36609:36784)

@rafaqz
Copy link
Owner

rafaqz commented Apr 2, 2024

The VRT has small chunks and like 100x more of them all up. 128x128 is probably a design mistake... What'ss the eltype?

But I imagine this is mostly just DiskArrays.jl driving whether Rasters is 8x faster or 4x slower than GeoArrays.

It might be good to manually force these small chunks to be larger to reduce the number of reads. We could let users scale the chunk aggregation to tune performance. There is heaps left to do with this. I want to standardise chunk keywords across all the backends now we have DiskArrays.jll everywhere.

You can actually try that using DiskArrays.RechunkedDiskArray:

rechunked = modify(lazyraster) do A
    DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (256, 256)))
end

or even larger chunks like 512*512

@alex-s-gardner
Copy link
Contributor Author

eltype = Float32

@rafaqz
Copy link
Owner

rafaqz commented Apr 2, 2024

Thanks. Also your code is still not a MWE its using files I don't have - the idea is I just copy past and run it then spend my time fixing the bug ;)

@alex-s-gardner
Copy link
Contributor Author

Unfortunately RechunkedDiskArray fails on read, I would assume that rechunking would require re-writing the data:

fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
lazyraster = Raster(fn; lazy=true);

rechunked = modify(lazyraster) do A
    DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end

524554×626400 Raster{Float32,2} with dimensions: 
  X Projected{Float64} LinRange{Float64}(-180.001, 179.999, 524554) ForwardOrdered Regular Intervals{Start} crs: WellKnownText,
  Y Projected{Float64} LinRange{Float64}(83.9999, -89.9999, 626400) ReverseOrdered Regular Intervals{Start} crs: WellKnownText
and reference dimensions: 
  Band Categorical{Int64} 1:1 ForwardOrdered
extent: Extent(X = (-180.0013888888889, 180.00009843514712), Y = (-89.99986111113607, 84.00013888888888))
crs: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]

lazycrop = Rasters.crop(rechunked, to=ext)
2913×7199 Raster{Float32,2} with dimensions: 
  X Projected{Float64} LinRange{Float64}(166.0, 167.999, 2913) ForwardOrdered Regular Intervals{Start} crs: WellKnownText,
  Y Projected{Float64} LinRange{Float64}(-78.0004, -79.9999, 7199) ReverseOrdered Regular Intervals{Start} crs: WellKnownText
and reference dimensions: 
  Band Categorical{Int64} 1:1 ForwardOrdered
extent: Extent(X = (166.00026173592644, 167.9994540642185), Y = (-79.99986111113463, -78.00013888891213))
crs: GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]]

read(lazycrop)
ERROR: GDALError (CE_Failure, code 10):
        Pointer 'hDS' is NULL in 'GDALDatasetRasterIOEx'.


Stacktrace:
  [1] maybe_throw
    @ ~/.julia/packages/GDAL/hNyuz/src/error.jl:42 [inlined]
  [2] aftercare
    @ ~/.julia/packages/GDAL/hNyuz/src/error.jl:59 [inlined]
  [3] gdaldatasetrasterioex(hDS::ArchGDAL.Dataset, eRWFlag::ArchGDAL.GDALRWFlag, nDSXOff::Int64, nDSYOff::Int64, nDSXSize::Int64, nDSYSize::Int64, pBuffer::Ptr{…}, nBXSize::Int64, nBYSize::Int64, eBDataType::ArchGDAL.GDALDataType, nBandCount::Int64, panBandCount::Ptr{…}, nPixelSpace::Int64, nLineSpace::Int64, nBandSpace::Int64, psExtraArg::Ptr{…})
    @ GDAL ~/.julia/packages/GDAL/hNyuz/src/libgdal.jl:7533
  [4] rasterio!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:149 [inlined]
  [5] rasterio!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:144 [inlined]
  [6] read!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:380 [inlined]
  [7] readblock!(dataset::ArchGDAL.RasterDataset{…}, buffer::Array{…}, x::UnitRange{…}, y::UnitRange{…}, z::UnitRange{…})
    @ ArchGDAL ~/.julia/packages/ArchGDAL/lgE4A/src/raster/array.jl:212
  [8] readblock!
    @ ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20 [inlined]
  [9] _readblock_rechunked
    @ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:24 [inlined]
 [10] readblock!
    @ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:15 [inlined]
 [11] readblock!(::DiskArrays.SubDiskArray{…}, ::Matrix{…}, ::UnitRange{…}, ::UnitRange{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20
 [12] getindex_disk(::DiskArrays.SubDiskArray{…}, ::UnitRange{…}, ::Vararg{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:40
 [13] getindex
    @ ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:211 [inlined]
 [14] subsetarg
    @ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:147 [inlined]
 [15] #62
    @ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41 [inlined]
 [16] map
    @ ./tuple.jl:291 [inlined]
 [17] (::DiskArrays.var"#61#63"{Matrix{}, Base.Broadcast.Broadcasted{}})(cnow::Tuple{UnitRange{…}, UnitRange{…}})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41
 [18] foreach(f::DiskArrays.var"#61#63"{Matrix{}, Base.Broadcast.Broadcasted{}}, itr::DiskArrays.GridChunks{2, Tuple{…}})
    @ Base ./abstractarray.jl:3097
 [19] copyto!(dest::Matrix{…}, bc::Base.Broadcast.Broadcasted{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:38
 [20] materialize!
    @ ./broadcast.jl:914 [inlined]
 [21] materialize!
    @ ./broadcast.jl:911 [inlined]
 [22] _Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:51
 [23] Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:5
 [24] (::Rasters.var"#25#26"{})(O::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:77
 [25] open(f::Rasters.var"#25#26"{}, A::Raster{…}; kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:137
 [26] open(f::Function, A::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:133
 [27] modify(f::Type{…}, A::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:76
 [28] read(x::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/read.jl:9
 [29] top-level scope
    @ REPL[14]:1
Some type information was truncated. Use `show(err)` to see complete types.

@alex-s-gardner
Copy link
Contributor Author

As for a MWE, I'm working with a huge .vrt. The code that I provided above will download the HUGE dataset if you have aws CLI installed: https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

Once AWS CLI is installed, at the terminal you then do:

aws s3 cp s3:https://copernicus-dem-30m/ /path/to/your/folder --recursive --no-sign-request

then you can replace path2folder with /path/to/your/folder in the scrip and it should build a .vrt to the local files

@rafaqz
Copy link
Owner

rafaqz commented Apr 2, 2024

The rechunkng doesn't require rewriting, I also wouldn't suggest something that needed that, I wrote RechunkedDiskArray ;)

Likely the file does no exist from this error, or something similar:

Pointer 'hDS' is NULL in 'GDALDatasetRasterIOEx

I also didn't see the aws code in comments. The minimum fuss is to do:

run(`aws s3 cp s3:https://copernicus-dem-30m/ /path/to/your/folder --recursive --no-sign-request`)

So you know the script just works without headache or finding things, I'm very low on dev time these days.

@rafaqz
Copy link
Owner

rafaqz commented Apr 2, 2024

MWE:

# build COP30 vrts
using ArchGDAL

# download all dems using AWS CLI:
path2folder = "."
run(`aws s3 cp s3:https://copernicus-dem-30m/ $path2folder --recursive --no-sign-request`)

# Coordinate Reference System: Horizontal WGS84-G1150 (EPSG 4326) (DGED & DTED format), Vertical EGM2008 (EPSG 3855)
suffix = ["WBM.tif", "DEM.tif", "DEM_int16.tif", "EDM.tif", "FLM.tif", "HEM.tif"];

# EDM.tif = Editing Mask: 8 Bit unsigned integer, GeoTIFF
# FLM.tif = Filling Mask: 8 Bit unsigned integer, GeoTIFF
# HEM.tif = Height Error Mask: 32 Bit floating point, GeoTIFF
# WBM.tif = Water Body Mask: 8 Bit unsigned integer, GeoTIFF

# this takes a long time as it maps 440972 files
items = [item for item in walkdir(path2folder)]
files = [[joinpath(root, file) for file in files] for (root, dirs, files) in items]
files = reduce(vcat,files)

# build single .vrt for all .tif files with suffix
for suff in suffix 
    out_vrt = joinpath(path2folder, replace(suff, ".tif" => ".vrt"))
    ind = endswith.(files,suff)
    in_tifs = files[ind]
    in_tifs = ArchGDAL.read.(in_tifs)
    ArchGDAL.gdalbuildvrt(in_tifs; dest=out_vrt) do vrt
        ArchGDAL.write(vrt, out_vrt)
    end
end

@rafaqz
Copy link
Owner

rafaqz commented Apr 14, 2024

Did you try rechunking the arrays as suggested above? It will mean less read calls. If it works I'll consider doing this automatically when chunks are too small.

rechunked = modify(lazyraster) do A
    DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end

The download was pretty huge for my home connection and I havent got around to doing this from work.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 17, 2024

trying to read rechunked dataset throws an error :

using GeoArrays
using Rasters
using Extents
using DiskArrays

fn = "/mnt/devon-r2/shared_data/copernicus-dem-30m/DEM.vrt";
ext = Extent(X = (166.0, 168.0), Y = (-80.0, -78.0));
nt = (min_x = 166.0, min_y = -80.0, max_x = 168.0, max_y = -78.0);

@time ga = GeoArrays.read(fn; masked=false);
# 1.251426 seconds (83 allocations: 6.484 KiB)

@time ga0 = GeoArrays.crop(ga, nt);
# 1.156682 seconds (24 allocations: 80.075 MiB)

@time dem = Raster(fn; lazy=true);
# 1.346145 seconds (53.04 k allocations: 9.666 MiB)
@time dem0 = read(Rasters.crop(dem, to=ext));
# 3.945627 seconds (24.89 k allocations: 161.155 MiB, 2.31% gc time, 0.22% compilation time)

rechunked = modify(dem) do A
    DiskArrays.RechunkedDiskArray(A, DiskArrays.GridChunks(A, (512, 512)))
end
@time dem0 = read(Rasters.crop(rechunked, to=ext));

ERROR: GDALError (CE_Failure, code 10):
        Pointer 'hDS' is NULL in 'GDALDatasetRasterIOEx'.


Stacktrace:
  [1] maybe_throw
    @ ~/.julia/packages/GDAL/hNyuz/src/error.jl:42 [inlined]
  [2] aftercare
    @ ~/.julia/packages/GDAL/hNyuz/src/error.jl:59 [inlined]
  [3] gdaldatasetrasterioex(hDS::ArchGDAL.Dataset, eRWFlag::ArchGDAL.GDALRWFlag, nDSXOff::Int64, nDSYOff::Int64, nDSXSize::Int64, nDSYSize::Int64, pBuffer::Ptr{…}, nBXSize::Int64, nBYSize::Int64, eBDataType::ArchGDAL.GDALDataType, nBandCount::Int64, panBandCount::Ptr{…}, nPixelSpace::Int64, nLineSpace::Int64, nBandSpace::Int64, psExtraArg::Ptr{…})
    @ GDAL ~/.julia/packages/GDAL/hNyuz/src/libgdal.jl:7533
  [4] rasterio!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:149 [inlined]
  [5] rasterio!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:144 [inlined]
  [6] read!
    @ ~/.julia/packages/ArchGDAL/lgE4A/src/raster/rasterio.jl:380 [inlined]
  [7] readblock!(dataset::ArchGDAL.RasterDataset{…}, buffer::Array{…}, x::UnitRange{…}, y::UnitRange{…}, z::UnitRange{…})
    @ ArchGDAL ~/.julia/packages/ArchGDAL/lgE4A/src/raster/array.jl:212
  [8] readblock!
    @ ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20 [inlined]
  [9] _readblock_rechunked
    @ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:24 [inlined]
 [10] readblock!
    @ ~/.julia/packages/DiskArrays/bZBJE/src/rechunk.jl:15 [inlined]
 [11] readblock!(::DiskArrays.SubDiskArray{…}, ::Matrix{…}, ::UnitRange{…}, ::UnitRange{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/subarray.jl:20
 [12] getindex_disk(::DiskArrays.SubDiskArray{…}, ::UnitRange{…}, ::Vararg{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:40
 [13] getindex
    @ ~/.julia/packages/DiskArrays/bZBJE/src/diskarray.jl:211 [inlined]
 [14] subsetarg
    @ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:147 [inlined]
 [15] #62
    @ ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41 [inlined]
 [16] map
    @ ./tuple.jl:291 [inlined]
 [17] (::DiskArrays.var"#61#63"{Matrix{}, Base.Broadcast.Broadcasted{}})(cnow::Tuple{UnitRange{…}, UnitRange{…}})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:41
 [18] foreach(f::DiskArrays.var"#61#63"{Matrix{}, Base.Broadcast.Broadcasted{}}, itr::DiskArrays.GridChunks{2, Tuple{…}})
    @ Base ./abstractarray.jl:3097
 [19] copyto!(dest::Matrix{…}, bc::Base.Broadcast.Broadcasted{…})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/broadcast.jl:38
 [20] materialize!
    @ ./broadcast.jl:914 [inlined]
 [21] materialize!
    @ ./broadcast.jl:911 [inlined]
 [22] _Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:51
 [23] Array(a::DiskArrays.SubDiskArray{Float32, 2, DiskArrays.RechunkedDiskArray{…}, Tuple{…}, false})
    @ DiskArrays ~/.julia/packages/DiskArrays/bZBJE/src/array.jl:5
 [24] (::Rasters.var"#25#26"{})(O::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:77
 [25] open(f::Rasters.var"#25#26"{}, A::Raster{…}; kw::@Kwargs{})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:137
 [26] open(f::Function, A::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:133
 [27] modify(f::Type{…}, A::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/array.jl:76
 [28] read(x::Raster{…})
    @ Rasters ~/.julia/packages/Rasters/nC6xb/src/read.jl:9
 [29] macro expansion
    @ ./timing.jl:279 [inlined]
 [30] top-level scope
    @ ~/Documents/GitHub/Altim/src/junk.jl:24
Some type information was truncated. Use `show(err)` to see complete types.

@alex-s-gardner
Copy link
Contributor Author

FYI you should be able to stop the S3 download at any point and built the .vrt with a subset of the full dataset. This should work for testing if you don't want to download the full archive.

@rafaqz
Copy link
Owner

rafaqz commented Apr 17, 2024

You need to use rechunked rather than dem to test it !

@time dem0 = read(Rasters.crop(rechunked, to=ext));

(dem is just the original object with the original chunks... there are no ! methods used and anyway nothing in Rasters in mutable except the array itself and the metadata dict if it exists.)

@alex-s-gardner
Copy link
Contributor Author

🤦 please see edit

@rafaqz
Copy link
Owner

rafaqz commented Apr 17, 2024

Ok weird that doesn't work. Normally that error means the file doesnt exist...

@maxfreu
Copy link
Contributor

maxfreu commented Apr 19, 2024

We don't need vrt files and stuff, not even a download. Here's a mwe that shows a significant speed difference:

using BenchmarkTools
using ArchGDAL
using Rasters
using GeoArrays

fn = "deleteme.tif"
x = X(1:4096)
y = Y(1:4096)
bs = 256 # block size
rs = 512 # read size
write(fn, Raster(rand(UInt8, x,y)); options=Dict("TILED"=>"YES", "BLOCKXSIZE"=>bs, "BLOCKYSIZE"=>bs), force=true)

ext = Rasters.Extents.Extent(X=(1, rs), Y=(1, rs))
nt = (min_x=1, min_y=1, max_x=rs, max_y=rs)

r = Raster(fn; lazy=true);
ga = GeoArrays.read(fn; masked=false);

@btime Raster($fn; lazy=true);  # 267us
@btime read(Rasters.crop($r; to=$ext));  # 497us
@btime read($r[X(Between(ext.X...)), Y(Between(ext.Y...))]);  # 189us

@btime GeoArrays.read($fn; masked=false);  # 241us
@btime GeoArrays.crop($ga, $nt);  # 25us

@profview for _ in 1:1000 read(Rasters.crop(r; to=ext)) end
@profview for _ in 1:10000 GeoArrays.crop(ga, nt) end

All timings on a SSD. GeoArrays is a quite thin wrapper. Oh and keep in mind, that benchmarking reads is often unstable, as the OS caches files in RAM etc.

@rafaqz
Copy link
Owner

rafaqz commented Apr 20, 2024

Thanks for the really nice MWE :)

One point here is that it is sometimes useful that crop takes relatively no time at all on its own (like when you don't use all the data or its just used as a template for something else like it is in zonal stats).

c = Rasters.crop(r; to=ext);
@btime Rasters.crop($r; to=$ext); # 3.2us

And the next is that this seems to be a problem with read

@btime read(c); # 485us
@btime c[:, :]; # 127us

Somehow its 3x more time than just indexing! Some dispatch somewhere needs to be fixed, either in Rasters or more likely in DiskArrays.jl for SubDiskArray.

In this case switching chunks to 512 seems to be 10% faster again. But there must still be something else going on.

Next, the fact that rasters returns an updated Raster rather than a raw Array as GeoArrays does mean there will always be an overhead, but it shouldn't be very large. It also looks to be slightly unstable, and fixing that could improve athings nother 10%.

Oh, and one final key thing is that Rasters has to open and close the file !!

GeoArrays holds an open gdal C object. Rasters doesn't do this unless you use open(x) do r and work inside the closure where the Raster holds the opened file. The reason for this is to avoid file system open file limits just breaking your session when you problems get to a large enough scale.

I have thought about adding a Raster(filename) do A constructor that takes a function as the first argument, so there is just one open/close and you work inside the closure.

@rafaqz
Copy link
Owner

rafaqz commented May 12, 2024

Ok this is 100% a DiskArrays.jl bug, will fix over there.

@rafaqz
Copy link
Owner

rafaqz commented May 12, 2024

@rafaqz rafaqz closed this as completed May 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants