is there a variant of `load!` that accumulates? #252

haakon-e · 2024-03-12T01:00:16Z

I was curious whether there exists (or whether it is even possible to implement) a variant of `load!` that accumulates into the array instead of overwriting it?

E.g. if `x = ones(3,3)` and `ds["var"][:, :] == ones(3,3)`, and you do `load_acc!(variable(ds, "var"), x, :, :)`, you'd get `x .== 2`. I'm able to get around this by first loading the data into a buffer and then adding it to `x`, but I was curious whether there's a direct way of doing this... It could be quite useful for quickly accumulating statistics.

My current code looks something like this:

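using NCDatasets

# Load `var` from `file` into the scratch buffer `buf`, then add it to `data` in place.
function load_accumulate!(file, data, var, buf = similar(data))
    NCDataset(file) do ds
        NCDatasets.load!(variable(ds, var), buf, :, :)
        data .+= buf
    end
end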

# and is useful for operations like this:
function data_mean(files, data, var)
    buf = similar(data)
    for file in files
        load_accumulate!(file, data, var, buf)
    end
    data ./= length(files)
end

... Thinking about this a bit more, I suppose that, in principle, what I'm suggesting above could be generalized to handle any kind of metric, like

function load_accumulate!(file, data, var, func::Function, buf = similar(data))
    NCDataset(file) do ds
        NCDatasets.load!(variable(ds, var), buf, :, :)
        data .+= func(buf)
    end
end

# e.g.:
load_accumulate!("data.nc", data, "T", x -> x .^ 2)

but I don't actually know if any of that is possible to do without (secretly?) allocating a buffer. So maybe my local implementation is the way to go?


Alexander-Barth · 2024-03-13T08:07:33Z

I don't think that this is possible without allocating an additional buffer, as nc_get_var overwrites the buffer it gets.
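Since the read always overwrites its target, a scratch buffer is needed, but it does not have to be as large as the full array. The following is only a minimal sketch, assuming a 2-D variable that is read one column at a time into a vector; `accumulate_columns!` is a hypothetical helper, not part of NCDatasets, and the temporary file exists only to make the example self-contained. Many small reads may well be slower than one large read, so whether this trade is worthwhile depends on the file layout.

```julia
using NCDatasets

# Hypothetical helper (not part of NCDatasets): add f applied elementwise to a
# 2-D variable into `acc`, reading one column at a time so that the scratch
# buffer `col` is a vector instead of a full-size array.
function accumulate_columns!(acc::AbstractMatrix, ncvar, f = identity,
                             col = Vector{eltype(ncvar)}(undef, size(acc, 1)))
    for j in axes(acc, 2)
        NCDatasets.load!(ncvar, col, :, j)   # overwrites `col` with column j
        @views acc[:, j] .+= f.(col)         # fused in-place update of column j
    end
    return acc
end

# Self-contained demo: write a 3x3 array of ones to a temporary file ...
fname = tempname() * ".nc"
NCDataset(fname, "c") do ds
    defVar(ds, "var", ones(3, 3), ("x", "y"))
end

# ... and accumulate it into an array that already holds ones.
acc = ones(3, 3)
NCDataset(fname) do ds
    accumulate_columns!(acc, variable(ds, "var"))
end
```

Passing a scalar function as the third argument, for example `x -> x^2`, would accumulate a sum of squares in the same way.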

For your information, there are some groupby + aggregation functions described here: https://juliageo.org/CommonDataModel.jl/stable/tutorial1/#Grouping-and-reducing

haakon-e · 2024-03-20T22:55:27Z

Thank you! I am doing the looping because I aggregate data from many different files, and I've found the multifile-reading to be quite slow in some instances (but perhaps I'll try to file a separate issue on that if I can).

I'll try experimenting more with groupby+agg, which seems fast so far!

haakon-e closed this as completed Mar 20, 2024
