Compute Si for multiple outputs in parallel #41

willu47 · 2015-03-09T15:22:54Z

It would be good to extend the existing Morris analysis code so that several result vectors can be analysed in one call, with the results passed as a 2-D numpy array (one column per output) rather than a single vector.

At present, it is necessary to loop over each output you wish to compute the metrics for, calling the analysis procedure each time.

from SALib.analyze.morris import analyze

Si = []
for results in array_of_results:
    Si.append(analyze(problem, X, results))

It would be preferable to do this:

from SALib.analyze.morris import analyze
Si = analyze(problem, X, array_of_results)

A parallel implementation would be equally desirable, and trivial, as each output can be computed independently of the others.

jdherman · 2015-03-09T16:48:32Z

Thanks Will! This could work for all of the analysis methods too, not just Morris. A couple of questions about what you're envisioning:

1. Would the analyze functions allow either vector or matrix inputs, or just matrix?
2. In the case of matrix inputs, would it just perform that same loop behind the scenes? I don't see any way to vectorize but maybe it's possible.
3. What format would the returned data structure Si take? Right now it's a dictionary ... would it instead be a list of dictionaries? Or maybe each entry in the dictionary could contain a list/vector of values, one for each column in order.
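The two candidate return layouts can be compared with a minimal sketch. It assumes each per-output result is a dictionary of equal-length arrays keyed by metric name. The helper name `stack_results` and the values are hypothetical and not part of SALib.

```python
import numpy as np

def stack_results(per_output):
    """Convert a list of per-output dicts into one dict of 2-D arrays.

    Hypothetical helper. Each array in the result has shape
    (num_vars, num_outputs), with columns in the original output order.
    """
    keys = per_output[0].keys()
    return {k: np.column_stack([si[k] for si in per_output]) for k in keys}

# Layout A: a list of dictionaries, one per output column (made-up values).
list_of_dicts = [
    {"mu": np.array([0.1, 0.2]), "sigma": np.array([1.0, 2.0])},
    {"mu": np.array([0.3, 0.4]), "sigma": np.array([3.0, 4.0])},
]

# Layout B: one dictionary whose entries hold one column per output.
dict_of_arrays = stack_results(list_of_dicts)
```

Layout A keeps each entry identical to what a single-output call returns today, while layout B makes it easier to slice one metric across all outputs.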

Another thing, right now the analyze functions allow optional printing to the console. Could this still work for analyzing multiple outputs at the same time? Just a few thoughts ... this is a good idea, we just have to make sure the output is still easy to work with. And it would have to be implemented across all of the methods.

willu47 · 2015-03-09T17:00:23Z

Thanks for the comments - it's really handy to get some feedback on my ideas, and I'm sure that ends up with better code too. To respond point-by-point:

1. It should allow both vector and matrix, so there would need to be a check on the dimensions of the results argument.
2. Yep, it would potentially just perform a loop behind the scenes, but at least then there's the option to parallelise the computation in the future.
3. I think a list of dictionaries would be best to maintain compatibility with the existing functionality.
Regarding printing to the console, this would quickly become impractical for large numbers of results (I am currently working with a model and examining 900+ outputs). When dealing with this much data, printing to the console is impossible, and a process of data aggregation goes on anyway, post SALib. So I suggest suppressing console output when more than one output vector is computed.
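The points above could be sketched roughly as follows. This is an illustration only: `analyze_outputs` and `fake_analyze` are hypothetical names, and the sketch assumes the wrapped analysis function accepts a `print_to_console` keyword.

```python
import numpy as np

def analyze_outputs(analyze_fn, problem, X, Y, **kwargs):
    """Run analyze_fn once per output column and collect the results.

    Hypothetical wrapper: a 1-D Y returns a single result, as today;
    a 2-D Y (one column per output) returns a list of results.
    """
    Y = np.asarray(Y)
    if Y.ndim == 1:
        return analyze_fn(problem, X, Y, **kwargs)
    if Y.ndim != 2:
        raise ValueError("Y must be a vector or a 2-D array")
    # Assumption: the analysis function takes print_to_console;
    # console output is suppressed when several outputs are analysed.
    kwargs["print_to_console"] = False
    return [analyze_fn(problem, X, Y[:, j], **kwargs) for j in range(Y.shape[1])]

# Stand-in for a real analysis routine, used only to exercise the wrapper.
def fake_analyze(problem, X, Y, print_to_console=False):
    return {"mean": float(np.mean(Y))}

X = np.zeros((4, 2))
Y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
single = analyze_outputs(fake_analyze, {}, X, Y[:, 0])
many = analyze_outputs(fake_analyze, {}, X, Y)
```

Because the loop is isolated in one place, it could later be swapped for a parallel map without changing the calling code.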

willu47 · 2015-03-09T17:37:13Z

Regarding vectorisation for Morris, the computation of the metrics is easily converted to numpy computations over an array rather than a vector by changing the axis argument:

    Si['mu'] = np.average(ee, axis=1)
    Si['mu_star'] = np.average(np.abs(ee), axis=1)
    Si['sigma'] = np.std(ee, axis=1)
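As a small check of that idea, the sketch below applies the same three formulas to a 3-D array. The layout `(num_vars, num_trajectories, num_outputs)` is an assumption made for illustration, chosen so that axis 1 is still the trajectory axis; the data are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
num_vars, num_trajectories, num_outputs = 3, 5, 4

# Assumed layout: (num_vars, num_trajectories, num_outputs).
ee = rng.normal(size=(num_vars, num_trajectories, num_outputs))

Si = {
    "mu": np.average(ee, axis=1),
    "mu_star": np.average(np.abs(ee), axis=1),
    "sigma": np.std(ee, axis=1),
}

# Each metric now has one row per variable and one column per output.
# Column j should match the single-output formula applied to output j.
j = 2
mu_single = np.average(ee[:, :, j], axis=1)
```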

The computation of elementary effects is a little trickier though and would require more substantial work.

import numpy as np

def compute_elementary_effects(model_inputs, model_outputs, trajectory_size, delta):
    '''
    Arguments:
        - model_inputs - matrix of inputs to the model under analysis.
                         r-by-x where r is the number of rows (a function
                         of x and num_trajectories) and x is the number of
                         variables
        - model_outputs - an r-length vector of model outputs
        - trajectory_size - a scalar indicating the number of rows in a
                            trajectory
        - delta - a scalar step size; the output differences are divided by it
    '''
    num_vars = model_inputs.shape[1]
    num_rows = model_inputs.shape[0]
    num_trajectories = num_rows // trajectory_size

    # np.float was removed from NumPy; the builtin float is equivalent
    ee = np.zeros((num_trajectories, num_vars), dtype=float)

    # Group rows by trajectory, then difference consecutive rows to see
    # which input moved up or down at each step
    ip_vec = model_inputs.reshape(num_trajectories, trajectory_size, num_vars)
    ip_cha = np.subtract(ip_vec[:, 1:, :], ip_vec[:, 0:-1, :])
    up = (ip_cha > 0)
    lo = (ip_cha < 0)

    op_vec = model_outputs.reshape(num_trajectories, trajectory_size)

    result_up = get_increased_values(op_vec, up, lo)
    result_lo = get_decreased_values(op_vec, up, lo)

    ee = np.subtract(result_up, result_lo)
    np.divide(ee, delta, out=ee)

    return ee

grantstephens · 2016-03-14T09:07:28Z

Hi Guys

Coming to the party a bit late here, but I've just started using this library. Great work so far, thank you. Just wondering if this has been implemented yet for any of the methods, or is it still a manual job?
Don't know how complex the results could get but a dataframe might be an option at some point if the dictionary gets a bit cumbersome.

Cheers

jdherman · 2016-03-19T04:05:53Z

Hi @RexFuzzle thanks for using the library.

As far as I know this hasn't been implemented yet. Will might have something in the Morris method, but that's it. There's also a thread-based parallelization for Sobol, but this still only calculates a single index at a time.

There's definitely room for improvement here -- are we talking about vectorized calculations, or just some kind of parallelization? I agree a different data structure could help but that would be a pretty serious renovation under the hood.

I haven't had as much time as I'd like to contribute to this lately, but am certainly open to any suggestions!

jdherman · 2019-11-07T22:24:24Z

This is an open issue for all methods. We want to be able to pass in a matrix of model outputs and have all of the Si values returned somehow.

Ideally the calculation of Si values would be vectorized, but this may not be possible for all methods. There could also be an option to parallelize, because the outputs are all separate.

Right now there is only (shared memory) parallelization for Sobol, but it's parallelized across the parameters, not outputs. In my experience it doesn't add much speedup. I would be in favor of replacing this with a consistent approach across all methods that parallelizes over the outputs (columns of a matrix Y).
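Parallelizing over the columns of Y could look roughly like the sketch below. It is not SALib code: `analyze_columns` and `toy_analyze` are hypothetical, and a thread pool is used purely to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def analyze_columns(analyze_fn, problem, X, Y, max_workers=2):
    """Apply analyze_fn to every column of Y, returning results in column order."""
    columns = [Y[:, j] for j in range(Y.shape[1])]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its inputs.
        return list(pool.map(lambda y: analyze_fn(problem, X, y), columns))

# Stand-in analysis routine, used only to exercise the helper.
def toy_analyze(problem, X, y):
    return {"var": float(np.var(y))}

X = np.zeros((4, 1))
Y = np.column_stack([np.arange(4.0), 2 * np.arange(4.0)])
results = analyze_columns(toy_analyze, {}, X, Y)
```

For CPU-bound analysis a process pool would likely be the better fit, in which case the lambda would need to be replaced by a picklable top-level function.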

ConnectedSystems · 2021-09-04T05:40:40Z

This is partially addressed with the OO-based interface, which estimates Si on a per-column basis.

We could leave the procedural style unchanged, as it offers fine-grained control. Backporting this to the procedural approach would take a lot of work, but I'm open to it if needed.

from SALib import ProblemSpec
from SALib.test_functions import lake_problem

# Create the SALib Problem specification
sp = ProblemSpec({
    'names': ['a', 'q', 'b', 'mean', 'stdev', 'delta', 'alpha'],
    'bounds': [[0.0, 0.1],
               [2.0, 4.5],
               [0.1, 0.45],
               [0.01, 0.05],
               [0.001, 0.005],
               [0.93, 0.99],
               [0.2, 0.5]],
    'outputs': ['max_P', 'Utility', 'Inertia', 'Reliability']
})

# Parallel example (note the use of `nprocs`)
(sp.sample_saltelli(2**8)
       .evaluate_parallel(lake_problem.evaluate, nprocs=2)
       .analyze_sobol(calc_second_order=True, conf_level=0.95, nprocs=2, seed=101))

A more procedural approach without method-chaining:

sp.sample_saltelli(2**8)
sp.evaluate_parallel(lake_problem.evaluate, nprocs=2)
sp.analyze_sobol(calc_second_order=True, conf_level=0.95, nprocs=2, seed=101)

judemoh · 2023-12-07T02:39:17Z

Hi there,

I am wondering if this has been expanded on since the comment by @ConnectedSystems.

I don't know how to write a custom function for my own model that is equivalent to the .evaluate of the test functions. Context:
I have tried using the evaluate_parallel() function with ProblemSpec but it states that it is still an experimental feature and may not work. Any updates on the matter would be really useful - thanks for such a cool package!

tupui · 2023-12-08T10:26:56Z

Hi @judemoh, nothing much has changed so far, though we have plans to overhaul the API. Hopefully we will have more good news to share around March next year that would allow us to work on that.

ConnectedSystems · 2023-12-08T12:27:16Z

Hi @judemoh

> I have tried using the evaluate_parallel() function with ProblemSpec but it states that it is still an experimental feature and may not work.

I'm fairly confident that what is currently implemented should work, provided that:

- the function to be assessed meets the expected requirements
- your computer has enough memory to handle all the results and any intermediate data
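As a rough illustration of the first point, a custom model function might look like the sketch below. It assumes the function receives a 2-D array with one row per sample and one column per parameter, and returns an array with one row per sample and one column per output. `my_model` and its `scale` argument are hypothetical.

```python
import numpy as np

def my_model(X, scale=1.0):
    """Hypothetical two-output model.

    X has shape (num_samples, num_params); the return value has shape
    (num_samples, 2), one column per output.
    """
    a, b, c = X[:, 0], X[:, 1], X[:, 2]
    total = scale * (a + b + c)
    product = a * b * c
    return np.column_stack([total, product])

X = np.array([[1.0, 2.0, 3.0], [0.5, 0.5, 2.0]])
Y = my_model(X)
```

With a `ProblemSpec` that lists two names under 'outputs', a function of this shape would be passed where `lake_problem.evaluate` appears in the earlier example; treat that as an assumption to check against the documentation.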

The warning is there to manage expectations as I cannot test every possible use case - I know how I use SALib, but I don't know how others would use it, or what computer they use SALib on.

If you provide an example of your function I can help get something working, or at least tell you if it is possible.

For a quick overview, have a look at the documentation here:
https://salib.readthedocs.io/en/latest/user_guide/wrappers.html#parallel-evaluation-and-analysis

Happy to answer any questions!

willu47 added the enhancement label Mar 9, 2015

jdherman changed the title ~~Morris analyze - compute metrics for multiple results in parallel~~ Compute Si for multiple outputs in parallel Nov 7, 2019

jdherman added this to Features in SALib Development Roadmap Nov 7, 2019

jdherman moved this from Features to Methods in SALib Development Roadmap Nov 11, 2019

jdherman moved this from v1.5 onward to v2.0 in SALib Development Roadmap Nov 11, 2019

ConnectedSystems mentioned this issue Apr 29, 2020 in "Need for test function with multiple outputs #309" (closed)

ConnectedSystems moved this from v2.0 to v1.5 onward in SALib Development Roadmap Oct 4, 2020

ConnectedSystems moved this from v1.5 onward to 1.4.x series in SALib Development Roadmap Jun 27, 2021
