Compute Si for multiple outputs in parallel #41

Open
willu47 opened this issue Mar 9, 2015 · 10 comments

@willu47
Member

willu47 commented Mar 9, 2015

It would be good to extend the existing Morris analysis code so that the metrics for multiple result vectors can be computed in one call, with the results passed as a two-dimensional numpy array rather than a single vector.

At present, it is necessary to loop over each output you wish to compute the metrics for, calling the analysis procedure each time.

from SALib.analyze.morris import analyze

Si = []
for results in array_of_results:
    Si.append(analyze(problem, X, results))

It would be preferable to do this:

from SALib.analyze.morris import analyze
Si = analyze(problem, X, array_of_results)

A parallel implementation would be equally desirable, and trivial, as each output can be computed independently of the others.

@jdherman
Member

jdherman commented Mar 9, 2015

Thanks Will! This could work for all of the analysis methods too, not just Morris. A couple of questions about what you're envisioning:

  • Would the analyze functions allow either vector or matrix inputs, or just matrix?
  • In the case of matrix inputs, would it just perform that same loop behind the scenes? I don't see any way to vectorize but maybe it's possible.
  • What format would the returned data structure Si take? Right now it's a dictionary. Would it instead be a list of dictionaries? Or maybe each entry in the dictionary could contain a list/vector of values, one for each column in order.
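
To make the two candidate return formats concrete, here is a minimal sketch of converting a list of per-output dictionaries into a single dictionary of stacked arrays. The helper name `stack_si` and the example values are hypothetical and are not part of SALib; the sketch also assumes every dictionary holds the same keys with numeric, equal-length values.

```python
import numpy as np


def stack_si(si_list):
    """Turn a list of per-output dicts into one dict of stacked arrays.

    Hypothetical helper: each value becomes a 2-D array with one row per
    parameter and one column per output, in the order of si_list.
    """
    keys = si_list[0].keys()
    return {k: np.column_stack([np.asarray(si[k]) for si in si_list]) for k in keys}


# Two made-up outputs, three parameters each
si_list = [
    {"mu": [0.1, 0.2, 0.3], "sigma": [1.0, 2.0, 3.0]},
    {"mu": [0.4, 0.5, 0.6], "sigma": [4.0, 5.0, 6.0]},
]
stacked = stack_si(si_list)
# stacked["mu"] has shape (3, 2): parameters along rows, outputs along columns
```

The list-of-dicts form keeps each output compatible with code written for a single result, while the stacked form is easier to slice across outputs.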

Another thing, right now the analyze functions allow optional printing to the console. Could this still work for analyzing multiple outputs at the same time? Just a few thoughts ... this is a good idea, we just have to make sure the output is still easy to work with. And it would have to be implemented across all of the methods.

@willu47
Member Author

willu47 commented Mar 9, 2015

Thanks for the comments - it's really handy to get some feedback on my ideas, and I'm sure that ends up with better code too. To respond point-by-point:

  • It should allow both vector and matrix - so there would need to be a check on dimensions of the results argument
  • Yep, it would potentially just perform a loop behind the scenes, but at least then there's the option to parallelise the computation in the future
  • I think a list of dictionaries would be best to maintain compatibility with the existing functionality.
  • Regarding printing to the console, this would quickly become impractical for large numbers of results (I am currently working with a model and examining 900+ outputs). When dealing with this much data, printing to the console is impossible, and a process of data aggregation goes on anyway, post SALib. So I suggest suppressing console output when more than one output vector is computed.
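
The points above could be sketched roughly as follows. This is only an illustration under stated assumptions: `analyze_many` and `toy_analyze` are hypothetical names, the wrapper takes the analysis function as an argument rather than modifying SALib, and it assumes that function accepts a `print_to_console` keyword.

```python
import numpy as np


def analyze_many(analyze_fn, problem, X, Y, **kwargs):
    """Call analyze_fn once per output column and collect the results.

    Y may be a vector of length N (one output) or an N-by-k matrix
    (k outputs). A vector returns a single result; a matrix returns a
    list of results, one per column.
    """
    Y = np.asarray(Y)
    if Y.ndim == 1:
        return analyze_fn(problem, X, Y, **kwargs)
    if Y.ndim != 2:
        raise ValueError("Y must be a vector or a 2-D array")
    # Suppress console output when several outputs are analysed at once
    kwargs = {**kwargs, "print_to_console": False}
    return [analyze_fn(problem, X, Y[:, j], **kwargs) for j in range(Y.shape[1])]


def toy_analyze(problem, X, Y, print_to_console=False):
    """Stand-in for a real analysis function, used only for the demo."""
    return {"mean": float(np.mean(Y))}


single = analyze_many(toy_analyze, {}, None, np.array([1.0, 3.0]))
several = analyze_many(toy_analyze, {}, None, np.array([[1.0, 10.0], [3.0, 30.0]]))
```

Because each column is handled by an independent call, the list comprehension is the only place that would need to change if the loop were later parallelised.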

@willu47
Member Author

willu47 commented Mar 9, 2015

Regarding vectorisation for Morris, the computation of the metrics is easily converted to numpy computations over an array rather than a vector by changing the axis argument:

    Si['mu'] = np.average(ee, axis=1)
    Si['mu_star'] = np.average(np.abs(ee), axis=1)
    Si['sigma'] = np.std(ee, axis=1)
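
As a rough illustration of how these reductions extend to several outputs, the sketch below assumes the elementary effects are stored in a three-dimensional array of shape (num_vars, num_trajectories, num_outputs). That layout is an assumption made for this example, not something the existing code produces, and the random values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
num_vars, num_trajectories, num_outputs = 3, 5, 4

# Assumed layout: (num_vars, num_trajectories, num_outputs)
ee = rng.normal(size=(num_vars, num_trajectories, num_outputs))

# Reduce over the trajectory axis only; the output axis is carried along
Si = {
    "mu": np.average(ee, axis=1),
    "mu_star": np.average(np.abs(ee), axis=1),
    "sigma": np.std(ee, axis=1),
}
# Each entry now has shape (num_vars, num_outputs)
```

Column j of each entry matches what the two-dimensional version would give for output j alone.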

The computation of elementary effects is a little trickier though and would require more substantial work.

import numpy as np

def compute_elementary_effects(model_inputs, model_outputs, trajectory_size, delta):
    '''
    Arguments:
        - model_inputs - matrix of inputs to the model under analysis.
                         r-by-x where r is the number of rows (a function
                         of x and num_trajectories) and x is the number of variables
        - model_outputs - an r-length vector of model outputs
        - trajectory_size - a scalar indicating the number of rows in a
                            trajectory
        - delta - a scalar by which the elementary effects are divided
    '''
    num_vars = model_inputs.shape[1]
    num_rows = model_inputs.shape[0]
    num_trajectories = int(num_rows / trajectory_size)

    # np.float was removed from recent NumPy releases; the builtin float is equivalent
    ee = np.zeros((num_trajectories, num_vars), dtype=float)

    ip_vec = model_inputs.reshape(num_trajectories,trajectory_size,num_vars)
    ip_cha = np.subtract(ip_vec[:,1:,:], ip_vec[:,0:-1,:])
    up = (ip_cha > 0)
    lo = (ip_cha < 0)

    op_vec = model_outputs.reshape(num_trajectories,trajectory_size)

    result_up = get_increased_values(op_vec, up, lo)
    result_lo = get_decreased_values(op_vec, up, lo)

    ee = np.subtract(result_up, result_lo)
    np.divide(ee, delta, out = ee)

    return ee

@grantstephens

Hi Guys

I'm coming to the party a bit late here, but I've just started using this library. Great work so far, thank you. I'm just wondering whether this has been implemented yet for any of the methods, or is it still a manual job?
I don't know how complex the results could get, but a dataframe might be an option at some point if the dictionary gets a bit cumbersome.

Cheers

@jdherman
Member

Hi @RexFuzzle thanks for using the library.

As far as I know this hasn't been implemented yet. Will might have something in the Morris method, but that's it. There's also a thread-based parallelization for Sobol, but this still only calculates a single index at a time.

There's definitely room for improvement here -- are we talking about vectorized calculations, or just some kind of parallelization? I agree a different data structure could help but that would be a pretty serious renovation under the hood.

I haven't had as much time as I'd like to contribute to this lately, but am certainly open to any suggestions!

@jdherman jdherman changed the title from "Morris analyze - compute metrics for multiple results in parallel" to "Compute Si for multiple outputs in parallel" Nov 7, 2019
@jdherman
Member

jdherman commented Nov 7, 2019

This is an open issue for all methods. We want to be able to pass in a matrix of model outputs and have all of the Si values returned somehow.

Ideally the calculation of Si values would be vectorized, but this may not be possible for all methods. There could also be an option to parallelize, because the outputs are all separate.

Right now there is only (shared memory) parallelization for Sobol, but it's parallelized across the parameters, not outputs. In my experience it doesn't add much speedup. I would be in favor of replacing this with a consistent approach across all methods that parallelizes over the outputs (columns of a matrix Y).
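
A minimal sketch of parallelizing over the columns of Y, using only the standard library's `concurrent.futures`. The names `analyze_columns` and `column_mean` are hypothetical, and the analysis function is passed in rather than taken from SALib. Threads are used by default to keep the example simple; for CPU-bound analyses a process pool is likely the better fit, which requires `analyze_fn` to be a picklable, module-level function.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import numpy as np


def analyze_columns(analyze_fn, problem, X, Y, max_workers=2,
                    executor_cls=ThreadPoolExecutor):
    """Run analyze_fn(problem, X, y) for every column y of Y in parallel.

    Results are returned as a list in column order.
    """
    Y = np.asarray(Y)
    columns = [Y[:, j] for j in range(Y.shape[1])]
    # partial (rather than a lambda) stays picklable for process pools
    task = partial(analyze_fn, problem, X)
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the inputs were given
        return list(pool.map(task, columns))


def column_mean(problem, X, y):
    """Stand-in for a real analysis function, used only for the demo."""
    return float(np.mean(y))


out = analyze_columns(column_mean, None, None, np.array([[1.0, 10.0], [3.0, 30.0]]))
```

Passing `executor_cls=ProcessPoolExecutor` would switch to separate processes without changing the rest of the function.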

@jdherman jdherman added this to Features in SALib Development Roadmap Nov 7, 2019
@jdherman jdherman moved this from Features to Methods in SALib Development Roadmap Nov 11, 2019
@jdherman jdherman moved this from v1.5 onward to v2.0 in SALib Development Roadmap Nov 11, 2019
@ConnectedSystems ConnectedSystems moved this from v2.0 to v1.5 onward in SALib Development Roadmap Oct 4, 2020
@ConnectedSystems ConnectedSystems moved this from v1.5 onward to 1.4.x series in SALib Development Roadmap Jun 27, 2021
@ConnectedSystems
Member

ConnectedSystems commented Sep 4, 2021

This is partially addressed with the OO-based interface, which estimates Si on a per-column basis.

We could leave the procedural style unchanged as it offers fine-grain control. Backporting it to the procedural approach would take a lot of work but I'm open to it if needed.

from SALib import ProblemSpec
from SALib.test_functions import lake_problem

# Create the SALib Problem specification
sp = ProblemSpec({
    'names': ['a', 'q', 'b', 'mean', 'stdev', 'delta', 'alpha'],
    'bounds': [[0.0, 0.1],
               [2.0, 4.5],
               [0.1, 0.45],
               [0.01, 0.05],
               [0.001, 0.005],
               [0.93, 0.99],
               [0.2, 0.5]],
    'outputs': ['max_P', 'Utility', 'Inertia', 'Reliability']
})

# Parallel example (note the use of `nprocs`)
(sp.sample_saltelli(2**8)
       .evaluate_parallel(lake_problem.evaluate, nprocs=2)
       .analyze_sobol(calc_second_order=True, conf_level=0.95, nprocs=2, seed=101))

A more procedural approach without method-chaining:

sp.sample_saltelli(2**8)
sp.evaluate_parallel(lake_problem.evaluate, nprocs=2)
sp.analyze_sobol(calc_second_order=True, conf_level=0.95, nprocs=2, seed=101)

@judemoh

judemoh commented Dec 7, 2023

Hi there,

I am wondering if this has been expanded on since the comment by @ConnectedSystems.

I don't know how to write a custom function, equivalent to the .evaluate of the test functions, for my own model. For context:
I have tried using the evaluate_parallel() function with ProblemSpec, but it warns that it is still an experimental feature and may not work. Any updates on the matter would be really useful - thanks for such a cool package!

@tupui
Member

tupui commented Dec 8, 2023

Hi @judemoh, nothing much has changed so far, though we have plans to overhaul the API. Hopefully we will have more good news to share around March next year that would allow us to work on that.

@ConnectedSystems
Member

ConnectedSystems commented Dec 8, 2023

Hi @judemoh

> I have tried using the evaluate_parallel() function with ProblemSpec but it states that it is still an experimental feature and may not work.

I'm fairly confident that what is currently implemented should work, provided that:

  • the function to be assessed meets the expected requirements
  • your computer has enough memory to handle all the results and any intermediate data

The warning is there to manage expectations as I cannot test every possible use case - I know how I use SALib, but I don't know how others would use it, or what computer they use SALib on.

If you provide an example of your function I can help get something working, or at least tell you if it is possible.

For a quick overview, have a look at the documentation here:
https://salib.readthedocs.io/en/latest/user_guide/wrappers.html#parallel-evaluation-and-analysis
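
As a rough illustration of the first requirement, here is a minimal sketch of a custom function. It assumes the wrapper expects a function that takes a numpy array of samples (one row per sample, one column per parameter) as its first positional argument and returns a numpy array of results. The model itself, its name `my_model`, and the `scale` argument are made up for the example.

```python
import numpy as np


def my_model(X, scale=1.0):
    """Hypothetical model with two outputs.

    X is an (N, D) array: N samples of D parameters.
    Returns an (N, 2) array: one row per sample, one column per output.
    """
    X = np.asarray(X, dtype=float)
    total = scale * X.sum(axis=1)            # output 1: scaled sum of parameters
    spread = X.max(axis=1) - X.min(axis=1)   # output 2: range of the parameters
    return np.column_stack([total, spread])


# Quick check on two samples of three parameters
X_demo = np.array([[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]])
Y_demo = my_model(X_demo)
# Y_demo has shape (2, 2)
```

A function of this shape would then be passed in place of `lake_problem.evaluate`, for example `sp.evaluate_parallel(my_model, nprocs=2)`, with the number of returned columns presumably matching the 'outputs' listed in the problem specification.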

Happy to answer any questions!
