Standardized methods outputs #452

chahank · 2021-07-15T12:48:29Z

This is an idea/suggestion for current and future development. It would be very helpful that the basic methods sample and analyze always return data of the same type with a similar structure. In this way, it is much easier to embed SAlib into other packages.

For example, now the method SALib.analyze.morris.analyze returns a dictionary SI_morris containing types list, numpy.ndarray, and numpy.ma.core.MaskedArray whereas SALib.analyze.sobol.analyze returns a dictionary Si_sobol containing types numpy.ndarray. As far as I can tell, there is no fundamental reason to have these differences. Thus, a proposal would be to always use numpy.array. Furthermore, the dictionary SI_morris has the key names, which the SI_sobol does not have.

The text was updated successfully, but these errors were encountered:

ConnectedSystems · 2021-07-15T13:20:27Z

Hi @chahank

If you're doing analyses, try converting to a DataFrame first.

Example:

Si = morris.analyze(...)
Si.to_df()

Otherwise, I agree that standardization is warranted.

chahank · 2021-07-15T13:23:02Z

Thanks! The method to convert to dataframes is indeed quite useful. Unfortunately, I based my embedding of SAlib into CLIMADA on the dictionaries so far. I might change it in the future though.

chahank · 2021-07-15T13:49:11Z

To give a bit more context: one of the reasons that I used the dictionary is that it is easy to differentiate between first and second order indices by checking whether it is a 1D or 2D numpy array. With the dataframe indices, the differentiation is less trivial.

ConnectedSystems · 2021-07-18T08:20:06Z

it is easy to differentiate between first and second order indices by checking whether it is a 1D or 2D numpy array

I think I'm missing some more context here, but are the dictionary keys or DF column names themselves not usable for this?

>>> Si = sobol.analyze(...)

>>> Si
{'S1': array([ 0.31057564,  0.44365337, -0.01296208]),
 'S1_conf': array([0.05220492, 0.05386508, 0.05120255]),
 'ST': array([0.55794676, 0.4421895 , 0.24140187]),
 'ST_conf': array([0.07801056, 0.04067199, 0.02647518]),
 'S2': array([[        nan, -0.01439748,  0.24623147],
        [        nan,         nan,  0.00053893],
        [        nan,         nan,         nan]]),
 'S2_conf': array([[       nan, 0.07894212, 0.10970485],
        [       nan,        nan, 0.05909114],
        [       nan,        nan,        nan]])}

>>> Si.keys()
dict_keys(['S1', 'S1_conf', 'ST', 'ST_conf', 'S2', 'S2_conf'])

>>> Si["S2"]
array([[        nan, -0.01439748,  0.24623147],
       [        nan,         nan,  0.00053893],
       [        nan,         nan,         nan]])

# Convert to DF, in this case returns list of 3 dataframes 
# in order of Total, First and Second order indices
>>> Si.to_df()  
[          ST   ST_conf
 x1  0.557947  0.078011
 x2  0.442189  0.040672
 x3  0.241402  0.026475,
           S1   S1_conf
 x1  0.310576  0.052205
 x2  0.443653  0.053865
 x3 -0.012962  0.051203,
                 S2   S2_conf
 (x1, x2) -0.014397  0.078942
 (x1, x3)  0.246231  0.109705
 (x2, x3)  0.000539  0.059091]

>>> ST, S1, S2 = Si.to_df()

chahank · 2021-07-19T14:06:03Z

I would like to avoid any hard-coded variable names. This is an effort to keep compatibility with potential future methods in SAlib with minimal effort. As long a the structure of the output data from the .sample and .analyze methods do not change, this should be fine. There is of course no guarantee from you that this will not change it in the future, but I hope the package will remain stable for some time.

Since for the different methods in SAlib the output variables have different names, I differentiate between first order and second order indices by looking whether the array is 1D or 2D.

I could also use the Dataframes (but would have to rewrite my integration of SAlib ^^). How stable is the DataFrame output, however? Will it remain available for all methods?

ConnectedSystems · 2021-07-26T08:53:55Z

I would like to avoid any hard-coded variable names
I differentiate between first order and second order indices by looking whether the array is 1D or 2D.

This is not a criticism (I have no context on the integration after all), just wanted to raise that there is a danger to attempting to future-proof against all possible cases. Checking the shape of the returned array is also a little fragile (what happens if we move to providing 3rd order sensitivities as well?). I suppose you could check the length of DataFrame indices (this should increase with number of orders).

Names of methods/results are unlikely to change, however.

How stable is the DataFrame output, however? Will it remain available for all methods?

I intend for this, yes 😄

ConnectedSystems · 2021-07-26T09:19:30Z

Incidentally, would you be in a position to submit a PR listing CLIMADA in the readme?

Like this PR #433

chahank · 2021-07-26T10:13:45Z

I would like to avoid any hard-coded variable names
I differentiate between first order and second order indices by looking whether the array is 1D or 2D.

This is not a criticism (I have no context on the integration after all), just wanted to raise that there is a danger to attempting to future-proof against all possible cases.

Sure! Thanks for the feedback. I agree that future-proof for all possible cases is impossible. So far, I tried to make at least somehow flexible, and it mostly worked out well over the last couple of release of SAlib :D.

Checking the shape of the returned array is also a little fragile (what happens if we move to providing 3rd order sensitivities as well?). I suppose you could check the length of DataFrame indices (this should increase with number of orders).

That could be something. For the moment, the code simply ignores not 1-d or 2-d arrays. But of course, this is not foolproof.

Names of methods/results are unlikely to change, however.
That is why I tried to avoid any hard-coded names so far.

How stable is the DataFrame output, however? Will it remain available for all methods?

I intend for this, yes 😄

Awesome :D !

Thanks again for all the feedback and for making this great package!

ConnectedSystems · 2021-08-08T07:41:26Z

Thanks for PR #455 and your kind words. For the record though I didn't make this package, I merely help maintain it :)

All credit goes to @jdherman @willu47 and their colleagues.

chahank · 2021-08-10T08:50:09Z

One suggestion about the pandas DataFrame output for the higher-order sensitivity indices: instead of using a single index with a longer string "(x1, x2)", it would be convenient to use a multi-index. Hence, to have two separate index columns for the first and second variables.

ConnectedSystems added this to 1.4.x series in SALib Development Roadmap Jul 15, 2021

ConnectedSystems moved this from 1.4.x series to v1.5 onward in SALib Development Roadmap Sep 4, 2021

ConnectedSystems mentioned this issue Aug 21, 2022

Heatmap #526

Merged

ConnectedSystems added this to the v1.5 milestone Sep 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Standardized methods outputs #452

Standardized methods outputs #452

chahank commented Jul 15, 2021

ConnectedSystems commented Jul 15, 2021

chahank commented Jul 15, 2021

chahank commented Jul 15, 2021 •

edited

Loading

ConnectedSystems commented Jul 18, 2021 •

edited

Loading

chahank commented Jul 19, 2021

ConnectedSystems commented Jul 26, 2021

ConnectedSystems commented Jul 26, 2021

chahank commented Jul 26, 2021 •

edited

Loading

ConnectedSystems commented Aug 8, 2021

chahank commented Aug 10, 2021

Standardized methods outputs #452

Standardized methods outputs #452

Comments

chahank commented Jul 15, 2021

ConnectedSystems commented Jul 15, 2021

chahank commented Jul 15, 2021

chahank commented Jul 15, 2021 • edited Loading

ConnectedSystems commented Jul 18, 2021 • edited Loading

chahank commented Jul 19, 2021

ConnectedSystems commented Jul 26, 2021

ConnectedSystems commented Jul 26, 2021

chahank commented Jul 26, 2021 • edited Loading

ConnectedSystems commented Aug 8, 2021

chahank commented Aug 10, 2021

chahank commented Jul 15, 2021 •

edited

Loading

ConnectedSystems commented Jul 18, 2021 •

edited

Loading

chahank commented Jul 26, 2021 •

edited

Loading