Standardized methods outputs #452
Comments
Hi @chahank If you're doing analyses, try converting to a DataFrame first. Example:
Otherwise, I agree that standardization is warranted.
Thanks! The method to convert to DataFrames is indeed quite useful. Unfortunately, I have so far based my embedding of SALib into CLIMADA on the dictionaries. I might change it in the future though.
To give a bit more context: one of the reasons that I used the dictionary is that it is easy to differentiate between first and second order indices by checking whether it is a 1D or 2D numpy array. With the dataframe indices, the differentiation is less trivial.
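The dimension-based check described in this comment could be sketched roughly as follows. The dictionary is hand-built from the Sobol values quoted in this thread rather than produced by SALib, and `split_by_order` is a hypothetical helper name, not part of SALib or CLIMADA.

```python
import numpy as np

# Hand-built stand-in for an analysis result (values copied from the Sobol
# example in this thread); a real result would come from SALib itself.
Si = {
    "S1": np.array([0.31057564, 0.44365337, -0.01296208]),
    "ST": np.array([0.55794676, 0.4421895, 0.24140187]),
    "S2": np.array([
        [np.nan, -0.01439748, 0.24623147],
        [np.nan, np.nan, 0.00053893],
        [np.nan, np.nan, np.nan],
    ]),
    "names": ["x1", "x2", "x3"],  # a list entry, as the Morris output has
}


def split_by_order(result):
    """Group entries by array dimension (hypothetical helper).

    1-D arrays are treated as per-parameter indices and 2-D arrays as
    pairwise indices; anything else is skipped.
    """
    per_param, pairwise = {}, {}
    for key, value in result.items():
        if not isinstance(value, np.ndarray):
            continue  # e.g. lists such as 'names'
        if value.ndim == 1:
            per_param[key] = value
        elif value.ndim == 2:
            pairwise[key] = value
    return per_param, pairwise


per_param, pairwise = split_by_order(Si)
print(sorted(per_param))  # ['S1', 'ST']
print(sorted(pairwise))   # ['S2']
```

Note that total-order indices (`ST`) are also 1-D, so this check separates per-parameter results from pairwise ones rather than strictly first order from second order.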
I think I'm missing some more context here, but are the dictionary keys or DF column names themselves not usable for this?
>>> Si = sobol.analyze(...)
>>> Si
{'S1': array([ 0.31057564, 0.44365337, -0.01296208]),
'S1_conf': array([0.05220492, 0.05386508, 0.05120255]),
'ST': array([0.55794676, 0.4421895 , 0.24140187]),
'ST_conf': array([0.07801056, 0.04067199, 0.02647518]),
'S2': array([[ nan, -0.01439748, 0.24623147],
[ nan, nan, 0.00053893],
[ nan, nan, nan]]),
'S2_conf': array([[ nan, 0.07894212, 0.10970485],
[ nan, nan, 0.05909114],
[ nan, nan, nan]])}
>>> Si.keys()
dict_keys(['S1', 'S1_conf', 'ST', 'ST_conf', 'S2', 'S2_conf'])
>>> Si["S2"]
array([[ nan, -0.01439748, 0.24623147],
[ nan, nan, 0.00053893],
[ nan, nan, nan]])
# Convert to DF, in this case returns list of 3 dataframes
# in order of Total, First and Second order indices
>>> Si.to_df()
[          ST   ST_conf
 x1  0.557947  0.078011
 x2  0.442189  0.040672
 x3  0.241402  0.026475,
           S1   S1_conf
 x1  0.310576  0.052205
 x2  0.443653  0.053865
 x3 -0.012962  0.051203,
                 S2   S2_conf
 (x1, x2) -0.014397  0.078942
 (x1, x3)  0.246231  0.109705
 (x2, x3)  0.000539  0.059091]
>>> ST, S1, S2 = Si.to_df()
I would like to avoid any hard-coded variable names. This is an effort to keep compatibility with potential future methods in SALib with minimal effort. As long as the structure of the output data from the [...] Since the output variables have different names for the different methods in SALib, I differentiate between first and second order indices by checking whether the array is 1D or 2D. I could also use the DataFrames (but would have to rewrite my integration of SALib ^^). How stable is the DataFrame output, however? Will it remain available for all methods?
This is not a criticism (I have no context on the integration after all), I just wanted to raise that there is a danger in attempting to future-proof against all possible cases. Checking the shape of the returned array is also a little fragile (what happens if we move to providing 3rd order sensitivities as well?). I suppose you could check the length of DataFrame indices (this should increase with number of orders). Names of methods/results are unlikely to change, however.
I intend for this, yes 😄
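One possible reading of the "length of DataFrame indices" idea is to look at the length of each index label rather than the number of rows. The sketch below uses hand-built DataFrames that mirror the `to_df()` output shown in this thread; `interaction_order` is a hypothetical helper, and how SALib actually builds its index (plain tuples or a MultiIndex) is an assumption, so both cases are handled.

```python
import pandas as pd

# Hand-built stand-ins mirroring the to_df() output quoted in this thread.
S1_df = pd.DataFrame(
    {"S1": [0.310576, 0.443653, -0.012962],
     "S1_conf": [0.052205, 0.053865, 0.051203]},
    index=["x1", "x2", "x3"],
)
S2_df = pd.DataFrame(
    {"S2": [-0.014397, 0.246231, 0.000539],
     "S2_conf": [0.078942, 0.109705, 0.059091]},
    # tupleize_cols=False keeps the labels as plain tuples instead of
    # letting pandas turn them into a MultiIndex.
    index=pd.Index([("x1", "x2"), ("x1", "x3"), ("x2", "x3")],
                   tupleize_cols=False),
)


def interaction_order(df):
    """Infer the order of the indices from the index labels (hypothetical)."""
    idx = df.index
    if len(idx) == 0:
        raise ValueError("cannot infer the order of an empty DataFrame")
    if isinstance(idx, pd.MultiIndex):
        return idx.nlevels  # one level per interacting parameter
    first = idx[0]
    return len(first) if isinstance(first, tuple) else 1


print(interaction_order(S1_df))  # 1
print(interaction_order(S2_df))  # 2
```

Under this reading, a third-order result labelled with 3-tuples would report 3 without any change to the helper.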
Incidentally, would you be in a position to submit a PR listing CLIMADA in the readme? Like this PR #433
Sure! Thanks for the feedback. I agree that future-proofing against all possible cases is impossible. So far, I have tried to make it at least somewhat flexible, and it has mostly worked out well over the last couple of releases of SALib :D.
That could be something. For the moment, the code simply ignores arrays that are not 1-D or 2-D. But of course, this is not foolproof.
Awesome :D ! Thanks again for all the feedback and for making this great package!
One suggestion about the pandas DataFrame output for the higher-order sensitivity indices: instead of using a single index with a longer string
This is an idea/suggestion for current and future development. It would be very helpful if the basic methods sample and analyze always returned data of the same type with a similar structure. In this way, it is much easier to embed SALib into other packages.

For example, now the method SALib.analyze.morris.analyze returns a dictionary SI_morris containing types list, numpy.ndarray, and numpy.ma.core.MaskedArray, whereas SALib.analyze.sobol.analyze returns a dictionary Si_sobol containing types numpy.ndarray. As far as I can tell, there is no fundamental reason to have these differences. Thus, a proposal would be to always use numpy.array. Furthermore, the dictionary SI_morris has the key names, which the SI_sobol does not have.