Unified Parameter class #3

Open
ipcamit opened this issue Apr 11, 2023 · 36 comments

Comments

@ipcamit
Owner

ipcamit commented Apr 11, 2023

Parameters should be treated identically for torch and physics-based models.

@ipcamit
Owner Author

ipcamit commented Jun 5, 2023

@mjwen @yonatank93
@mjwen @yonatank93
Sorry for the delay, I was traveling last month. I have some ideas about what the new Parameter could look like, and I would like comments and opinions on it before refining it further. Could you suggest how you would like to try it? Should I put it in a different branch of this repo, or should we prepare a fresh branch? And would you be able to install and try the experimental version of KLIFF, perhaps in conda?

@mjwen

mjwen commented Jun 5, 2023

I believe the easiest way is to create a new branch to implement the new Parameter and make a PR to whatever branch has the old Parameter class. Then we can compare them. Of course, this depends on how significant the change is.

Installing experimental versions should not be a problem for @yonatank93 and me.

@yonatank93

Sorry, I still don't have a clear vision about it. So I would suggest that @ipcamit start implementing the Parameter; I can then try it and give feedback. I don't have a problem installing the experimental version.

@ipcamit
Owner Author

ipcamit commented Jun 5, 2023

I would say the changes can be quite significant; that is why I asked specifically. If you have no issues installing an experimental test version, the best approach would be to clone an experimental branch and do pip install -e in a new conda env. I will push the branch soon and let you know, along with install info and a detailed explanation of the reasoning behind the proposed changes.

@ipcamit
Owner Author

ipcamit commented Jun 12, 2023

Problem: We have two separate kinds of parameters, OptimizingParameters and Parameters, each with its own functions/class members.

It is complicated even further by the fact that we now have ML models with their own parameters.

The parameter transformation is handled by the models, which is awkward: at any given time, the Parameter class on its own cannot tell the state of training/optimization.
We always need the Model to know whether the model is using a transformed state and, if so, what that transformation is.

Proposal: a new single Parameter class that matches the Torch parameter class and follows similar conventions.

It takes care of both the transformed space and the original space. The actual model just uses parameter.numpy() or some similar function to get the underlying data,
while the class itself carries the transformed data. Just like a Torch parameter, it is a subclass of numpy.ndarray, and hence completely composable with any numpy/scipy function.

This calls for a slightly modified workflow, where you get a dictionary of parameters that you can modify and clamp accordingly.
The basic structure remains the same for now:

# imports
from kliff.dataset import Dataset
from kliff.models import KIMModel
from kliff.calculators import Calculator
from kliff.transforms import ParameterTransform # baseclass for implementing all transformations
import numpy as np
from kliff.loss import Loss

# model declaration
ds = Dataset("Si_training_set_4_configs")
model = KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")

# See all parameters
print(model.get_model_params())

The first diverging change comes in setting the optimizable parameters. We can just give a list of the names of the parameters above that we
want to optimize:

# Set A and B as the optimizable parameters
model.set_opt_params(["A", "B"]) 

Constraints and initialization are then a separate step: first get the parameters, then set values, transformations, and bounds.

params = model.parameters()

class CustomLogTransform(ParameterTransform):
    def __init__(self, base):
        super().__init__("Base" + str(base))
        self.base = base
        
    def transform(self, x):
        return np.emath.logn(self.base, x)
    
    def inverse(self, x):
        return self.base ** x
    
base10transform = CustomLogTransform(10)

params["A"].copy_(5.0)
params["A"].add_clamp_([1.0, 20.0])

params["A"].add_transform_(base10transform)
params["B"].add_clamp_([0.0, np.inf])

params["A"].finalize_()
params["B"].finalize_()

Following which you can simply optimize the model as usual.

calc = Calculator(model)
_ = calc.create(ds.get_configs())
steps = 100
loss = Loss(calc, nprocs=2)
loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": steps})

This decouples transformations and all the other cruft from the model and localizes them in the Parameter class. While the above example works, I am currently getting NaN in the MCMC module, so I am working on that. In the meantime, this should give you some idea of what I was thinking and set the ball rolling.

To see the above changes, you can clone the current repository ipcamit/kliff and check out the kim-master branch:

git clone -b kim-master https://github.com/ipcamit/kliff
cd kliff && pip install -e .

Let me know your thoughts and suggestions.

Given below is a complete outline of the class and associated functions.

class Parameter(np.ndarray):
    """
    Torch-like parameters, but with numpy. Methods ending with _ modify the object in place.
    Trying to keep the terminology the same, but it seems a bit difficult. Subclassing ndarray.
    Make it immutable?
    """

    def transform_(self):
        """
        Forward transform to apply when using the array.
        """

    def inverse_(self):
        """
        Inverse transform to apply to get original array.
        """

    def reset_(self):
        """
        Backup function to get the original values back.
        """

    def transformed(self):
        """
        return a transformed numpy array for non mutating usages.
        """

    def clamp_(self):
        """
        Bounds the values of the parameter in transformed space.
        """

    def copy_raw_(self, inarray):
        """
        Copy inarray to the Parameter without any check: alternate name unsafe_copy_?
        """

    def copy_(self, arr):
        """
        copy data to array. transform and clamp arr first
        """

    def numpy(self):
        """
        return actual data to be used in KIMModel
        """

    def copy_at_(self, arr, index):
        """
        Copy array elements at the given index.
        """

    def save(self, filename):
        """
        Save data to a pickle dictionary, similar to current Parameters
        """

    def load(self, filename):
        """
        load dict from pkl and assign
        """

    def add_transform_(self, transform):
        """
        Add a transform class to the parameter.
        """

    def add_clamp_(self, clamp):
        """
        Add bounds to the parameters
        """

    def finalize_(self):
        """
        Properly finalize the parameter after modifications like transforms etc.
        """

The functions with _ at the end of the name are mutating functions, just as in PyTorch.
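For reference, the ndarray-subclassing pattern the outline above relies on can be sketched in plain numpy as follows. The class and method names here are illustrative only, not the actual KLIFF implementation; the transform object is a stand-in for ParameterTransform:

```python
import numpy as np


class ParameterSketch(np.ndarray):
    """Illustrative np.ndarray subclass with torch-like in-place methods.

    A sketch of the pattern only, not the proposed KLIFF Parameter API.
    """

    def __new__(cls, input_array):
        obj = np.asarray(input_array, dtype=float).view(cls)
        obj._transform = None
        return obj

    def __array_finalize__(self, obj):
        # Called on views/copies; propagate the transform attribute.
        if obj is not None:
            self._transform = getattr(obj, "_transform", None)

    def add_transform_(self, transform):
        """Register a transform and move the data into transformed space in place."""
        self._transform = transform
        self[...] = transform.transform(np.asarray(self).copy())

    def numpy(self):
        """Plain ndarray in the original (inverse-transformed) space."""
        data = np.asarray(self).copy()
        return self._transform.inverse(data) if self._transform else data


class Log10Transform:
    """Stand-in for a ParameterTransform-like object."""

    def transform(self, x):
        return np.log10(x)

    def inverse(self, x):
        return 10.0 ** x


p = ParameterSketch([10.0, 100.0])
p.add_transform_(Log10Transform())
```

Because the subclass shares memory with a plain ndarray view, any numpy/scipy routine can consume it directly, while `numpy()` hands back values in the original space.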

@mjwen

mjwen commented Jun 13, 2023

Thanks @ipcamit. This is great!

Yes, I totally agree that transform on parameters should be implemented within the Parameter class. I love the general idea to combine them, and we can just leave some fields empty when using them as a container for KIM parameters.

I haven't run the example, but only looked at the implementation of the new Parameter class. I have two main questions.

First, any idea on how to handle the case where, for a parameter under the same name, some components are fixed and others are allowed to be optimized? To be more specific: if we use the SW model for a system with two species Si and C, then parameter A (and others) will have a shape of (3,), with A[0], A[1], and A[2] being the interactions between Si-Si, Si-C, and C-C, respectively. It is pretty common in fitting physics-based models to fix the parameter value, e.g. for Si-Si, and only optimize the values for Si-C and C-C. Can we have a general scheme under the new Parameter class?

I guess this is the major distinction between the original OptimizingParameters and Parameters classes. For the new Parameter, we can probably add some new functions to register the indices of the components that are fixed. And then, when using model.parameters() in the fitting, just ignore these marked components?

Second, the use of clamp is different from the original use of bounds. Originally, the bounds are provided to the (scipy) optimizer, and it clamps and keeps track of the state of the parameter. Once the initial guess of the parameter values and their bounds are passed to the optimizer, the model and the (original) Parameter have no control over it. The only thing the model and the (original) Parameter do is get the updated parameter values and pass them to KIM for energy and force evaluations. In the new Parameter formulation, the control of the bounds is within the Parameter. I am not sure how to pass the clamped parameter to the optimizer during the optimization.

@ipcamit
Owner Author

ipcamit commented Jun 13, 2023

SW model for a system with two species Si and C, then parameter A (and others) will have a shape of (3,), with A[0], A[1], and A[2] being the interactions between Si-Si, Si-C, and C-C, respectively. It is pretty common in fitting physics-based models to fix the parameter value, e.g. for Si-Si, and only optimize the values for Si-C and C-C. Can we have a general scheme under the new Parameter class?

How is that currently dealt with in kimpy internally? Does it return an array or three parameters with the same name?
In kliff I checked that it returns an array of numbers; ideally I think bounds/clamp can handle it easily with element-wise bounds. We can just set the non-optimizable params as having the same max-min bound. Let me try it and update.

I guess this is the major distinction between the original OptimizingParameters and Parameters classes. For the new Parameter, we can probably add some new functions to register the indices of the components that are fixed. And then, when using model.parameters() in the fitting, just ignore these marked components?

This is also a viable solution, but I think it would be more confusing given the way KLIFF currently presents these parameters (as a single array). I can just modify the echo_model_params function to print them in tabular form, so that the user can understand that they are individual parameters.

Also, for array params like that, we can add a separate set_array_opt function, which would accept a boolean array and register optimizing params accordingly, as you suggested above.

Originally, the bounds are provided to the (scipy) optimizer, and it clamps and keeps track of the state of the parameter. Once the initial guess of the parameter values and their bounds are passed to the optimizer, the model and the (original) Parameter have no control over it.

Ah! I misunderstood it then. I thought the Parameter class was keeping the state consistent. Honestly, passing it to scipy makes it even simpler. One of the biggest sources of headache was keeping clamps consistent between transformed and inverse space. If bounds are just containers of numbers passed to scipy, great!

@mjwen

mjwen commented Jun 13, 2023

SW model for a system with two species Si and C, then parameter A (and others) will have a shape of (3,), with A[0], A[1], and A[2] being the interactions between Si-Si, Si-C, and C-C, respectively. It is pretty common in fitting physics-based models to fix the parameter value, e.g. for Si-Si, and only optimize the values for Si-C and C-C. Can we have a general scheme under the new Parameter class?

How is that currently dealt with in kimpy internally? Does it return an array or three parameters with the same name? In kliff I checked that it returns an array of numbers; ideally I think bounds/clamp can handle it easily with element-wise bounds. We can just set the non-optimizable params as having the same max-min bound. Let me try it and update.

kimpy always returns a 1D array of floats/ints, even when its size is 1.

Which components to optimize and which not are kept track of by the OptimizingParameters container (including their bounds and other stuff). Using bounds to set them is not doable; the major reason is that not all optimizers support the use of bounds. Then we would need to use the `set_array_opt` approach.

I guess this is the major distinction between the original OptimizingParameters and Parameters classes. For the new Parameter, we can probably add some new functions to register the indices of the components that are fixed. And then, when using model.parameters() in the fitting, just ignore these marked components?

This is also a viable solution, but I think it would be more confusing given the way KLIFF currently presents these parameters (as a single array). I can just modify the echo_model_params function to print them in tabular form, so that the user can understand that they are individual parameters.

Also, for array params like that, we can add a separate set_array_opt function, which would accept a boolean array and register optimizing params accordingly, as you suggested above.

Originally, the bounds are provided to the (scipy) optimizer, and it clamps and keeps track of the state of the parameter. Once the initial guess of the parameter values and their bounds are passed to the optimizer, the model and the (original) Parameter have no control over it.

Ah! I misunderstood it then. I thought the Parameter class was keeping the state consistent. Honestly, passing it to scipy makes it even simpler. One of the biggest sources of headache was keeping clamps consistent between transformed and inverse space. If bounds are just containers of numbers passed to scipy, great!

Yes, it is easier. The problem is how to communicate between an optimizer and the parameters. Previously, all of this was wrapped in the OptimizingParameters container. Now, since we are removing this class, besides model.parameters() we will need something like model.parameter_bounds().

@mjwen

mjwen commented Jun 13, 2023

Also, I forgot that you mentioned the MCMC example fails. Are you using parameter transformation? If yes, could it be that you forgot to do the reverse transform?

@ipcamit
Owner Author

ipcamit commented Jun 13, 2023

Yes, it's NaN due to a negative log in the likelihood function. I need to dig a bit deeper to ensure all parameters are transformed properly. Let me think more about the array optimization. I would like to keep the option of adding bounds to the model as a last resort and give first chance to some boolean mask. Something like:

model.set_opt_params(["A"])
param = model.parameters()
param["A"].opt_mask_([True, False, True])
param["A"].set_transform_(f)
param["A"].bound_([[1., 2.], [10., np.inf]])  # 2 True mask values, so bound has to be a 2x2 array
# basically an n x 2 list or array, for n True masks
param["A"].finalize_()

That way all the information is still localized in Parameters.

@mjwen

mjwen commented Jun 13, 2023

Looks good. Keep us posted once you have new proposals.

@ipcamit
Owner Author

ipcamit commented Jul 7, 2023

@mjwen @yonatank93 Updated API and fixed issues:

  • Fixed the MCMC NaN issue.
  • Bounds now work as suggested, by passing values to the scipy optimizer.
  • Vector params can now be selectively optimized using boolean masks.
  • Params are now an ordered dict for more predictable operations.

Let me know your comments on the workflow.

Todo:

  1. Clean up the functions and API:
  • The Parameter class will have certain functions, like bounds, in two complementary sets to manipulate in both transformed and inverse space.
  • Consistent function naming convention.
  2. A set of pure functions to manipulate the parameter ordered dict (instead of using internal state functions in the model).

Examples of the new workflow:

  1. Simple optimization in transformed space:
from kliff.dataset import Dataset
from kliff.models import KIMModel
from kliff.calculators import Calculator
from kliff.transforms import ParameterTransform
import numpy as np
from kliff.loss import Loss

# Load dataset
ds = Dataset("Si_training_set_4_configs")

# Load model
model = KIMModel("SW_StillingerWeber_1985_Si__MO_405512056662_006")

# model params
model.get_model_params()

# set params to opt
model.set_opt_params(["A", "B"])
params = model.parameters()

# Set parameter transform
class CustomLogTransform(ParameterTransform):
    def __init__(self, base):
        super().__init__("Base" + str(base))
        self.base = base
        
    def transform(self, x):
        return np.emath.logn(self.base, x)
    
    def inverse(self, x):
        return self.base ** x
    
base10transform = CustomLogTransform(10.0)

# set to transformed value
params["A"].copy_(5.0)

# add bounds for optimization in transformed space
params["A"].add_bounds_(np.array([[0.1, 10]]))
params["B"].add_bounds_(np.array([[-np.inf, -0.1]]))

# set transformation class
params["A"].add_transform_(base10transform)
params["B"].add_transform_(base10transform)

# finalize model params
params["A"].finalize_()
params["B"].finalize_()

print(model.get_opt_params()) # should print ['A', 'B']

# Set calculator
calc = Calculator(model)
_ = calc.create(ds.get_configs())

# optimize
steps = 100
loss = Loss(calc, nprocs=2)
loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": steps})
  2. Optimizing vector quantities using a boolean mask:
# Load dataset
ds = Dataset("MoS2_10")

# Load model
model = KIMModel("SW_MX2_WenShirodkarPlechac_2017_MoS__MO_201919462778_001")
print(model.get_model_params())

# set optimization parameters
model.set_opt_params(["A", "B"])

params = model.parameters()

# Add bounds in log space
params["A"].add_bounds_(np.array([[0.01, 10],[0.01, 10]]))
params["B"].add_bounds_(np.array([[-np.inf, -0.1],[-np.inf, -0.1]]))

# add transform
params["A"].add_transform_(base10transform)
params["B"].add_transform_(base10transform)

# Set Mo-Mo and Mo-S params as optimizable
B_mask = np.zeros_like(params["B"],dtype=bool)
B_mask[0] = True
B_mask[2] = True

A_mask = np.zeros_like(params["A"],dtype=bool)
A_mask[0] = True
A_mask[2] = True

params["A"].set_opt_mask(A_mask)
params["B"].set_opt_mask(B_mask)

params["A"].finalize_()
params["B"].finalize_()
print(model.get_opt_params())

# Set calculator
calc = Calculator(model)
_ = calc.create(ds.get_configs())
steps = 100
loss = Loss(calc, nprocs=2)
loss.minimize(method="L-BFGS-B", options={"disp": True, "maxiter": steps})
  3. The original MCMC example:
>>>>>
-# Instantiate a transformation class to do the log parameter transform
-params_transform = LogParameterTransform(param_names)
-
-# Create the model
-model = KIMModel(
-   model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006",
-   params_transform=params_transform
-)
-
-# Set the tunable parameters and the initial guess
-
-opt_params = {
-   "A": [["default", -8.0, 8.0]],
-   "lambda": [["default", -8.0, 8.0]],
-}
-
-model.set_opt_params(**opt_params)
-model.echo_opt_params()
<<<<<
+param_names = ["A", "lambda"]
+model = KIMModel(model_name="SW_StillingerWeber_1985_Si__MO_405512056662_006")
+
+model.set_opt_params(param_names)
+opt_params = model.parameters()
+
+
+class LogTransform(ParameterTransform):
+    def __init__(self):
+        super().__init__("log")
+    def transform(self, x):
+        return np.log(x)
+    def inverse(self, x):
+        return np.exp(x)
+
+log_transform = LogTransform()
+opt_params["A"].add_transform_(log_transform)
+opt_params["lambda"].add_transform_(log_transform)
+
+opt_params["A"].finalize_()
+opt_params["lambda"].finalize_()

@yonatank93

@mjwen @ipcamit Out of curiosity, is there a reason why kliff doesn't use ase.Atoms, at least when we use physics-motivated models? Although I also noticed that kliff.dataset.Configuration has similar properties to ase.Atoms. I am not sure if this question is appropriate for the discussion here; if not, I can start a new discussion.

@ipcamit
Owner Author

ipcamit commented Jul 10, 2023

@yonatank93 I think we did discuss this at the MACH conference. I would also like to use some standard format, like ASE Atoms. But the rationale behind using a custom configuration object was to ensure that we maintain the minimal required control. I mean, ASE Atoms might contain fields we do not need, or might be missing some required fields (for example, a reference to the colabfit dataset that we are developing). So the summary was to keep the configuration object and provide parsers for ase, pymatgen, etc. However, I am willing to move to either ase or pymatgen if you and @mjwen agree.

But coming to the discussion above: what do you think of the workflow and parameter declaration? Do you think it will bring more clarity, or will it make MCMC more convoluted?

@yonatank93

@ipcamit I think for MCMC the new workflow is fine. With your update, it doesn't seem to affect the MCMC part's workflow; it only affects how we initialize the model.

I do have a comment about how we set optimizing parameters in the original KLIFF version. Currently, I need to deal with EAM potential, and my issue with KLIFF is that it takes a long time to set the optimizing parameters. See the example below.

In [1]: from kliff.models import KIMModel

In [2]: model = KIMModel("EAM_Dynamo_MishinFarkasMehl_1999_Al__MO_651801486679_005")

In [3]: %time model.set_opt_params(embeddingData=[["default"]] * 10000)
CPU times: user 1min, sys: 103 ms, total: 1min
Wall time: 1min 1s

My guess is that to set the embedding data as the parameters, KLIFF needs to iterate over the entire table of values (in this example there are 10,000 of them). Even if I only want to tune a few parameters, I still need to specify either "default" or "fix" for the entire extent.
This is a reason I brought up ASE earlier. The KIM calculator in ASE can set/get values of the embedding data without looping over the entire table, by giving the specific indices of the items we want to set/get in the parameter list. I don't know if there is a simple way to reconcile this, but I at least want to bring up a possible improvement for future updates.

@ipcamit
Owner Author

ipcamit commented Jul 10, 2023

With your update, it doesn't seem to affect the MCMC part's workflow, and it only affects how we initialize the model.

Yes, that's the idea: Parameters will now be more contained while keeping things the same. We will change other things as we go along.

Thank you for bringing up that example. Though EAM will be a little slow, I can't think of a reason why KLIFF should be this slow. I will check the timings to pinpoint the exact cause.

I think the new parameter class might help here as well. For example, I tried your case with the new params class, with embeddingData values 10:100:2 being optimizable.

In [10]: def init():
    ...:     import numpy as np
    ...:     model.set_opt_params(["embeddingData"])
    ...:     params = model.parameters()
    ...:     idx_mask = np.zeros_like(params["embeddingData"], dtype=bool)
    ...:     idx_mask[10:100:2] = True
    ...:     params["embeddingData"].set_opt_mask(idx_mask)
    ...:     params["embeddingData"].finalize_()
    ...: 

In [11]: %time init()
CPU times: user 117 µs, sys: 22 µs, total: 139 µs
Wall time: 145 µs

It's a bit more verbose, but does it solve your problem? I can make more educated comments once I figure out exactly why the old version is so slow.

@yonatank93

It's a bit more verbose, but does it solve your problem? I can make more educated comments once I figure out exactly why the old version is so slow.

From what I can tell, the slow part in the old version is around this line, maybe because of the for loop in this method. Especially for EAM, there are 10,000 values to iterate over for the embeddingData and other parameters. I might be wrong, though.

@mjwen

mjwen commented Jul 11, 2023

First, regarding using ASE Atoms or pymatgen Structure: it is not easy to directly use them, as @ipcamit said. For the fitting, we need to keep track of other info besides cell, coords, etc. That's where we use our own Configuration to keep track of energy, reference forces, etc. In addition, we can easily add utility methods that are typically needed in potential fitting.

That being said, it might be a good idea to use ASE Atoms or pymatgen Structure to represent the atomic structure. They are widely used and may offer other utilities. Anyway, we still need a Configuration-like object to keep track of other info. I just don't feel it is alright to attach energy, forces, etc. to Atoms or Structure. I have been moving along these directions in some of my other projects, where I used pymatgen Structure and added additional pydantic checks to ensure the correct shape and typing of all the data. My plan was to replace the KLIFF Configuration sometime in the future with the new stuff.

@mjwen

mjwen commented Jul 11, 2023

Second, @yonatank93 you are absolutely correct that the set_one function can be the bottleneck. Apparently the new stuff @ipcamit put there makes it much faster.

I was curious why you want to optimize all the EAM embedding data. That's typically not how people develop EAM. If I remember correctly, people traditionally only choose a handful of these points (say 10), then optimize the model, and finally tabulate using more (e.g. 10k) interpolation points. EAM is not a data-driven model, so optimizing 10k data points may not be as good. Nevertheless, I just want to let you know this; still, we need to make the code as fast as possible.

@yonatank93

yonatank93 commented Jul 11, 2023

I was curious why you want to optimize all EAM embedding data. That's typically not how people develop EAM. If I remember correctly, people traditionally only choose a handful of these points (say 10), then optimize the model, and finally tabulate using more (e.g. 10k) interpolating points. EAM is not a data-driven model, so optimizing 10k data points may not be as good. Nevertheless, I just want to let you know this; but still, we need to make the code as fast as possible.

@mjwen I remember we talked about this when we were in Baltimore. There is another thing we want to try: using some kind of surrogate model to approximate the embedding function, etc. Without changing the KIM model, we do this by using the surrogate models to generate the tabulated values. This is why we set all those tabulated values as parameters.

The other case we tried was computing the Jacobian of the EAM model in predicting the forces of several configurations. I did this just to see which regimes of rho and r are sampled by the configurations, so that when we create our surrogate, or if we just use fewer tabulated values, we can appropriately represent these important sampled regimes.

And the motivation for using surrogate models is that when I did some calculations, I got some KIM errors because I tried to sample the embedding function outside the interpolation regime. So a surrogate model can help extend the domain of the embedding function to some degree.

@mjwen

mjwen commented Jul 11, 2023

@ipcamit In general it looks great! We are almost there. I have two questions.

  1. The params["name"].finalize_() stuff seems not quite right. The only thing being done there is calling self.transform_, so I think we'd better not use it and instead call self.transform_() directly after calling the add_transform_ function. Or, even better, we can call transform_() within the add_transform_ function. The reason I want this is to avoid any deferred execution, which may cause a lot of confusion.

  2. What is the behavior of bounds if the transform is set before bounds are added? Are we treating the values of the bounds as being in the transformed space or the original one? For ease of use and maintenance, shall we enforce that all setting and getting of param values and bounds happen in the original space? Or, if it is easier to work in the transformed space, we can probably add a keyword argument (e.g. transformed_space=False) to add_bounds_ and make it work for both.

@ipcamit
Owner Author

ipcamit commented Jul 11, 2023

@mjwen

  1. The finalize_ method was needed when I was clamping the values on the parameter side, instead of passing bounds to scipy etc. The primary reason I kept it is that it clearly demarcates the point beyond which the user will not add any new information to the parameters. Hence, as you go along, you can change transforms, bounds, etc., but they will only be applied after finalize_. At present it can work without finalize_, but I think the explicit user action of calling finalize_ might make things clearer and would help us as well; no strong opinion either way, though.
  2. I had in mind that the bounds will always be in the transformed space, and it will be up to the user to provide appropriate values. That removes the ambiguity of defining bounds before/after transforms, so I think it is best to have the user fix it themselves. There is an internal variable, is_transformed, that is used to track the parameter state, so it is not difficult to enforce automatic conversion of bounds. However, that creates ambiguity and might create inconsistent bounds (transforming user-provided transformed bounds). So it is best to enforce one space, transformed or inverse, and stick with it.

@yonatank93

  2. I had in mind that the bounds will always be in the transformed space, and it will be up to the user to provide appropriate values. That removes the ambiguity of defining bounds before/after transforms, so I think it is best to have the user fix it themselves. There is an internal variable, is_transformed, that is used to track the parameter state, so it is not difficult to enforce automatic conversion of bounds. However, that creates ambiguity and might create inconsistent bounds (transforming user-provided transformed bounds). So it is best to enforce one space, transformed or inverse, and stick with it.

Going along with @ipcamit's comment: the MCMC module uses a uniform prior by default, and the user needs to specify the bounds of the prior. Since the sampling is done in the transformed space, it would make more sense to input the bounds to the uniform prior in the transformed space.

And as a to-do for myself, I might want to set default values of the prior bounds by extracting the parameter bounds set by add_bounds_. Just to be clear: if no bounds are specified, does an internal function basically set the bounds to inf?
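For illustration, a uniform prior over the transformed parameters could look like the sketch below; the function name is hypothetical, not the MCMC module's API. Returning -inf outside the bounds (rather than taking the log of a zero density) is one common way to keep the log-posterior free of NaNs:

```python
import numpy as np


def uniform_log_prior(x, lower, upper):
    """Hypothetical uniform log-prior in transformed space.

    Log-density is 0.0 (up to an additive constant) inside the box
    bounds and -inf outside, so a sampler never evaluates log(0).
    """
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = np.all((lower <= x) & (x <= upper))
    return 0.0 if inside else -np.inf


lp_inside = uniform_log_prior([0.5, -2.0], lower=[0.0, -8.0], upper=[1.0, 8.0])
lp_outside = uniform_log_prior([2.0], lower=[0.0], upper=[1.0])
```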

@mjwen

mjwen commented Jul 11, 2023

Hence, as you go along, you can change transforms, bounds, etc., but they will only be applied after finalize_.

What do you mean by this? In particular, what is the use case? From a user's perspective, finalize_ is too loose a term, and a user cannot know what is going on inside it without taking a look at the source code. For example, we now have parameter transforms and bounds transforms, and a user would have to guess what gets transformed.

  1. I had in mind that the bounds will always be in transformed space, and it will be upto user to provide appropriate value. That removes ambiguity of defining bounds before/after transforms. So I think best is to have the user fix it themselves. There is an internal variable is_transformed that is used to track the parameter value, so it is not difficult to enforce automatic conversion of bounds. However, that creates ambiguity and might create inconsistent bounds (transforming user provided transformed bounds). So best is to enforce it in either transformed or inverse space, and stick with it.

I agree leaving the decision to users is a good compromise. My thought is:

  1. if bounds are set before add_transform_, then the bounds are set in the original space
  2. but if they are set after, then the bounds are in the transformed space
    A user should be responsible for this, depending on what the target space is.

This is another reason I don't think finalize_ should be there... finalize_ will mix things up.

@mjwen

mjwen commented Jul 11, 2023

And for me to do, I might want to have default values of the prior bounds by extracting the parameter bounds set by add_bounds_ Just to be clear, if there are no bounds specified, does an internal function basically set the bounds to inf?

I think there is no default, i.e. if you do not call add_bounds_, the bounds are basically None.

@ipcamit
Owner Author

ipcamit commented Jul 11, 2023

@yonatank93 if bounds are not set for a parameter, scipy expects a (None, None) tuple, which is what parameters passes.
@mjwen The main idea behind finalize_ was to separate setting up the parameters from using them. finalize_ ensures that setting bounds and parameter values remains commutative. But at present I can remove the finalize_ function, as it does not add much functionality.

if bounds are set before add_transform_, then the bounds are set in the original space

I would not recommend functionality where the behavior of the optimizer is influenced by the mere ordering of initialization of two independent fields. Therefore the best course of action would be to fix the bounds space to either the transformed space (the user has to provide transformed bounds), which makes more sense with the parameter design, or the non-transformed space, where we consistently apply transformations.

@mjwen

mjwen commented Jul 11, 2023

@yonatank93 if bounds are not set for a parameter, scipy expects a (None, None) tuple, which is what parameters passes. @mjwen The main idea behind finalize_ was to separate setting up the parameters from using them.
finalize_ ensures that setting bounds and parameter values remains commutative. But at present I can remove the finalize_ function, as it does not add much functionality.

I would not recommend functionality where the behavior of the optimizer is influenced by the mere ordering of initialization of two independent fields. Therefore the best course of action would be to fix the bounds space to either the transformed space (the user has to provide transformed bounds), which makes more sense with the parameter design, or the non-transformed space, where we consistently apply transformations.

Depending on what you mean by "initialization", it may or may not be reasonable. If "initialization" means registering the transform function for later execution (e.g. in finalize_), then yes, the ordering issue is really bad. But if we are doing immediate execution (the transform function is applied within add_transform itself when it is registered), then I think it makes sense to expect different behaviors for the bounds as well. The idea is that the transform function will transform both parameters and bounds that are already set before it is registered. But if a parameter/bounds is set after the registration of the transform function, then it should be given in the transformed space.

But I agree this can still be confusing and if we stick to a single space, it would be easier. For this, we'd better require the bounds to be in the original space, since that is the case if no parameter transformation is used.

If we are doing this, then something like finalize_ is needed, because we won't perform any immediate execution of the transform function on parameters/bounds when the transform function is registered with add_transform().

@ipcamit
Owner Author

ipcamit commented Jul 11, 2023

An alternative could be to have add_bounds_ and add_transformed_bounds_ as two separate functions, which can keep the internal state consistent. This also eliminates the need for finalize_: once you add a transform, the parameter is immediately transformed. Then add_bounds_ will first apply the transformation to the supplied bounds, while add_transformed_bounds_ will not. Calling add_transformed_bounds_ with the transformation class being None will yield an error/warning message that no transform is registered. This is cleaner, and removes the additional barrier of finalize_.

I am keen on having the ability to add bounds in the transformed space, as some quantities might be easy or intuitive to bound in the transformed space, but not in the original space.
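A minimal sketch of how the two-function proposal could keep the internal state consistent. `ParamSketch` is a hypothetical stand-in, not KLIFF's actual Parameter class:

```python
import numpy as np

class ParamSketch:
    """Hypothetical stand-in for the proposed API: bounds given in the
    original space are transformed on entry, while add_transformed_bounds_
    stores bounds as-is. Internally, bounds always live in transformed space."""

    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.transform = None   # e.g. np.log10
        self.bounds = None      # always stored in transformed space

    def add_transform_(self, fn):
        # Immediate execution: no finalize_ step needed.
        self.transform = fn
        self.value = fn(self.value)

    def add_bounds_(self, bounds):
        # Bounds are supplied in the original space.
        b = np.asarray(bounds, dtype=float)
        self.bounds = self.transform(b) if self.transform is not None else b

    def add_transformed_bounds_(self, bounds):
        # Bounds are supplied directly in the transformed space.
        if self.transform is None:
            raise ValueError("no transform registered; use add_bounds_")
        self.bounds = np.asarray(bounds, dtype=float)

p = ParamSketch([5.0])
p.add_transform_(np.log10)
p.add_bounds_(np.array([[0.1, 100.0]]))  # stored transformed, approx [[-1, 2]]
```

Because each call executes immediately, the declaration order of transform and bounds no longer changes the final state.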

@mjwen

mjwen commented Jul 11, 2023

I am all in on the new approach you proposed.

@ipcamit
Owner Author

ipcamit commented Jul 18, 2023

@mjwen @yonatank93 Updated API and fixed issues:

  • Fixed discussed API issues
  • Cleaned up some old functions; e.g., copy is now copy_to_param_ to minimize confusion.

Examples of new workflow:

  1. New simple optimization in transformed space:
params["A"].copy_to_param_(5.0)
params["B"].add_transformed_bounds(np.array([[-2,1]]))

params["A"].add_transform_(base10transform)
params["B"].add_transform_(base10transform)

params["A"].add_bounds(np.array([[0.1, 100]]))

There is now no need to call the finalize_ method, and you can add bounds in either the original or the transformed space, with the order of declaration being arbitrary.
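For completeness, `base10transform` is not defined in the snippet above; here is a guess at what it might look like (the real KLIFF transform interface may differ in method names and signatures):

```python
import numpy as np

class Base10Transform:
    """A guess at what base10transform might be: log10 forward,
    10**x inverse. Illustrative only."""

    def transform(self, x):
        return np.log10(x)

    def inverse_transform(self, x):
        return np.power(10.0, x)

base10transform = Base10Transform()
y = base10transform.transform(np.array([5.0, 100.0]))
# Round-tripping recovers the original values.
assert np.allclose(base10transform.inverse_transform(y), [5.0, 100.0])
```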

  1. Optimizing vector quantities using boolean mask:
params["A"].add_bounds(np.array([[0.01, 10],[0.01, 10]]))
params["A"].add_transform_(base10transform)
A_mask = np.zeros_like(params["A"],dtype=bool)
A_mask[0] = True
A_mask[2] = True

params["A"].add_opt_mask(A_mask)

Now you can pick and choose array elements to optimize, and selectively mark the transformation space for any of the parameters. In the example above, parameter A is minimized in log space, whereas every other parameter is optimized in the original space. This paves the way for cases where different scaling transformations might help different parameters.

  1. The original MCMC example remains the same, just without finalize_:
param_names = ["A", "lambda"]
model.set_opt_params(param_names)
opt_params = model.parameters()

opt_params["A"].add_transform_(log_transform)
opt_params["lambda"].add_transform_(log_transform)

Comments?

@yonatank93

@ipcamit I like that we can apply different transformations to different parameters. Having separate add_bounds methods for setting the bounds in the original and the transformed space is clearer from the user's point of view, since each method is descriptive of what it does. A couple of comments/questions:

  1. Can you remind me what exactly add_opt_mask does?
  2. This might not be practical, but I want to know if we can set different transformations for different components of the same parameter. For example, take the SW MoS2 potential. Suppose I want to apply a log transformation to A_{Mo-Mo} and some linear transformation to A_{S-S}. Is this possible? I am just raising a possible case that might come up in the future. I personally haven't seen this before, and in practice we might just want to apply the same transformation to the entire extent of a parameter.

@ipcamit
Owner Author

ipcamit commented Jul 18, 2023

  1. Can you remind me what exactly add_opt_mask does?

Opt masks selectively mark vector elements as optimizable. For example, for MoS2 the parameter A will be a 3x1 vector; applying an optimization mask of [True, False, False] makes the first element optimizable but hides the second and third from the optimizer. So you can optimize only the Mo-Mo coefficient while keeping the original Mo-S and S-S values.
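The gather/scatter behavior described above can be sketched with plain NumPy; the values below are illustrative, not real SW MoS2 parameters:

```python
import numpy as np

# Sketch of what an opt mask does under the hood: only masked-in elements
# are exposed to the optimizer; the rest stay fixed.
A = np.array([1.0, 2.0, 3.0])            # e.g. [A_MoMo, A_MoS, A_SS] (illustrative)
mask = np.array([True, False, False])    # optimize the Mo-Mo element only

opt_view = A[mask]         # what the optimizer sees: array([1.0])
opt_view = opt_view + 0.5  # the optimizer proposes a new value
A[mask] = opt_view         # scatter back; A[1] and A[2] are untouched
print(A)                   # [1.5 2.  3. ]
```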

  1. This might not be practical, but I want to know if we can set different transformations to different extents of the same parameters. For example, take SW MoS2 potential. Suppose I want to apply log transformation for A_{Mo-Mo} and some linear transformation to A_{S-S}. Is this possible? I am just throwing a possible case that might happen in the future.

Currently it is not possible, as I apply the transformations elementwise, but if needed I can add that feature; it is not difficult to add. If possible, though, I would defer it to the next iteration, unless you want to use it straight away in a project.
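If per-component transforms were ever added, one hypothetical shape for them is a list of callables, one per element. This helper is not in KLIFF; it is purely illustrative:

```python
import numpy as np

def apply_per_component(x, transforms):
    """Hypothetical helper: apply a different transform to each element
    of a parameter vector, instead of one transform applied elementwise."""
    return np.array([t(xi) for t, xi in zip(transforms, x)])

# Log-transform A_MoMo but linearly scale A_SS (illustrative values).
A = np.array([10.0, 4.0])
fns = [np.log10, lambda v: 0.5 * v]
print(apply_per_component(A, fns))  # [1. 2.]
```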

@yonatank93

Currently it is not possible, as I apply the transformations elementwise, but if needed I can add that feature; it is not difficult to add. If possible, though, I would defer it to the next iteration, unless you want to use it straight away in a project.

No problem. I don't need it right now; it is just a possible case that I was thinking might come up in the future. I am OK with keeping it for the next iteration.

@mjwen

mjwen commented Jul 19, 2023

@ipcamit You did not mention it, but I saw here that parameter setting is now also distinguished by whether it is in the original or the transformed space. This is great! I believe that this is ready for a PR into the main kliff repo.

Also, I agree with you both that we should not implement functionality that we probably will never use...

@ipcamit
Owner Author

ipcamit commented Jul 19, 2023

@mjwen Great! Then I will close this issue, clean up a bit, document, and submit a PR. We need to discuss documentation (I remember you mentioning Jupyter Book). Can we schedule another meeting to discuss where we will merge it and the tasks going forward?

@mjwen

mjwen commented Jul 19, 2023

Yes, let's discuss it. Let's find a time over email or slack...
