The saved MFK result takes too much space #149

Closed
xuzhengChai opened this issue Jun 5, 2019 · 5 comments

Comments

@xuzhengChai

Hi

My setup:
- 3000 sampling points
- 8 features
- an MFK surrogate model, built e.g. as follows:
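```python
import numpy
from smt.extensions import MFK

# X: (3000, 8) array of sampling points, Y: (3000, 1) array of responses
sm_MFK = MFK(theta0=numpy.ones(8), eval_noise=True, noise0=5)
sm_MFK.set_training_values(X, Y)
sm_MFK.train()
```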

The saved MFK model (sm_MFK) takes around 0.4 GB, far more than I expected.

Is it possible to reduce its size?

Thanks in advance!
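To see where the space goes before changing anything, one option is to pickle each instance attribute on its own and rank the attributes by size. This is a generic standard-library sketch: `attribute_sizes` is a hypothetical helper, not part of SMT, and a file written with dill may differ somewhat from these per-attribute pickle sizes.

```python
import pickle


def attribute_sizes(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return (name, n_bytes) pairs for obj's instance attributes, largest first.

    Each attribute is pickled separately; attributes that pickle cannot
    serialize are reported with a size of -1 instead of raising.
    """
    sizes = []
    for name, value in vars(obj).items():
        try:
            n_bytes = len(pickle.dumps(value, protocol=protocol))
        except Exception:  # e.g. pickle.PicklingError for unpicklable members
            n_bytes = -1
        sizes.append((name, n_bytes))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, `attribute_sizes(sm_MFK)[:5]` would list the five largest attributes of the trained model.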

@relf
Member

relf commented Jun 13, 2019

How do you save the MFK model?

@xuzhengChai
Author

I used dill.dump to save it.
I tried pickle.dump, but got `PicklingError: Can't pickle <class 'function'>: attribute lookup function on builtins failed`.
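That message is what pickle reports when it is asked to serialize the built-in `function` type object itself, not a function. Classes are pickled by reference as `module.name`, and there is no `builtins.function` to look up. The minimal reproduction below assumes, without verifying it here, that the model kept a reference to that type somewhere. `try_pickle` is a hypothetical helper.

```python
import pickle
import types


def try_pickle(obj):
    """Return None if obj pickles, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return str(exc)
    return None


# Pickling the *type* of a function fails: pickle stores classes by name and
# looks for "function" in the builtins module, where no such name exists.
print(try_pickle(types.FunctionType))

# A real builtin such as len pickles fine, since builtins.len can be found.
print(try_pickle(len))
```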

@relf
Member

relf commented Jun 14, 2019

Ok. I've just fixed the pickle problem with #154, but I do not think it will solve your problem.
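For reference, a common way to make such an object picklable is to exclude the offending attribute in `__getstate__` and restore it in `__setstate__`. The sketch below uses a hypothetical `Surrogate` class with a hypothetical `func_type` attribute. It is not necessarily what #154 changed in SMT.

```python
import pickle
import types


class Surrogate:
    """Hypothetical stand-in for a model object; NOT SMT's actual class."""

    # Attributes that cannot (or need not) be pickled.
    _unpicklable = ("func_type",)

    def __init__(self):
        self.weights = [0.1, 0.2, 0.3]
        self.func_type = types.FunctionType  # would trigger the PicklingError

    def __getstate__(self):
        # Copy the instance dict and drop the offending attributes.
        state = self.__dict__.copy()
        for name in self._unpicklable:
            state.pop(name, None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the dropped attribute after unpickling.
        self.func_type = types.FunctionType


restored = pickle.loads(pickle.dumps(Surrogate()))
print(restored.weights)
```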
Looking at the MFK code, it seems you can set the D_all instance member, which is not used in prediction, to None before saving:

```python
sm_MFK.D_all = None
dill.dump(...)
```

It should decrease the size of the model.
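The saving from dropping `D_all` can be estimated up front. Assume `D_all` holds, per fidelity level, the pairwise differences between training points as a float64 array of shape `(n(n-1)/2, nx)`, as SMT's `cross_distances` produces. Then 3000 points and 8 features give 4,498,500 pairs × 8 × 8 bytes ≈ 288 MB, which would be most of the ~0.4 GB reported. The sketch also includes a hypothetical `slim_copy` helper that clears the attribute on a shallow copy, so the trained model in memory keeps its `D_all`.

```python
import copy
import pickle


def pairwise_bytes(n_samples, n_features, itemsize=8):
    """Bytes of an (n(n-1)/2, n_features) array of pairwise differences."""
    n_pairs = n_samples * (n_samples - 1) // 2
    return n_pairs * n_features * itemsize


def slim_copy(model, drop=("D_all",)):
    """Shallow copy of `model` with the attributes in `drop` set to None.

    Only the copy's attributes are rebound, so the original model is unchanged.
    """
    slim = copy.copy(model)
    for name in drop:
        if hasattr(slim, name):
            setattr(slim, name, None)
    return slim


print(pairwise_bytes(3000, 8))  # 287904000 bytes, i.e. about 288 MB

# Saving the slimmed copy (dill.dump accepts the same arguments):
# with open("sm_MFK.pkl", "wb") as f:
#     pickle.dump(slim_copy(sm_MFK), f)
```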

@xuzhengChai
Author

Great! Thank you very much!

By the way, is it also possible to improve the prediction speed?
With the same surrogate model, predicting 1,200,000 points takes around 440 seconds. The points also have to be split into, e.g., 100 groups and predicted separately; otherwise a MemoryError is raised.
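The MemoryError is consistent with prediction building an array of differences between every prediction point and every training point, with shape `(n_eval * n_train, nx)` in float64. SMT's Kriging prediction code appears to do this, but treat it as an assumption. With 3000 training points and 8 features, that is 3000 × 8 × 8 = 192,000 bytes per prediction point. All 1,200,000 points at once would therefore need about 230 GB (2.304e11 bytes). Under a 2 GiB budget per batch, the batch size comes out at 11,184 points, or 108 batches, in line with the ~100 groups mentioned above. The helpers below are hypothetical; `predict` could be `sm_MFK.predict_values`.

```python
import numpy as np


def eval_bytes_per_point(n_train, n_features, itemsize=8):
    """Approximate bytes per prediction point, assuming the model builds an
    (n_eval * n_train, n_features) array of differences to the training set."""
    return n_train * n_features * itemsize


def predict_in_batches(predict, X, max_bytes=2 * 1024**3, n_train=3000):
    """Call `predict` on row slices of X and stack the results.

    `predict` maps an (m, nx) array to an (m, ny) array. The batch size is
    chosen so that one batch's difference array stays under `max_bytes`.
    """
    X = np.asarray(X)
    per_point = eval_bytes_per_point(n_train, X.shape[1])
    batch = max(1, max_bytes // per_point)
    outputs = [predict(X[i:i + batch]) for i in range(0, len(X), batch)]
    return np.vstack(outputs)
```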

I built the same model with scikit-learn's GaussianProcessRegressor. It takes only 90 s to predict the 1,200,000 points, but fitting takes far too long (~40 minutes), whereas fitting MFK takes only 2 minutes!!!
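To compare the two models on equal footing, it helps to time both on the same inputs with the same batch size. From the figures above, MFK needs about 440 s / 1,200,000 ≈ 0.37 ms per point and the scikit-learn model 90 s / 1,200,000 = 0.075 ms per point, roughly 4.9× faster. A minimal timing sketch follows. The helper name `seconds_per_point` is hypothetical; `predict` could be `sm_MFK.predict_values` or `GaussianProcessRegressor.predict`.

```python
import time

import numpy as np


def seconds_per_point(predict, X, batch_size=10_000, repeats=3):
    """Best-of-`repeats` wall-clock time per prediction point."""
    X = np.asarray(X)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(0, len(X), batch_size):
            predict(X[i:i + batch_size])
        best = min(best, time.perf_counter() - start)
    return best / len(X)
```

Taking the best of several repeats reduces the influence of one-off effects such as caches warming up.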

I will use the surrogates in Bayesian inference, which requires hundreds of thousands of predictions, and I have over one hundred surrogate models, so reducing both the fitting time and the prediction time matters to me.
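A back-of-the-envelope estimate of the total cost makes the trade-off concrete. It assumes 100 models and 100,000 predictions per model (the low end of "hundreds of thousands"), and that the timings quoted above scale linearly. `total_hours` is a hypothetical helper.

```python
def total_hours(n_models, fit_minutes, preds_per_model, pred_seconds_per_point):
    """Return (fitting hours, prediction hours) for a batch of surrogates."""
    fit_h = n_models * fit_minutes / 60
    pred_h = n_models * preds_per_model * pred_seconds_per_point / 3600
    return fit_h, pred_h


# Per-point prediction cost taken from the timings quoted above.
mfk = total_hours(100, 2, 100_000, 440 / 1_200_000)
gpr = total_hours(100, 40, 100_000, 90 / 1_200_000)
print(mfk, gpr)
```

Under these assumptions, MFK needs about 3.3 h of fitting plus about 1.0 h of prediction. The scikit-learn model needs about 66.7 h of fitting plus about 0.2 h of prediction, so fitting time dominates the comparison.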

Thanks in advance!!!!

@relf
Member

relf commented Jun 17, 2019

Well... Feel free to make a pull request if you find a way to improve the current implementation. In the meantime, I'm closing the issue.

relf closed this as completed on Jun 17, 2019