
Fix for joblib not saturating CPU during multiprocessing #1188

Merged: 19 commits, Jul 22, 2024

Conversation

janko-petkovic
Contributor

What does this implement/fix? Explain your changes

This PR refactors the simulate_for_sbi method so that now:

Does this close any currently open issues?

Fixes #1175

Any relevant code examples, logs, error output, etc?

Refer to the thread in #1175
...

Any other comments?

Further refactoring will be needed in order to avoid the process_simulator wrapping (possibly reducing the simulation runtime by roughly another factor of three).
...

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read and understood the contribution
    guidelines
  • I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • New and existing unit tests pass locally with my changes
  • I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)
  • For reviewer: The continuous deployment (CD) workflows are passing.

Contributor

@janfb janfb left a comment


Thanks for the thorough PR!

I made some comments to refactor simulate_for_sbi. At the moment, the function is a bit cluttered with several if-else cases and I think we can get rid of some of them.

Otherwise, I suggest that you use the files in benchmarks/ just for testing as part of this PR and remove them before merging. Alternatively, you could try to condense them into one concise test and add it to tests/multiprocessing_test.py or so.
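A condensed test could look roughly like the sketch below. ThreadPoolExecutor stands in for joblib so the sketch stays dependency-free, and the toy simulator and all names are illustrative assumptions, not the sbi API:

```python
# Hypothetical shape for a condensed tests/multiprocessing_test.py entry:
# parallel dispatch must reproduce the serial results exactly for a
# deterministic simulator. ThreadPoolExecutor stands in for joblib here.
from concurrent.futures import ThreadPoolExecutor


def toy_simulator(theta):
    # Deterministic stand-in for a real simulator.
    return [2.0 * t + 1.0 for t in theta]


def simulate_serial(thetas):
    return [toy_simulator(t) for t in thetas]


def simulate_parallel(thetas, num_workers=2):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(toy_simulator, thetas))


def test_parallel_matches_serial():
    thetas = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    assert simulate_parallel(thetas) == simulate_serial(thetas)
```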

One question: is there a specific reason why you add the process_simulator etc to many of the test cases?

(Review threads on sbi/inference/base.py and sbi/utils/user_input_checks.py, all resolved.)

codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.56%. Comparing base (ba19688) to head (eb1b222).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1188      +/-   ##
==========================================
- Coverage   84.55%   75.56%   -9.00%     
==========================================
  Files          96       96              
  Lines        7603     7629      +26     
==========================================
- Hits         6429     5765     -664     
- Misses       1174     1864     +690     
Flag        Coverage Δ
unittests   75.56% <90.90%> (-9.00%) ⬇️

Flags with carried forward coverage won't be shown.

File                             Coverage Δ
sbi/utils/user_input_checks.py   79.16% <100.00%> (-4.35%) ⬇️
sbi/inference/base.py            91.97% <88.88%> (-0.75%) ⬇️

... and 24 files with indirect coverage changes

@janko-petkovic
Contributor Author

> One question: is there a specific reason why you add the process_simulator etc to many of the test cases?

Absolutely. The process_simulator wrapper is now necessary to convert the numpy arrays dispatched by joblib into torch.Tensors that the simulator can work with. Before, the theta generated from the prior could be passed directly to the simulator; this is no longer the case, since an intermediate cast to numpy has been introduced for parallelization efficiency. The patched pipeline looks like this:

Prior (generate theta) → CASTING_1 (Tensor → ndarray) → Joblib (dispatch to workers) → CASTING_2 (ndarray → Tensor) → Simulator (generate x) → CASTING_3 (Any → float32)

where

  • CASTING_1 takes place in inference/base/simulate_for_sbi.py:625 with
theta = proposal.sample((num_simulations,)).numpy()
  • CASTING_2 and CASTING_3 take place in utils/user_input_checks.py:484 with
    def joblib_simulator(theta: ndarray) -> Tensor:
        return torch.as_tensor(simulator(torch.as_tensor(theta)), dtype=float32)

Wrapping the simulator with process_simulator takes care of CASTING_2 and CASTING_3, and is therefore necessary to run the simulations (if you are not using a workaround with a numpy simulator).

As stated in sbi/utils/user_input_checks.py (lines 504-508), this pipeline is quite clunky and should be changed in the future, but it addresses the joblib issue without introducing breaking changes.
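The pipeline above can be sketched end to end. Plain Python lists stand in for torch.Tensor and numpy.ndarray so the sketch stays dependency-free; only the wrapping pattern mirrors the thread, and everything else is an illustrative assumption:

```python
# Dependency-free sketch of the Prior -> CASTING_1 -> Joblib -> CASTING_2 ->
# Simulator -> CASTING_3 pipeline described above. Plain lists stand in for
# torch.Tensor / numpy.ndarray.


def tensor_simulator(theta):
    # Stand-in for a torch-based simulator: works on "tensors" (lists here).
    return [2.0 * t for t in theta]


def process_simulator(simulator):
    # CASTING_2 and CASTING_3: wrap the simulator so that it accepts the
    # arrays dispatched by the workers and returns float32-like output.
    def joblib_simulator(theta_array):
        theta_tensor = list(theta_array)      # CASTING_2: ndarray -> Tensor
        x = simulator(theta_tensor)           # Simulator: generate x
        return [float(v) for v in x]          # CASTING_3: Any -> float32
    return joblib_simulator


def simulate_for_sbi_sketch(prior_samples, simulator):
    theta_arrays = [list(t) for t in prior_samples]  # CASTING_1: Tensor -> ndarray
    wrapped = process_simulator(simulator)
    # Joblib step: in the real code, each array is dispatched to a worker.
    return [wrapped(theta) for theta in theta_arrays]


xs = simulate_for_sbi_sketch([[1.0, 2.0], [3.0, 4.0]], tensor_simulator)
assert xs == [[2.0, 4.0], [6.0, 8.0]]
```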

@janfb
Contributor

janfb commented Jul 12, 2024

> One question: is there a specific reason why you add the process_simulator etc to many of the test cases?
>
> Absolutely, now the process_simulator is necessary to convert numpy arrays dispatched by joblib [...]

Thanks for the detailed explanation. Overall, this makes sense. But I believe that in the test cases, we do not really have to use simulate_for_sbi because we mostly use uniform or Gaussian priors and vectorized Gaussian simulators. Thus, we can just call them directly:

theta = prior.sample((num_simulations,))
x = simulator(theta)

I know that in some test functions this is not done. But in my view, simulate_for_sbi is really a convenience function for the users that we do not need to use internally. We also do not really need multiprocessing during testing because the toy simulators are essentially instantaneous. Or am I missing something here?

Again, please excuse if my previous comments were not clear and have caused extra work for you.

@janko-petkovic
Contributor Author

> Thanks for the detailed explanation. Overall, this makes sense. But I believe that in the test cases, we do not really have to use simulate_for_sbi [...]

No problem at all, thank you for following up on the matter! While refactoring the tests, I realized that simulate_for_sbi makes it very handy to control the seed. Without it, some of the tests fail due to unsatisfactory performance (e.g. test_c2st_snl_on_linear_gaussian_different_dims).

Given that the current test implementation with simulate_for_sbi does not really have strong downsides, should I still proceed with the refactoring?
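The seed handling mentioned here can be illustrated with a small sketch: each dispatched batch gets a seed derived from one base seed, so two runs with the same base seed produce identical simulations regardless of dispatch order. The names and the per-batch seeding scheme are assumptions for illustration, not the actual simulate_for_sbi internals:

```python
# Sketch of reproducible seeding for parallel simulation: one base seed
# yields one derived seed per batch, so results are run-to-run identical.
import random


def seeded_simulator(theta, seed):
    # Noisy toy simulator with an explicit per-call seed.
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 1.0) for t in theta]


def simulate_with_seed(thetas, base_seed):
    # Derive one seed per batch from the base seed so that the dispatch
    # order of the workers cannot change the results.
    return [seeded_simulator(t, base_seed + i) for i, t in enumerate(thetas)]


run_a = simulate_with_seed([[0.0], [1.0], [2.0]], base_seed=42)
run_b = simulate_with_seed([[0.0], [1.0], [2.0]], base_seed=42)
assert run_a == run_b
```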

@janfb
Contributor

janfb commented Jul 19, 2024 via email

Contributor

@janfb janfb left a comment


Looks great! You only need to run pre-commit run --all-files to get rid of unused imports I think. Then it will probably be good to go! 🎉

(Review threads on sbi/inference/base.py and tests/inference_on_device_test.py, resolved.)
Contributor

@janfb janfb left a comment


Looks good now. Thanks a lot for the effort that went into analyzing and fixing this! Great work 👏

Will merge once the testing PR is merged and all tests are passing.

(Review thread on sbi/inference/base.py, resolved.)
@janko-petkovic
Contributor Author

> Looks good now. Thanks a lot for the effort that went into analyzing and fixing this! Great work 👏
>
> Will merge once the testing PR is merged and all tests are passing.

Thanks to you for the assistance and the great feedback! :)

@janfb
Contributor

janfb commented Jul 22, 2024

Actually, one thing that I just realized:
Comparing this and #1187, I was wondering why we still have simulate_in_batches. With the changes made here, couldn't we actually get rid of it entirely? Otherwise, it will still be used, and it will be slow because it contains the old joblib code.

@janko-petkovic
Contributor Author

> Actually, one thing that I just realized: Comparing this and #1187 I was wondering why we still have simulate_in_batches. With the changes made here, we could actually get rid of it entirely or not? Otherwise, it will still be used and it will be slow because it contains the old joblib code?

Yes, this implementation effectively bypasses simulate_in_batches.
In #1187, simulate_for_sbi calls simulate_in_batches depending on the branch one is running. The aim of the test is precisely to show that the old implementation does not saturate the CPUs (and is therefore much slower), while the refactored one without simulate_in_batches behaves better. Did I understand correctly what you were asking?

@janfb
Contributor

janfb commented Jul 22, 2024

OK, I understand. Thus, for this PR we will keep it. But in a future PR, we might well just "refactor-away" simulate_in_batches, right?

@janko-petkovic
Contributor Author

> OK, I understand. Thus, for this PR we will keep it. But in a future PR, we might well just "refactor-away" simulate_in_batches, right?

Actually, I think I didn't get what you meant. On this branch, simulate_in_batches is not called by simulate_for_sbi. If there are no other methods relying on it or on tqdm_joblib (both in simulators/simutils.py), we can delete simulators/simutils.py directly in this PR.

@janfb
Contributor

janfb commented Jul 22, 2024

Yes, but there are other methods still using it, e.g. in the ABC methods and in the tests. I suggest we merge this one now and do everything else in #1187.

@janfb janfb merged commit 83e122a into sbi-dev:main Jul 22, 2024
8 checks passed
@janko-petkovic janko-petkovic deleted the 1175-simulation-multiproc-fix branch July 22, 2024 12:30
Successfully merging this pull request may close these issues.

simulate_for_sbi not saturating CPU
2 participants