
Fix for joblib not saturating CPU during multiprocessing #1188

Merged: 19 commits, Jul 22, 2024

Conversation

janko-petkovic
Contributor

What does this implement/fix? Explain your changes

This PR refactors the simulate_for_sbi method so that now:

Does this close any currently open issues?

Fixes #1175

Any relevant code examples, logs, error output, etc?

Refer to the thread in #1175
...

Any other comments?

Further refactoring will be needed in order to avoid the process_simulator wrapping (possibly reducing the simulation runtime by roughly another factor of three).
...

Checklist

Put an x in the boxes that apply. You can also fill these out after creating
the PR. If you're unsure about any of them, don't hesitate to ask. We're here to
help! This is simply a reminder of what we are going to look for before merging
your code.

  • I have read and understood the contribution
    guidelines
  • I agree with re-licensing my contribution from AGPLv3 to Apache-2.0.
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have reported how long the new tests run and potentially marked them
    with pytest.mark.slow.
  • New and existing unit tests pass locally with my changes
  • I performed linting and formatting as described in the contribution
    guidelines
  • I rebased on main (or there are no conflicts with main)
  • For reviewer: The continuous deployment (CD) workflows are passing.

Contributor

@janfb janfb left a comment


Thanks for the thorough PR!

I made some comments to refactor simulate_for_sbi. At the moment, the function is a bit cluttered with several if-else cases and I think we can get rid of some of them.

Otherwise, I suggest that you use the files in benchmarks/ just for testing as part of this PR and remove them before merging. Alternatively, you could try to condense them into one concise test and add it to tests/multiprocessing_test.py or so.
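A condensed test could look roughly like the sketch below. ThreadPoolExecutor stands in for joblib so the sketch stays dependency-free, and the toy simulator and all names are illustrative assumptions, not the sbi API:

```python
# Hypothetical shape for a condensed tests/multiprocessing_test.py entry:
# parallel dispatch must reproduce the serial results exactly for a
# deterministic simulator. ThreadPoolExecutor stands in for joblib here.
from concurrent.futures import ThreadPoolExecutor


def toy_simulator(theta):
    # Deterministic stand-in for a real simulator.
    return [2.0 * t + 1.0 for t in theta]


def simulate_serial(thetas):
    return [toy_simulator(t) for t in thetas]


def simulate_parallel(thetas, num_workers=2):
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(toy_simulator, thetas))


def test_parallel_matches_serial():
    thetas = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
    assert simulate_parallel(thetas) == simulate_serial(thetas)
```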

One question: is there a specific reason why you add the process_simulator etc to many of the test cases?

(Review threads on sbi/inference/base.py and sbi/utils/user_input_checks.py, all resolved.)

codecov bot commented Jul 9, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.56%. Comparing base (ba19688) to head (eb1b222).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1188      +/-   ##
==========================================
- Coverage   84.55%   75.56%   -9.00%     
==========================================
  Files          96       96              
  Lines        7603     7629      +26     
==========================================
- Hits         6429     5765     -664     
- Misses       1174     1864     +690     
Flag        Coverage Δ
unittests   75.56% <90.90%> (-9.00%) ⬇️

Flags with carried forward coverage won't be shown.

File                             Coverage Δ
sbi/utils/user_input_checks.py   79.16% <100.00%> (-4.35%) ⬇️
sbi/inference/base.py            91.97% <88.88%> (-0.75%) ⬇️

... and 24 files with indirect coverage changes

@janko-petkovic
Contributor Author

> One question: is there a specific reason why you add the process_simulator etc to many of the test cases?

Absolutely. The process_simulator wrapper is now necessary to convert the numpy arrays dispatched by joblib into torch.Tensors that the simulator can work with. Before, the theta generated from the prior could be passed directly to the simulator; this is no longer the case, since an intermediate cast to numpy has been introduced for parallelization efficiency. The patched pipeline looks like this:

Prior (generate theta) → CASTING_1 (Tensor → ndarray) → Joblib (dispatch to workers) → CASTING_2 (ndarray → Tensor) → Simulator (generate x) → CASTING_3 (Any → float32)

where

  • CASTING_1 takes place in inference/base/simulate_for_sbi.py:625 with
theta = proposal.sample((num_simulations,)).numpy()
  • CASTING_2 and CASTING_3 take place in utils/user_input_checks.py:484 with
    def joblib_simulator(theta: ndarray) -> Tensor:
        return torch.as_tensor(simulator(torch.as_tensor(theta)), dtype=float32)

Wrapping the simulator with process_simulator takes care of CASTING_2 and CASTING_3, and is therefore necessary to run the simulations (if you are not using a workaround with a numpy simulator).

As stated in sbi/utils/user_input_checks.py (lines 504-508), this pipeline is quite clunky and should be changed in the future, but it addresses the joblib issue without introducing breaking changes.
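The pipeline above can be sketched end to end. Plain Python lists stand in for torch.Tensor and numpy.ndarray so the sketch stays dependency-free; only the wrapping pattern mirrors the thread, and everything else is an illustrative assumption:

```python
# Dependency-free sketch of the Prior -> CASTING_1 -> Joblib -> CASTING_2 ->
# Simulator -> CASTING_3 pipeline described above. Plain lists stand in for
# torch.Tensor / numpy.ndarray.


def tensor_simulator(theta):
    # Stand-in for a torch-based simulator: works on "tensors" (lists here).
    return [2.0 * t for t in theta]


def process_simulator(simulator):
    # CASTING_2 and CASTING_3: wrap the simulator so that it accepts the
    # arrays dispatched by the workers and returns float32-like output.
    def joblib_simulator(theta_array):
        theta_tensor = list(theta_array)      # CASTING_2: ndarray -> Tensor
        x = simulator(theta_tensor)           # Simulator: generate x
        return [float(v) for v in x]          # CASTING_3: Any -> float32
    return joblib_simulator


def simulate_for_sbi_sketch(prior_samples, simulator):
    theta_arrays = [list(t) for t in prior_samples]  # CASTING_1: Tensor -> ndarray
    wrapped = process_simulator(simulator)
    # Joblib step: in the real code, each array is dispatched to a worker.
    return [wrapped(theta) for theta in theta_arrays]


xs = simulate_for_sbi_sketch([[1.0, 2.0], [3.0, 4.0]], tensor_simulator)
assert xs == [[2.0, 4.0], [6.0, 8.0]]
```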

@janfb
Contributor

janfb commented Jul 12, 2024

> One question: is there a specific reason why you add the process_simulator etc to many of the test cases?
>
> Absolutely, now the process_simulator is necessary to convert numpy arrays dispatched by joblib [...]

Thanks for the detailed explanation. Overall, this makes sense. But I believe that in the test cases, we do not really have to use simulate_for_sbi because we mostly use uniform or Gaussian priors and vectorized Gaussian simulators. Thus, we can just call them directly:

theta = prior.sample((num_simulations,))
x = simulator(theta)

I know that in some test functions this is not done. But in my view, simulate_for_sbi is really a convenience function for the users that we do not need to use internally. We also do not really need multiprocessing during testing because the toy simulators are essentially instantaneous. Or am I missing something here?

Again, please excuse if my previous comments were not clear and have caused extra work for you.

@janko-petkovic
Contributor Author

> Thanks for the detailed explanation. Overall, this makes sense. But I believe that in the test cases, we do not really have to use simulate_for_sbi [...]

No problem at all, thank you for following up on the matter! While refactoring the tests, I realized that simulate_for_sbi makes it very handy to control the seed. Without it, some of the tests fail due to unsatisfactory performance (e.g. test_c2st_snl_on_linear_gaussian_different_dims).

Given that the current test implementation with simulate_for_sbi does not really have strong downsides, should I still proceed with the refactoring?
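The seed handling mentioned here can be illustrated with a small sketch: each dispatched batch gets a seed derived from one base seed, so two runs with the same base seed produce identical simulations regardless of dispatch order. The names and the per-batch seeding scheme are assumptions for illustration, not the actual simulate_for_sbi internals:

```python
# Sketch of reproducible seeding for parallel simulation: one base seed
# yields one derived seed per batch, so results are run-to-run identical.
import random


def seeded_simulator(theta, seed):
    # Noisy toy simulator with an explicit per-call seed.
    rng = random.Random(seed)
    return [t + rng.gauss(0.0, 1.0) for t in theta]


def simulate_with_seed(thetas, base_seed):
    # Derive one seed per batch from the base seed so that the dispatch
    # order of the workers cannot change the results.
    return [seeded_simulator(t, base_seed + i) for i, t in enumerate(thetas)]


run_a = simulate_with_seed([[0.0], [1.0], [2.0]], base_seed=42)
run_b = simulate_with_seed([[0.0], [1.0], [2.0]], base_seed=42)
assert run_a == run_b
```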

@janfb
Contributor

janfb commented Jul 19, 2024 via email

Contributor

@janfb janfb left a comment


Looks great! You only need to run pre-commit run --all-files to get rid of unused imports I think. Then it will probably be good to go! 🎉

(Review threads on sbi/inference/base.py and tests/inference_on_device_test.py, resolved.)
Contributor

@janfb janfb left a comment


Looks good now. Thanks a lot for the effort that went into analyzing and fixing this! Great work 👏

Will merge once the testing PR is merged and all tests are passing.

(Review thread on sbi/inference/base.py, resolved.)
@janko-petkovic
Contributor Author

> Looks good now. Thanks a lot for the effort that went into analyzing and fixing this! Great work 👏
>
> Will merge once the testing PR is merged and all tests are passing.

Thanks to you for the assistance and the great feedback! :)

@janfb
Contributor

janfb commented Jul 22, 2024

Actually, one thing that I just realized:
Comparing this and #1187, I was wondering why we still have simulate_in_batches. With the changes made here, couldn't we actually get rid of it entirely? Otherwise, it will still be used, and it will be slow because it contains the old joblib code.

@janko-petkovic
Contributor Author

> Actually, one thing that I just realized: Comparing this and #1187 I was wondering why we still have simulate_in_batches. With the changes made here, we could actually get rid of it entirely or not? Otherwise, it will still be used and it will be slow because it contains the old joblib code?

Yes, this implementation effectively bypasses simulate_in_batches.
In #1187, simulate_for_sbi calls simulate_in_batches depending on the branch one is running. The aim of the test is precisely to show that the old implementation does not saturate the CPUs (and is therefore much slower), while the refactored one without simulate_in_batches behaves better. Did I understand correctly what you were asking?

@janfb
Contributor

janfb commented Jul 22, 2024

OK, I understand. Thus, for this PR we will keep it. But in a future PR, we might well just "refactor-away" simulate_in_batches, right?

@janko-petkovic
Contributor Author

> OK, I understand. Thus, for this PR we will keep it. But in a future PR, we might well just "refactor-away" simulate_in_batches, right?

Actually, I think I didn't get what you meant. On this branch, simulate_in_batches is not called by simulate_for_sbi. If there are no other methods relying on it or on tqdm_joblib (both in simulators/simutils.py), we can delete simulators/simutils.py directly in this PR.

@janfb
Contributor

janfb commented Jul 22, 2024

Yes, but there are other methods still using it, e.g. in the ABC methods and in the tests. I suggest we merge this one now and do everything else in #1187.

@janfb janfb merged commit 83e122a into sbi-dev:main Jul 22, 2024
8 checks passed
@janko-petkovic janko-petkovic deleted the 1175-simulation-multiproc-fix branch July 22, 2024 12:30
Successfully merging this pull request may close these issues.

simulate_for_sbi not saturating CPU
2 participants