Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up refutation by parallelization #527

Merged
merged 5 commits into from
Jul 19, 2022

Conversation

astoeffelbauer
Copy link
Contributor

Hi @amit-sharma,

I've added parallelization to the random cause and the placebo refuter using joblib.

Please let me know what you think of my implementation before I do the same with the rest. For example do you have a suggestion about how to avoid the inner function within refute_estimate which is not overly pretty imo.

I also added to the Getting Started Notebook. Hope this is a good start for the feature.

Copy link
Member

@amit-sharma amit-sharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the commits look good. thanks @astoeffelbauer. Can you also add docstrings for the new parameters that you have introduced and mention that None is the default?

docs/source/example_notebooks/dowhy_simple_example.ipynb Outdated Show resolved Hide resolved
dowhy/causal_refuters/placebo_treatment_refuter.py Outdated Show resolved Hide resolved
dowhy/causal_refuters/placebo_treatment_refuter.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
@astoeffelbauer
Copy link
Contributor Author

Thanks for the comments @amit-sharma. I've made those updates.

Since every refutation method can be sped up by itself, do you think it's really necessary to add the example with the parallel loop through multiple refutation methods at the end of the Getting Started Notebook.

If set to n_jobs=-1 for a single method, it uses all of the CPU power so I don't think that running multiple methods in parallel can be any faster that that. So I feel like that might just overcomplicate things.

Copy link
Member

@amit-sharma amit-sharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @astoeffelbauer Left a few comments. We are getting there :) Happy to merge once these are in.

docs/source/example_notebooks/dowhy_simple_example.ipynb Outdated Show resolved Hide resolved
docs/source/example_notebooks/dowhy_simple_example.ipynb Outdated Show resolved Hide resolved
docs/source/example_notebooks/dowhy_simple_example.ipynb Outdated Show resolved Hide resolved
docs/source/example_notebooks/dowhy_simple_example.ipynb Outdated Show resolved Hide resolved
dowhy/causal_refuters/placebo_treatment_refuter.py Outdated Show resolved Hide resolved
dowhy/causal_refuters/placebo_treatment_refuter.py Outdated Show resolved Hide resolved
@astoeffelbauer
Copy link
Contributor Author

Hi @amit-sharma, thanks again for the comments. I implemented the parallelization for the random_common_cause, placebo_treatment_refuter, and the data_subset_refuter now.

The bootstrap_refuter (raises an error) and the dummy_outcome_refuter (returns a list instead of the results) currently don't work fully or are inconsistent with the others I believe, and they are also not in the Getting Started Notebook, so I skipped them for now.

The other remaining ones are more complex and harder to parallelize I think, and I haven't had a chance to understand them fully yet.

Please let me know if this is fine for now and if you have any other comments. Once we've resolved any last comments or issues, do you think we can finalize and merge in the PR?

Copy link
Member

@amit-sharma amit-sharma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you @astoeffelbauer! We can take up how to parallelize the other refuters in a future PR.

@amit-sharma
Copy link
Member

@all-contributors please add @astoeffelbauer for code.

@allcontributors
Copy link
Contributor

@amit-sharma

I've put up a pull request to add @astoeffelbauer! 🎉

@amit-sharma amit-sharma merged commit 4d741e3 into py-why:master Jul 19, 2022
@astoeffelbauer astoeffelbauer deleted the refute/parallelization branch July 19, 2022 20:32
@astoeffelbauer
Copy link
Contributor Author

Thank you @amit-sharma for all the comments and the help!! Hope this will be useful 😊

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants