
FEA Implement generator unordered parameter #1463

Merged (27 commits into joblib:master, Aug 29, 2023)

Conversation

fcharras (Contributor)

Closes #1449

The PR builds upon #1458

It looks straightforward: the diff in parallel.py, outside of documentation, is about 10 lines added/removed.

More testing is still to be added, and maybe an extension to the tutorial that highlights how RAM usage can be further decreased when tasks have significantly different processing times?
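The memory benefit mentioned here comes from consuming each result as soon as its task completes, instead of holding finished results in memory behind a slow earlier task. A minimal stdlib sketch of the same idea, using `concurrent.futures.as_completed` rather than the `return_as="generator_unordered"` API this PR adds to joblib:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def task(i):
    # earlier-submitted tasks sleep longer, so later submissions finish first
    time.sleep((5 - i) * 0.05)
    return i

with ThreadPoolExecutor(max_workers=5) as pool:
    futures = [pool.submit(task, i) for i in range(5)]
    # as_completed yields futures in completion order, not submission order;
    # each finished result can be consumed (and freed) immediately instead of
    # waiting in memory behind a slower, earlier task
    done_order = [f.result() for f in as_completed(futures)]

print(done_order)  # completion order, likely [4, 3, 2, 1, 0] given the sleeps
```

joblib's unordered generator brings this completion-order yielding to the `Parallel(...)` call itself, with the same lazy-consumption memory profile as `return_as="generator"`.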

codecov bot commented Jun 26, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.08% ⚠️

Comparison is base (abd3af7) 94.93% compared to head (5b7f73c) 94.85%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1463      +/-   ##
==========================================
- Coverage   94.93%   94.85%   -0.08%     
==========================================
  Files          45       45              
  Lines        7515     7546      +31     
==========================================
+ Hits         7134     7158      +24     
- Misses        381      388       +7     
Files Changed                   Coverage                Δ
joblib/parallel.py              96.81% <100.00%>        (+0.98%) ⬆️
joblib/test/test_parallel.py    96.18% <100.00%>        (+<0.01%) ⬆️

... and 5 files with indirect coverage changes


@fcharras fcharras changed the title WIP: Implement generator unordered parameter FEA Implement generator unordered parameter Jun 28, 2023
fcharras (Contributor, Author)

fcharras commented Jul 6, 2023

I don't think the failure on PyPy is related to this PR, and it looks benign.

ogrisel (Contributor)

ogrisel commented Jul 7, 2023

There are two warnings making the HTML rendering fail on the CI:

/home/docs/checkouts/readthedocs.org/user_builds/joblib/envs/1463/lib/python3.7/site-packages/joblib/parallel.py:docstring of joblib.parallel.Parallel:46: WARNING: Block quote ends without a blank line; unexpected unindent.
/home/docs/checkouts/readthedocs.org/user_builds/joblib/envs/1463/lib/python3.7/site-packages/joblib/parallel.py:docstring of joblib.parallel.Parallel:45: WARNING: Block quote ends without a blank line; unexpected unindent.

Could you please fix those, to make it possible to review the rendered HTML for the examples expanded in this PR?
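For context, docutils emits this warning when an indented (block-quote-like) section of a docstring is followed by a less-indented line with no separating blank line. A hypothetical minimal illustration (not the actual parallel.py docstring text):

```python
# BAD: the dedent right after the indented block triggers
# "Block quote ends without a blank line; unexpected unindent."
BAD = """\
return_as: str
    Indented description of the parameter.
back to the left margin with no blank line: docutils warns
"""

# GOOD: a blank line before dedenting keeps docutils happy
GOOD = """\
return_as: str
    Indented description of the parameter.

Back to the left margin after a blank line: no warning.
"""

# the fix is purely a matter of inserting a blank line before the dedent
print("blank line present:", "\n\n" in GOOD and "\n\n" not in BAD)
```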

@ogrisel (Contributor) left a comment

This looks great but I will finalize the review once the example is running on the CI.

(review thread on joblib/test/test_parallel.py, resolved)
fcharras (Contributor, Author)

fcharras commented Jul 12, 2023

The example is running now. The PyPy test for backend abortion fails consistently on this PR though; something new in this PR has probably been delaying backend abortion on PyPy to the point that this test now fails 🤔

edit: what seems to happen is that the abort_backend unit test is delayed by the slow task (10 s sleep) in the previous test.

fcharras (Contributor, Author)

NB: the current pipeline issues are not related to the latest commit; it seems the pipeline setup is down because of dependency issues.

@ogrisel (Contributor) left a comment

First pass of comments on the example and the tests. I will do a second pass on the change itself after I finish some local experiments.

Resolved review threads:
examples/parallel_generator.py (5 threads, 3 marked outdated)
joblib/test/test_parallel.py (1 thread)
ogrisel (Contributor)

ogrisel commented Aug 23, 2023

@fcharras I think we do not need the _jobs_unordered set. Let's see if the tests pass on the CI with the change suggested in #1500.

EDIT: the CI is green (although there was an unrelated failure in a Jupyter-notebook-related test). Feel free to cherry-pick fce3475 into this PR.

ogrisel (Contributor)

ogrisel commented Aug 23, 2023

Once the above comments are addressed, LGTM.

fcharras (Contributor, Author)

Thank you for the review @ogrisel !

Regarding fce3475, I thought about that too, but in this case, after a new task is fired, the main process doesn't keep any reference to it: it could just be garbage collected, and then strange behavior might happen, and that behavior could depend on the backend? I'd prefer to keep the set of references explicitly in _jobs_unordered for this reason.

ogrisel (Contributor)

ogrisel commented Aug 24, 2023

> Regarding fce3475 I thought about that too but in this case after a new task is fired the main process doesn't keep any reference to it, it could just be garbage collected and then strange behavior might happen, and said behavior could depend on the backend ?

It's not possible for a backend to call a callback successfully without keeping a reference to the callback object. And if a backend did not call the callback at all, it would be broken for any value of return_as.

fcharras (Contributor, Author)

fcharras commented Aug 24, 2023

But couldn't the opposite be true: because a given backend does not keep a reference to any task object internally, the callback might not be called if the user-level reference is deleted? Basically, a backend could use this for early interruption of tasks upon deletion of the reference.

ogrisel (Contributor)

ogrisel commented Aug 24, 2023

Since we call:

job = self._backend.apply_async(batch, callback=batch_tracker)

and since the backend has to call the batch_tracker callable once the task is complete, it has to keep a reference to it (or a serialized copy of it, which does not matter). Once the task is complete, the batch_tracker callable is retrieved through the reference held by the backend (or reconstructed from the serialized copy, if the backend chose to do that) and then called. When called, the batch_tracker object re-adds itself to the parallel._jobs queue, so there is no problem.

The naming is a bit odd, since parallel._jobs is actually a queue of task-batch "callbacks", but beyond the naming I do not see any problem with not keeping a reference from the parallel object.
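This reference-lifetime argument can be checked with a small stdlib sketch (using concurrent.futures rather than a joblib backend, with a hypothetical BatchTracker stand-in for joblib's batch callback): the executor holds a reference to the callback until the task completes, so the caller dropping its own name cannot garbage-collect it.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

jobs = queue.Queue()  # stand-in for the parallel._jobs queue

class BatchTracker:
    # hypothetical stand-in for joblib's batch callback, not the real class:
    # when invoked with the finished future, it re-adds itself to the queue
    def __init__(self, value):
        self.value = value
        self.result = None

    def __call__(self, future):
        self.result = future.result()
        jobs.put(self)  # the callback re-registers itself, as described above

with ThreadPoolExecutor(max_workers=2) as pool:
    for i in range(3):
        tracker = BatchTracker(i)
        # the future stores a reference to `tracker` for its done-callback,
        # so deleting our local name cannot garbage-collect it
        pool.submit(lambda x=i: x * x).add_done_callback(tracker)
        del tracker

results = sorted(jobs.get().result for _ in range(3))
print(results)  # [0, 1, 4]
```

All three trackers survive and are recovered from the queue, even though the submitting code kept no reference to any of them.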

ogrisel (Contributor)

ogrisel commented Aug 25, 2023

Please also document the new feature in the changelog, targeting the next release.

@ogrisel (Contributor) left a comment

LGTM!

@tomMoral (Contributor) left a comment

LGTM! thx @fcharras for this nice implem :)

@tomMoral tomMoral merged commit bfd14eb into joblib:master Aug 29, 2023
16 checks passed
Successfully merging this pull request may close these issues.

return_generator: add async flag to yield results in order completed