FEA return generator, #588 stripped of unrelated changes for minimal diff review #1393
Conversation
…sk outputs as soon as it completes.
Co-authored-by: Olivier Grisel <[email protected]>
Before considering a merge of this PR, could you please run some benchmarks to check that it does not introduce any significant performance regression when not enabling the new option? In particular when running a large number of very short tasks, for instance using the benchmark script, which I have not run in a while.
The benchmark script is probably not working as intended anymore because the final batch size is always 1, but at least there is no noticeable regression in the PR, so it's good.
```diff
@@ -124,7 +124,7 @@ def __call__(self, tasks=None):
         with parallel_backend('dask'):
             for func, args, kwargs in tasks:
                 results.append(func(*args, **kwargs))
-        return results
+            return results
```
Please add an inline comment to explain why it's important to return within the context manager because it's not obvious.
@fcharras do you remember why we needed this?
no.. 🤔 the diff on this file looks meaningless. At least it's harmless.
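One reason the placement of the `return` can be considered harmless for an eagerly built list: in Python, `return` inside a `with` block still runs the context manager's `__exit__` before the caller receives the value, so both placements observe the same events. A minimal sketch (the `fake_backend` name is a hypothetical stand-in for `parallel_backend('dask')`, not code from the PR):

```python
from contextlib import contextmanager

events = []

@contextmanager
def fake_backend():
    # Hypothetical stand-in for `parallel_backend('dask')`.
    events.append("enter")
    yield
    events.append("exit")

def return_inside():
    with fake_backend():
        results = [i * i for i in range(3)]
        # Returning here still triggers __exit__ before the caller
        # sees the value, exactly as returning after the block would.
        return results

out = return_inside()
print(events)  # ['enter', 'exit']
print(out)     # [0, 1, 4]
```

The placement would only matter if the returned value were lazy (e.g. a generator that still needs the backend alive when consumed), which is the scenario this PR is about.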
```python
            # worker managed to complete the task and trigger this callback
            # call just before being aborted by the reset.
            if self.parallel._call_id != self.parallel_call_id:
                return
```
I suppose there is no easy way to trigger this edge case reliably in the tests? Currently codecov complains that this is not covered. Maybe it's non-deterministically covered?
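The guard under discussion is a call-id pattern: each run records a monotonically increasing id, and a callback captured for an earlier run bails out if the id has moved on. A minimal sketch with hypothetical names (`Dispatcher`, `make_callback` are illustrative, not the PR's actual classes):

```python
import itertools

class Dispatcher:
    """Hypothetical sketch of the stale-callback guard used above."""
    _counter = itertools.count()

    def __init__(self):
        self._call_id = next(self._counter)

    def reset(self):
        # A new run invalidates callbacks captured for the previous one.
        self._call_id = next(self._counter)

    def make_callback(self):
        captured_id = self._call_id
        def callback(result, sink):
            if self._call_id != captured_id:
                return  # stale callback: worker finished just before a reset
            sink.append(result)
        return callback

d = Dispatcher()
cb = d.make_callback()
results = []
cb("first", results)  # accepted: ids match
d.reset()             # abort/reset: previously captured callbacks go stale
cb("late", results)   # ignored: id mismatch
print(results)        # ['first']
```

Because the stale branch only fires when a worker completes in the narrow window between abort and reset, it is plausible that test coverage of that branch is non-deterministic, as suggested above.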
```python
            # workaround, we detach the execution of the aborting code to a
            # dedicated thread. We then need to make sure the rest of the
            # function does not call `_terminate_and_reset` in finally.
            if IS_PYPY and current_thread_id != threading.get_ident():
```
Is it really necessary to make this code PyPy specific? I don't think there is any guarantee that an object will always be collected in the same thread in the future (e.g. with nogil CPython, CPython subinterpreters, maybe GraalVM...).
Why not just test for `current_thread_id != threading.get_ident()`, rename `_PypyGeneratorExitThread` to just `_GeneratorExitThread`, and only state in the comment that this case can typically be triggered by PyPy?
We will investigate this in a follow up Issue/PR.
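The interpreter-agnostic check suggested above can be sketched in isolation: capture the creating thread's identity, and at finalization time compare it with `threading.get_ident()` to decide whether the cleanup may run inline or must be handed off to a dedicated thread. The `finalizer` function and return strings are illustrative, not the PR's code:

```python
import threading

main_ident = threading.get_ident()

def finalizer():
    # If cleanup runs in a different thread than the one that created the
    # object (typically triggered by PyPy's GC, but not guaranteed to be
    # PyPy-only), defer the work instead of running it inline.
    if threading.get_ident() != main_ident:
        return "deferred to dedicated thread"
    return "ran inline"

inline = finalizer()

results = []
t = threading.Thread(target=lambda: results.append(finalizer()))
t.start()
t.join()

print(inline)      # ran inline
print(results[0])  # deferred to dedicated thread
```

This formulation contains no interpreter check at all, which is why it stays correct on any runtime where collection may happen off the creating thread.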
Thank you for the review! A page is turned, history is being made. 😁
I see from the changelog that several improvements have been made in master. But could you please make a small incremental release with only this PR in it? This would be an awesome feature to get access to soon!
We are planning to release soon. Until now, we have not seen any feedback on the new feature.
I installed commit 2303143. My parallelisation is with a very slow function (a few minutes per call), which takes ~20% of my computer's memory. An iterator then writes out results to a text file. I can verify that the generator (`return_generator=True`) works without failure and delivers results as they come in. This is with the popen_loky_posix backend on an up-to-date Ubuntu Linux.
Nice, thanks for the feedback!
The main goal of this PR is to make it possible to collect intermediate results without having to wait for all results to be computed first.
See the description of the original PR for the context:
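The consume-as-they-complete pattern this PR enables can be illustrated with the standard library (the PR itself adds an equivalent `return_generator=True` flag to joblib's `Parallel`, as in the feedback above; this stdlib sketch only mirrors the idea):

```python
import concurrent.futures as cf

def square(i):
    return i * i

# Stdlib analogue of what this PR brings to joblib: iterate over task
# outputs as each one completes, instead of waiting for the full list.
with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(square, i) for i in range(5)]
    streamed = [f.result() for f in cf.as_completed(futures)]

# Completion order may vary, but every result arrives as soon as its
# task finishes, so intermediate results can be written out immediately.
print(sorted(streamed))  # [0, 1, 4, 9, 16]
```

With the joblib feature, the same streaming behavior applies to `Parallel(...)` calls, so a consumer (e.g. writing results to a text file, as in the user feedback above) never has to hold all results in memory at once.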