
FEA return generator, #588 stripped of unrelated changes for minimal diff review #1393

Merged
227 commits merged into joblib:master from the return_generator_clean branch
Apr 18, 2023

Conversation

fcharras
Contributor

@fcharras fcharras commented Feb 17, 2023

The main goal of this PR is to make it possible to collect intermediate results without having to wait for all results to be computed first.

See the description of the original PR for the context:
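For context, the behavior this PR enables can be sketched with a standard-library analogue. The helper name `lazy_map` is hypothetical and is not joblib API; in joblib itself the feature is exposed through a parameter of `Parallel` (named `return_generator` in this PR, `return_as` in later releases):

```python
from concurrent.futures import ThreadPoolExecutor

def lazy_map(func, iterable, max_workers=2):
    """Yield results in submission order as they complete, instead of
    materializing the full list of results first."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, x) for x in iterable]
        for fut in futures:
            yield fut.result()

squares = lazy_map(lambda x: x * x, range(5))
print(next(squares))   # 0 -- the first result, available immediately
print(list(squares))   # [1, 4, 9, 16]
```

The caller can start consuming (and, e.g., writing out) results while later tasks are still running, which is the point of returning a generator.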

Franck Charras and others added 30 commits May 20, 2021 00:34
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
@ogrisel
Contributor

ogrisel commented Apr 18, 2023

Before considering a merge of this PR, could you please run some benchmarks to check that it does not introduce any significant performance regression when not enabling return_generator?

In particular when running a large number of very short tasks, for instance using:

which I have not run in a while.
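The snippet referenced above did not survive the page extraction. As a purely hypothetical stand-in, a micro-benchmark in the same spirit (a large number of very short tasks, so dispatch overhead dominates rather than compute) could look like:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def tiny(x):
    # deliberately trivial task: the cost measured is mostly scheduling
    return x + 1

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(tiny, range(10_000)))
elapsed = time.perf_counter() - start
print(f"{total} results in {elapsed:.3f}s")
```

Comparing the elapsed time on the two branches would surface any overhead added by the generator machinery on the default (non-generator) path.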

@tomMoral
Contributor

I ran the proposed benchmark and I don't see any change between the two branches.

[benchmark plot: no visible difference between the two branches]

I also ran a benchmark with a small-scale grid search on synthetic data with scikit-learn and looked at the scaling with the number of CPUs. Here are the results (I will put the benchmark script in a separate PR).
[plot: grid-search runtime scaling with the number of CPUs]

It seems safe to say that there is no noticeable performance drop, so I propose merging this one :)

@ogrisel
Contributor

ogrisel commented Apr 18, 2023

The benchmark script is probably not working as intended anymore because the final batch size is always 1, but at least there is no noticeable regression in the PR, so it's good.

@@ -124,7 +124,7 @@ def __call__(self, tasks=None):
         with parallel_backend('dask'):
             for func, args, kwargs in tasks:
                 results.append(func(*args, **kwargs))
-        return results
+            return results
Contributor

Please add an inline comment to explain why it's important to return within the context manager because it's not obvious.

Contributor

@fcharras do you remember why we needed this?

Contributor Author

No... 🤔 The diff on this file looks meaningless. At least it's harmless.
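For an eager list of results the placement of the `return` is indeed harmless, but it stops being harmless once results are lazy. A minimal sketch (all names hypothetical) of how a generator returned from inside a `with` block outlives the resource it depends on:

```python
from contextlib import contextmanager

@contextmanager
def backend():
    # stand-in for a parallel backend that is torn down on exit
    state = {"open": True}
    try:
        yield state
    finally:
        state["open"] = False

def compute_lazy():
    with backend() as b:
        # The generator below is returned un-consumed: by the time the
        # caller iterates it, the `with` block has already exited and
        # the backend has been torn down.
        return (("ok" if b["open"] else "closed") for _ in range(2))

print(list(compute_lazy()))  # ['closed', 'closed']
```

With a plain list the work happens before the block exits, so both placements behave identically; a lazy generator would need to be consumed (or the backend kept alive) inside the context.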

        # worker managed to complete the task and trigger this callback
        # call just before being aborted by the reset.
        if self.parallel._call_id != self.parallel_call_id:
            return
Contributor

I suppose there is no easy way to trigger this edge case reliably in the tests? Currently codecov complains that this is not covered. Maybe it's non-deterministically covered?
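Outside the real scheduler the guard itself can be exercised deterministically; here is a toy sketch (all names hypothetical, not joblib internals) of the stale-callback check:

```python
import itertools

_call_ids = itertools.count()

class Dispatcher:
    """Toy sketch: callbacks registered for an earlier call are ignored
    once a reset has started a new call."""
    def __init__(self):
        self.call_id = next(_call_ids)

    def make_callback(self):
        registered_id = self.call_id

        def callback(result, sink):
            # Mirrors the guard above: a worker may finish its task and
            # fire this callback just as the parallel call is being reset.
            if self.call_id != registered_id:
                return
            sink.append(result)

        return callback

d = Dispatcher()
cb = d.make_callback()
out = []
cb("fresh", out)
d.call_id = next(_call_ids)  # simulate a reset starting a new call
cb("stale", out)
print(out)  # ['fresh']
```

In the real code the race window is tiny, which is why coverage of that branch is non-deterministic.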

        # workaround, we detach the execution of the aborting code to a
        # dedicated thread. We then need to make sure the rest of the
        # function does not call `_terminate_and_reset` in finally.
        if IS_PYPY and current_thread_id != threading.get_ident():
Contributor

@ogrisel ogrisel commented Apr 18, 2023

Is it really necessary to make this code PyPy-specific? I don't think there is any guarantee that an object will always be collected in the same thread in the future (e.g. with nogil CPython, CPython subinterpreters, maybe GraalVM...).

Why not just test for current_thread_id != threading.get_ident(), rename _PypyGeneratorExitThread to just _GeneratorExitThread, and only state in the comment that this case can typically be triggered by PyPy?

Contributor

We will investigate this in a follow-up issue/PR.
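The check under discussion can be illustrated with a standard-library sketch (the `Resource` class is hypothetical): record whether cleanup runs on the thread that created the object, which is what `current_thread_id != threading.get_ident()` detects.

```python
import threading

class Resource:
    """Hypothetical sketch: detect whether close() runs on the creating
    thread, as a finalizer under PyPy's deferred GC may not."""
    def __init__(self):
        self._owner_ident = threading.get_ident()
        self.cleaned_on_owner_thread = None

    def close(self):
        # equivalent of the `current_thread_id != threading.get_ident()` test
        self.cleaned_on_owner_thread = (
            threading.get_ident() == self._owner_ident
        )

r = Resource()
r.close()
print(r.cleaned_on_owner_thread)   # True: closed on the creating thread

r2 = Resource()
t = threading.Thread(target=r2.close)
t.start(); t.join()
print(r2.cleaned_on_owner_thread)  # False: closed from another thread
```

Making the branch depend only on the thread check, as suggested, would cover any runtime where finalization happens off the owning thread, not just PyPy.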

CHANGES.rst — review thread resolved
@tomMoral tomMoral merged commit ceb203b into joblib:master Apr 18, 2023
@fcharras fcharras deleted the return_generator_clean branch April 18, 2023 21:09
@fcharras
Contributor Author

Thank you for the review! A page is turned, history is being made. 😁

@JohannesBuchner

I see from the changelog that several improvements have been made in master.

But could you please make a small incremental release with only this PR in it?

This would be an awesome feature to access soon!

@tomMoral
Contributor

We are planning to release joblib at the beginning of June, in part because of this PR, which involves a major refactor of the joblib code.

Until now, we have not seen any feedback on the master branch. If you have the opportunity to test the master branch on your workflow and report whether it works smoothly or whether you run into any issues, that would be a great help to speed up the release :)

@JohannesBuchner

JohannesBuchner commented May 24, 2023

I installed commit 2303143.

My parallelisation is with a very slow function (a few minutes per call) that takes ~20% of my computer's memory. An iterator then writes out the results to a text file.

I can verify that the generator (return_generator=True) works without failure and delivers results as they come in. This is with the popen_loky_posix backend on an up-to-date Ubuntu Linux.
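A standard-library sketch of the workflow described above (function and file names hypothetical): consume results one at a time and write each to a text file as soon as it is available, so nothing more than one result needs to be held in memory.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def slow_model(i):
    # stand-in for the minutes-long function described above
    return f"run {i}: score={i * 0.1:.1f}"

path = os.path.join(tempfile.gettempdir(), "results_demo.txt")
with ThreadPoolExecutor(max_workers=2) as pool, open(path, "w") as fh:
    # pool.map yields results in order as they complete
    for line in pool.map(slow_model, range(4)):
        fh.write(line + "\n")  # write each result as soon as it is ready
```

With `return_generator=True` the joblib `Parallel` call would play the role of `pool.map` here, yielding results to the writing loop as workers finish.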

@tomMoral
Contributor

Nice thanks for the feedback!
