
Allow Dask Client to be configured #86

Open
AlecThomson opened this issue Jun 28, 2022 · 2 comments

Hi there,

I'm wondering if it would be possible to add an optional configuration to tricolour that allows either a Dask Client object to be passed in, or the address of an externally running dask scheduler to be given on the command line. If specified, this would bypass the ThreadPool object that is currently created. The upside is that all Dask Client types would then be supported, such as dask-mpi and dask-jobqueue, which can run jobs over multiple nodes.

Apologies for not opening a PR or similar, but I'm not fully familiar with all the workings of this project, nor with contextlib specifically.

A simple change to app.py's main could just be:

import contextlib
from multiprocessing.pool import ThreadPool

import dask
from dask.distributed import Client
from threadpoolctl import threadpool_limits


def main():
    with contextlib.ExitStack() as stack:
        # Limit numpy/blas etc threads to 1, as we obtain
        # our parallelism with dask threads
        stack.enter_context(threadpool_limits(limits=1))

        args = create_parser().parse_args()

        if args.client_address is None:  # assuming it defaults to None
            # Configure dask pool as before
            if args.nworkers <= 1:
                log.warn("Entering single threaded mode per user request!")
                dask.config.set(scheduler='single-threaded')
            else:
                stack.enter_context(dask.config.set(
                    pool=ThreadPool(args.nworkers)))
        else:
            # Client is a context manager, so the exit stack can own it;
            # once created it becomes the default scheduler for the graph
            stack.enter_context(Client(args.client_address))

        _main(args)
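
For completeness, a minimal sketch of wiring the hypothetical --client-address option into the parser (assuming create_parser builds a standard argparse.ArgumentParser; the flag name and help text are my invention, not an existing tricolour option):

# Illustrative only: the --client-address flag is hypothetical and defaults
# to None so the current ThreadPool behaviour is preserved.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--client-address", dest="client_address", default=None,
                    help="Address of an externally running dask scheduler, "
                         "e.g. tcp://127.0.0.1:8786. If given, the internal "
                         "ThreadPool is bypassed.")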

Sorry if this had already been considered and subsequently ruled out!

sjperkins (Member) commented Jun 28, 2022

Hi @AlecThomson! Thanks for posting this.

What you're suggesting is indeed feasible. If I might restate your suggestion slightly: one would pass a scheduler address as a tricolour argument, allowing the tricolour graph to be submitted to a distributed dask scheduler for execution on dask worker nodes.
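
To illustrate the idea (a rough sketch, not tricolour code, and the scheduler address is made up): once a distributed Client is created from the address, it becomes the default scheduler, so subsequent .compute() calls are executed on the dask workers rather than in a local thread pool.

import dask.array as da
from dask.distributed import Client

# Connect to an externally running scheduler; the address is illustrative
client = Client("tcp://scheduler-host:8786")

# Any dask graph computed after this point is submitted to that scheduler
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = x.mean().compute()

client.close()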

Some effort would be needed to investigate whether the dask distributed scheduler successfully handles these graphs.
See dask/distributed#6360, which discusses some of the existing issues with the distributed scheduler.

In particular, tricolour packs scan data into a (baseline, time, chan, corr) chunk which is then rechunked per baseline so that flagging is parallelised over this dimension.

The packing step can be done in two ways, depending on the --window-backend option:

  1. in memory, which results in gathering all scan chunks into a single window chunk which is then rechunked.
  2. on disk using zarr, which might be a bit slower, but IIRC has an embarrassingly parallel independent graph.

I would guess (2) would work better on the distributed scheduler, for instance.
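
To make the rechunking step concrete, here is a toy dask.array sketch (the shapes are illustrative and this is not tricolour's actual code): the packed window is rechunked so each baseline becomes its own chunk, giving one flagging task per baseline.

import dask.array as da

nbl, ntime, nchan, ncorr = 351, 120, 1024, 4

# Scan data packed into a single window chunk (the in-memory backend case)
window = da.zeros((nbl, ntime, nchan, ncorr),
                  chunks=(nbl, ntime, nchan, ncorr),
                  dtype="complex64")

# Rechunk so flagging is parallelised over the baseline dimension
per_baseline = window.rechunk({0: 1})

print(per_baseline.numblocks)  # (351, 1, 1, 1): one block per baseline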

bennahugo (Collaborator) commented Jun 28, 2022 via email
