
RFC: detach() and attach() from/to worker processes #3428

Closed

Conversation

amitmurthy
Contributor

This provides a means for the client REPL to be detached from the set of worker processes and reattached later. Typical use cases:

  • Long-running computations in Julia spanning hours or days: the client terminal can be safely detached after starting the computation.
  • Similarly, when the workers are in a cloud environment but the client console is on the user's local machine/laptop.

Currently implemented are detach and attach. Both can only be executed on the client process (id = 1).

detach(connection_file::String) safely removes the client from the process group and writes complete connection information to connection_file.

attach(connection_file::String) uses the information recorded during detach and reconnects to the cluster.

Only one client process can be connected to the cluster at any time.
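A rough usage sketch of the proposed flow (detach/attach as proposed here, not existing API; the file path, worker count, and long_running_computation are placeholders):

```julia
# Start workers and launch a long-running job so that its result lives on
# a worker and survives the client going away. `long_running_computation`
# is a placeholder and would have to be defined on the workers.
addprocs(8)
@spawnat 2 begin
    global result = long_running_computation()
end

# Proposed detach(): complete connection details (including whether an ssh
# tunnel is needed) are written to the file; the REPL can now be closed.
detach("/tmp/cluster_connection_info")

# Hours or days later, possibly from a fresh Julia session on a laptop:
attach("/tmp/cluster_connection_info")
# ... and the stored `result` can now be fetched from worker 2.
```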

TO BE DONE

Since the above results in the client (id = 1) being detached, it leads to certain issues with the parallel computing infrastructure we currently have, for example @parallel and pmap. In typical usage the client process (id = 1) is the controller process for the entire distributed job execution, and it is currently meant to be interactive, in the sense that the terminal (REPL) is expected to be kept open.

Since the client process can now be detached and closed, we should provide an alternative mechanism for the user to push computation work to the "background", query its status, and retrieve results independently of the client process.

We can have the following new macros/functions (a rough implementation sketch follows the list):

@bg_exec(key::String, code_block) - runs the block of code, code_block, in the background, i.e., on the worker with the lowest process id other than pid 1. The result of the code block is stored in a Dict on that worker under the specified key.

bg_clear() - clears the Dict on the worker from which background jobs are controlled.

bg_take(key), bg_fetch(key) - take and fetch results from that Dict.

bg_put(key) - application code can add its own information to this Dict, for example progress information on long-running computations that can be queried periodically.
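To make this concrete, here is a minimal sketch of how such helpers might be layered on the existing remotecall primitives. It is written with the Distributed-stdlib signatures of later Julia versions; the bg_* names come from the list above, while BG_STORE, the helper functions, and the worker count are illustrative only, not this PR's implementation:

```julia
using Distributed
addprocs(4)

# Define the store and its accessors on every process; only the copy on
# the lowest-pid worker is actually used.
@everywhere begin
    const BG_STORE = Dict{String,Any}()
    bg_run!(k, f)   = (BG_STORE[k] = f(); nothing)   # run f(), remember its result
    bg_get(k)       = BG_STORE[k]
    bg_pop!(k)      = pop!(BG_STORE, k)
    bg_store!(k, v) = (BG_STORE[k] = v; nothing)
    bg_empty!()     = (empty!(BG_STORE); nothing)
end

bg_worker() = minimum(workers())   # lowest non-client pid, as proposed above

# Run `thunk` asynchronously on the background worker; the result is kept
# in BG_STORE under `key` so a (re)attached client can pick it up later.
bg_exec(key, thunk) = remote_do(bg_run!, bg_worker(), String(key), thunk)

bg_fetch(key)    = remotecall_fetch(bg_get, bg_worker(), String(key))
bg_take(key)     = remotecall_fetch(bg_pop!, bg_worker(), String(key))
bg_put(key, val) = remotecall_fetch(bg_store!, bg_worker(), String(key), val)
bg_clear()       = remotecall_fetch(bg_empty!, bg_worker())
```

Usage would then be, e.g., bg_exec("sim1", () -> run_simulation()) before detaching and bg_fetch("sim1") after re-attaching, where run_simulation is whatever the application defines.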

NOTE:

  • The above scheme puts the process with the lowest pid (other than the client) in a special role of keeping some state for the long-running computational tasks.
  • Given the co-operative multitasking nature of Julia, application code must play nice and periodically call yield() in order for the client process to be able to attach() and detach() at will (see the sketch after this list).
  • Due to the segfault mentioned in PR #3394, extensive testing of detach/attach has not been possible yet.
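For instance, a long-running kernel would need to look something like the following (a hypothetical example, not code from this PR; expensive_step is a placeholder):

```julia
# Hypothetical long-running kernel: yielding once per outer iteration lets
# the scheduler service the networking tasks that handle attach()/detach().
function long_running_computation(n)
    acc = 0.0
    for i in 1:n
        acc += expensive_step(i)   # placeholder for the real per-step work
        yield()                    # co-operative multitasking: give other tasks a turn
    end
    return acc
end
```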

Any issues, suggestions, or different schemes for a clean and consistent implementation of client detach/attach are welcome.

@amitmurthy
Contributor Author

  • While I don't particularly like the idea of detach and attach taking a filename as a parameter, it is required in the current context where we do not have a port mapper infrastructure, and the client is the only one with complete cluster connection details, including whether an ssh tunnel is required or not.
  • How about renaming attach to reattach?
  • I am also not very sure about the entire bg_* set of functions. Maybe, for now, we should let the user be aware of situations where the console may be detached, and hence explicitly execute the long-running compute control script on one of the worker processes?
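For example (pid 2 and run_control_loop are placeholders), the whole control loop could simply be spawned on a worker before detaching:

```julia
# Run the job-control logic itself on a worker, so it does not depend on
# the (detachable) client staying alive. `run_control_loop` is hypothetical
# and would need to be defined on the workers (e.g. via @everywhere).
job = @spawnat 2 run_control_loop()
```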

Your views @JeffBezanson @StefanKarpinski @ViralBShah ?

@ViralBShah
Member

I like this general idea, but it would be nice to hear what others have to say. Also, this is one of those things where having a working prototype would give a good feel for developing it further. @timholy I believe you do some fairly large computations with Julia. Would love to hear your views on the topic.

@JeffBezanson
Member

Cool functionality. I think it should be handled strictly by the REPL though, instead of by killing one of the processes.

@amitmurthy amitmurthy closed this Jul 12, 2013