RFC: fix race condition on task failure #12736

JeffBezanson · 2015-08-21T18:10:00Z

We tried adding this message. It was a worthwhile experiment and seemed like a good idea, even to me, but at this point I just can't take it any more.

The message is a race condition, because the order of task T running and another task calling wait(T) now matters, where it didn't before. This has led to the need for a SUPPRESS_EXCEPTION_PRINTING flag, which is just silly.

vtjnash · 2015-08-21T18:17:23Z

ref #12403

StefanKarpinski · 2015-08-21T18:33:12Z

So this means that if a Task has an error and no one waits for it, it vanishes into the void?

JeffBezanson · 2015-08-21T18:35:30Z

Basically yes, but wait isn't the only way to detect the error. You can print the task itself (which will now even show a backtrace) or examine its status field.
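
A minimal sketch of inspecting a failed task without calling wait on it. It assumes a current Julia release (1.7 or later for `current_exceptions` on a task), not the 2015 API this thread was written against, and the polling loop is only there to keep the example self-contained.

```julia
# Assumes Julia 1.7+; the task fails in the background and nothing is printed.
t = @async error("oops")

# Poll instead of calling `wait`, so the failure is never observed through `wait`.
while !istaskdone(t)
    sleep(0.01)
end

# The task's status records the failure.
println("failed: ", istaskfailed(t))

# The stored exception and its backtrace can still be inspected afterwards.
for entry in current_exceptions(t)
    showerror(stdout, entry.exception, entry.backtrace)
    println()
end
```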

StefanKarpinski · 2015-08-21T18:36:57Z

That seems fair. What about printing the error if the Task is GC'ed without having been waited on, and possibly at process shutdown if it hasn't been collected by then?

JeffBezanson · 2015-08-21T19:01:57Z

With those other behaviors, we would still need the SUPPRESS_EXCEPTION_PRINTING flag.

Tying stuff to GC is no good, since it is wildly unpredictable.

FWIW, I believe Erlang does the same thing, keeping process failures silent unless another process is watching.

StefanKarpinski · 2015-08-21T19:05:55Z

If Erlang does it that way, it's probably the right thing to do, but it does worry me, letting things fail and fall into the void.

ScottPJones · 2015-08-23T01:04:29Z

That's part of Erlang's whole model; I'm not sure it would be truly applicable to Julia. Silent failures give me the creeps.

JeffBezanson · 2015-08-23T01:44:40Z

I think of it in terms of general asynchronous programming. For example, what should

remotecall(2, error, "oops")

do? Indeed, it just sits there seemingly doing nothing until you try to fetch the answer. I don't recall many complaints about this.
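
A sketch of the same behavior with the current Distributed standard library, assuming one local worker can be started. Current Julia takes the function first and the worker id second, unlike the 2015 call shown above.

```julia
using Distributed

# Start one local worker process and keep its id.
w = first(addprocs(1))

# Returns a Future immediately; the remote error is not reported at this point.
fut = remotecall(error, w, "oops")

# The failure only surfaces when the result is fetched.
caught = try
    fetch(fut)
catch e
    e
end

println(typeof(caught))

# Shut the worker down again.
rmprocs(w)
```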

Keno · 2015-08-23T11:09:19Z

I'm a little worried about use cases such as:

@async while true
    listen(...)
    ...
end

There should probably be a good way to print the error message. Maybe just a macro that wraps everything in try/catch and calls the appropriate show_backtrace magic.
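
One way such a macro could look in current Julia, as a rough sketch. The name `@async_report` is hypothetical and not part of Base, and `showerror` with `catch_backtrace()` stands in for the "show_backtrace magic" mentioned above.

```julia
# `@async_report` is a hypothetical helper name, not an existing Base macro.
macro async_report(ex)
    quote
        @async try
            $(esc(ex))
        catch err
            # Print the error and backtrace as soon as the task fails.
            showerror(stderr, err, catch_backtrace())
            println(stderr)
            # Re-throw so the task still ends up in the failed state.
            rethrow()
        end
    end
end

# Usage: the error is printed even though nobody waits on the task.
t = @async_report error("listener crashed")
```

Re-throwing keeps the task observable through wait for callers that do hold a reference to it.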

JeffBezanson · 2015-08-23T15:14:39Z

I think the best approach is something like

function monitor(t::Task)
    try
        wait(t)
    catch e
        # show exception info
    end
end

You want to use wait, since that even handles the case where a task gets an exception before it starts.
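
A filled-in version of that monitor for current Julia, as a sketch only. The `io` keyword is an added assumption so the output destination can be changed; the original leaves the reporting step open.

```julia
# Sketch: wait on a task and report its failure instead of propagating it.
# The `io` keyword is an assumption added for illustration.
function monitor(t::Task; io::IO = stderr)
    try
        wait(t)
    catch e
        # In current Julia, `wait` on a failed task throws a TaskFailedException
        # that wraps the original error.
        showerror(io, e, catch_backtrace())
        println(io)
    end
    return t
end

# Usage: capture the report in a buffer rather than printing to stderr.
buf = IOBuffer()
monitor(@async(error("oops")); io = buf)
report = String(take!(buf))
```

Calling `monitor` blocks until the task finishes; wrapping the call in its own `@async` would watch the task in the background instead.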

ScottPJones · 2015-08-23T16:27:45Z

Could the starting process asynchronously get the message (in a thread?), and keep it until the main thread calls wait() on it?
That way a worker process could go away immediately after delivering its final message, freeing up resources.

JeffBezanson · 2015-08-23T17:21:21Z

Yes that sounds possible, but it seems to only affect resource use and not when errors get printed.

ScottPJones · 2015-08-23T17:47:23Z

I've had cases of filling up process tables or hitting OS limits on the number of processes on different platforms, which is why that can be important.

amitmurthy · 2015-08-24T03:58:59Z

Agreeing with Keno's concern. The printing of errors is useful with @schedule calls, where we do not really want to keep a reference to the started task but would like any errors to be printed to the screen (from an error reporting / debugging point of view).

amitmurthy · 2015-08-24T05:42:45Z

I am OK with printing unhandled errors based on an environment flag, say JULIA_DEBUG=0/1, with the default being off, i.e., no printing. This addresses development-time issues, where mistakes such as a misspelled variable name or other syntactical errors can be discovered quickly. If you expect runtime errors, then you should wrap your async block in a try-catch and handle it properly.

JeffBezanson · 2015-08-28T22:03:17Z

Some kind of environment var would be ok I guess.

One reason I find this important is that it's actually not just a matter of whether errors are expected within a certain task. It's about order of events. For example sync_add needs to set SUPPRESS_EXCEPTION_PRINTING because we're spawning several tasks, then waiting on each. A task might fail in the middle of the process.
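
The ordering problem can be sketched in current Julia as follows. This is an illustration, not the actual sync_add code: several tasks are spawned, then each is waited on in turn, and one of them typically fails before its own wait is reached.

```julia
# Spawn three tasks; the second one fails.
tasks = [@async(i == 2 ? error("task $i failed") : i) for i in 1:3]

# Wait on each task in order. While we block on the first task, the others
# get to run, so task 2 has usually failed before anyone waits on it.
results = map(tasks) do t
    try
        fetch(t)
    catch e
        e   # keep the failure as a value instead of propagating it
    end
end

for (i, r) in enumerate(results)
    println("task ", i, " => ", r isa Exception ? typeof(r) : r)
end
```

A message printed at the moment of failure would fire for task 2 even though the caller is about to collect that failure itself, which is the situation the suppression flag was working around.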

StefanKarpinski · 2015-08-28T22:11:40Z

Can't the way you suppress raising exceptions be to explicitly wait on a task and wrap the wait in try/catch? It seems fine to me to make it hard to ignore exceptions.

JeffBezanson · 2015-08-28T22:17:18Z

No, because a task usually starts before anybody calls wait. That's the race condition we have.

vtjnash · 2021-04-26T15:27:57Z

Implemented in #39518

remove "unhandled task failure" message

9390163

JeffBezanson mentioned this pull request Apr 12, 2018

broken docs or feature on "Simple TCP Example" #26777

Closed

JeffBezanson closed this Jun 21, 2018

JeffBezanson deleted the jb/notaskmessage branch June 21, 2018 23:08

JeffBezanson mentioned this pull request Jun 21, 2018

remove "unhandled task failure" message printing #27722

Merged

c42f mentioned this pull request Jan 7, 2020

Log errors for unhandled task exceptions #34279

Closed
