Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cli): Set a maximum time to wait for spawn_blocking tasks #18436

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

andreubotella
Copy link
Contributor

Tokio's spawn_blocking allows running blocking operations, which in Deno include file I/O and "non-blocking" FFI, in a thread pool separate from the thread running the JS event loop. However, by default, when the tokio runtime is dropped, it will block on the main thread until all such blocking operations have completed.

This is a problem, since a non-blocking FFI call might never actually finish, which would keep the Deno process running forever, even if it would otherwise have finished earlier because of an uncaught exception. This change, therefore, sets a time limit of 0.5 seconds to wait for those blocking tasks to finish.

Closes #18131.

Tokio's `spawn_blocking` allows running blocking operations, which in
Deno include file I/O and "non-blocking" FFI, in a thread pool
separate from the thread running the JS event loop. However, by
default, when the tokio runtime is dropped, it will block on the main
thread until all such blocking operations have completed.

This is a problem, since a non-blocking FFI call might never actually
finish, which would keep the Deno process running forever, even if it
would otherwise have finished earlier because of an uncaught
exception. This change, therefore, sets a time limit of 0.5 seconds to
wait for those blocking tasks to finish.

Closes denoland#18131.
Copy link
Collaborator

@aapoalas aapoalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exact timeout duration might maybe be better as eg. 1 sec, and a parameter for nonblocking FFI calls might be beneficial, but over all this looks good to me.

I'd like a review from @littledivy for the duration related specifics.

Copy link
Contributor

@rojvv rojvv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a confirmation that it works.

},
});

const ONE_HOUR = 1000 * 60 * 60;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don’t you think this is too time-consuming for cases when the test might fail?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, 1 minute should be plenty.

Copy link
Collaborator

@aapoalas aapoalas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the test's timeout. Otherwise this seems like a very simple and very necessary fix <3

Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm marking this as "Request changes", we need to discuss this during CLI WG meeting. Seems like a very radical change.

@rojvv
Copy link
Contributor

rojvv commented Mar 27, 2023

I’ll be using my local Deno build until then.

@littledivy
Copy link
Member

I haven't looked into the root cause much, Why do the FFI uncaught exceptions not let the event loop quit gently?

@andreubotella
Copy link
Contributor Author

andreubotella commented Mar 27, 2023

I haven't looked into the root cause much, Why do the FFI uncaught exceptions not let the event loop quit gently?

This is not about uncaught exceptions coming from FFI at all – notice how the test I added has an exception being thrown from the main module, completely separate from the FFI call. This is also not at all about the Deno event loop, which AFAIK deals with FFI just fine, but about the tokio event loop.

When a tokio runtime is dropped, it will usually wait for any blocking spawned tasks to finish. But for FFI non-blocking calls, this might literally take forever. shutdown_timeout sets a maximum timeout to wait for those tasks, and if they still haven't finished by that point, it just returns. It doesn't kill those tasks, since apparently it's impossible to kill a thread without that thread's cooperation, but what else can you do at that point. If we're exiting from the main web worker in deno run, we will reach the end of the main function shortly after dropping the tokio runtime anyway, which will kill those threads.

If it's not possible to kill a thread without its cooperation, there's really nothing we can do to kill a non-blocking FFI call. But I don't know the specifics of thread-killing across OS's.

// and non-blocking FFI, would usually keep the runtime from shutting down
// until they finish. With FFI this can keep the process from ever shutting
// down, so instead we set a maximum waiting time of half a second.
rt.shutdown_timeout(Duration::from_secs_f64(0.5));
Copy link
Member

@dsherret dsherret Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will affect other things that we might want to wait more than half a second for. For example: #18401 (comment)

@dsherret
Copy link
Member

dsherret commented Mar 27, 2023

Should people be calling Deno.exit instead? What if someone wants this behaviour (or actually, I guess the runtime will be done executing at this point?)?

@rojvv
Copy link
Contributor

rojvv commented Mar 27, 2023

Do you mean should people be replacing all the thrown errors with Deno.exit?

@rojvv
Copy link
Contributor

rojvv commented Mar 27, 2023

call_some_nonblocking_function();
console.debug("Reaches here.");
throw new Error(); // not thrown

Who wants this behavior anyway?

@dsherret
Copy link
Member

dsherret commented Mar 27, 2023

@roj1512 no, I meant people using Deno.exit to shut down if there are any pending blocking tasks due to ffi. In the case of an error, it should shut down regardless. I don't believe using a timeout is the way to do this. We should investigate a way that gets the behaviour without a timeout.

I believe a workaround to your bug is to do the following for now:

try {
  main();
} catch (err) {
  console.error(err);
  Deno.exit(1);
}

@aapoalas
Copy link
Collaborator

It seems to be possible to get the same issue with reading stdin: #14333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FFI: Errors cannot be thrown after a nonblocking function call
6 participants