-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(workers): Make worker.terminate()
not immediately kill the isolate
#12831
fix(workers): Make worker.terminate()
not immediately kill the isolate
#12831
Conversation
Due to a bug in V8, terminating an isolate while a module with top-level await is being evaluated would crash the process. This change makes it so calling `worker.terminate()` will signal the worker to terminate at the next iteration of the event loop, and it schedules a proper termination of the worker's isolate after 2 seconds. Closes denoland#12658
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a sensible solution, especially if it fixed the problem at hand. @lucacasonato @bnoordhuis PTAL
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How pressing is this bug? My natural inclination is to wait for the upstream fix (the V8 bug had some movement earlier today) than try to work around it.
pub fn terminate_if_needed(&mut self) -> bool { | ||
let has_terminated = self.is_terminated(); | ||
|
||
if !has_terminated && self.termination_signal.load(Ordering::SeqCst) { | ||
self.terminate(); | ||
return true; | ||
} | ||
|
||
has_terminated | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can end up calling self.terminate()
repeatedly, can't it? That seems bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
terminate_if_needed()
will at most call self.terminate()
once – there are no loops or recursion in this method. And if self.terminate()
has been previously called, self.is_terminated()
will return true, so self.terminate()
won't be called again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, let me clarify: calling terminate_if_needed()
twice can result in two calls to terminate()
.
That's okay when terminate()
is idempotent but I don't think it is, and even if it is, then that's an implicit contract that's easy to miss when making changes later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.terminate()
will set self.has_terminated
to true, which will keep any subsequent call to self.terminate_if_needed()
from calling self.terminate()
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the implicit contract I mean. Something a little less prone to snafus would be appreciated.
A rule of thumb I follow is this: whenever more than one atomic variable pops up, try really hard to replace it with a mutex and a proper critical section.
I guess it's not too pressing. But this isn't the only bug related to isolate termination that we've had (see #12263), and considering that this PR changes worker termination to work the same as Chrome, I'd say this might be better in the long run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I discussed this PR offline with @bnoordhuis who's not to keen on landing it, but since it's fixing an actual bug we agreed to merge it.
…ate (denoland#12831) Due to a bug in V8, terminating an isolate while a module with top-level await is being evaluated would crash the process. This change makes it so calling `worker.terminate()` will signal the worker to terminate at the next iteration of the event loop, and it schedules a proper termination of the worker's isolate after 2 seconds.
…ate (#12831) Due to a bug in V8, terminating an isolate while a module with top-level await is being evaluated would crash the process. This change makes it so calling `worker.terminate()` will signal the worker to terminate at the next iteration of the event loop, and it schedules a proper termination of the worker's isolate after 2 seconds. Co-authored-by: Andreu Botella <[email protected]>
I decided to spawn a thread rather than start a sleep future in the tokio event loop in case the host was terminated in the meantime, but if the increase in threads is a problem, we can revisit this. |
Calling `worker.terminate()` used to kill the worker's isolate and then block until the worker's thread finished. This blocks the calling thread if the worker's event loop was blocked in a sync op (as with `Deno.sleepSync`), which wasn't realized at the time, but since the worker's isolate was killed at that moment, it would not block the calling thread if the worker was in a JS endless loop. However, in denoland#12831, in order to work around a V8 bug, worker termination was changed to first set a signal to let the worker event loop know that termination has been requested, and only kill the isolate if the event loop has not finished after 2 seconds. However, this change kept the blocking, which meant that JS endless loops in the worker now blocked the parent for 2 seconds. As it turns out, after denoland#12831 it is fine to signal termination and even kill the worker's isolate without waiting for the thread to finish, so this change does that. However, that might leave the async ops that receive messages and control data from the worker pending after `worker.terminate()`, which leads to odd results from the op sanitizer. Therefore, we set up a `CancelHandler` to cancel those ops when the worker is terminated. Fixes denoland#13705.
…13941) Calling `worker.terminate()` used to kill the worker's isolate and then block until the worker's thread finished. This blocks the calling thread if the worker's event loop was blocked in a sync op (as with `Deno.sleepSync`), which wasn't realized at the time, but since the worker's isolate was killed at that moment, it would not block the calling thread if the worker was in a JS endless loop. However, in #12831, in order to work around a V8 bug, worker termination was changed to first set a signal to let the worker event loop know that termination has been requested, and only kill the isolate if the event loop has not finished after 2 seconds. However, this change kept the blocking, which meant that JS endless loops in the worker now blocked the parent for 2 seconds. As it turns out, after #12831 it is fine to signal termination and even kill the worker's isolate without waiting for the thread to finish, so this change does that. However, that might leave the async ops that receive messages and control data from the worker pending after `worker.terminate()`, which leads to odd results from the op sanitizer. Therefore, we set up a `CancelHandler` to cancel those ops when the worker is terminated.
…13941) Calling `worker.terminate()` used to kill the worker's isolate and then block until the worker's thread finished. This blocks the calling thread if the worker's event loop was blocked in a sync op (as with `Deno.sleepSync`), which wasn't realized at the time, but since the worker's isolate was killed at that moment, it would not block the calling thread if the worker was in a JS endless loop. However, in #12831, in order to work around a V8 bug, worker termination was changed to first set a signal to let the worker event loop know that termination has been requested, and only kill the isolate if the event loop has not finished after 2 seconds. However, this change kept the blocking, which meant that JS endless loops in the worker now blocked the parent for 2 seconds. As it turns out, after #12831 it is fine to signal termination and even kill the worker's isolate without waiting for the thread to finish, so this change does that. However, that might leave the async ops that receive messages and control data from the worker pending after `worker.terminate()`, which leads to odd results from the op sanitizer. Therefore, we set up a `CancelHandler` to cancel those ops when the worker is terminated.
Due to a bug in V8, terminating an isolate while a module with top-level await is being evaluated would crash the process. This change makes it so calling
worker.terminate()
will signal the worker to terminate at the next iteration of the event loop, and it schedules a proper termination of the worker's isolate after 2 seconds.Closes #12658