
fix(workers): Make worker.terminate() not immediately kill the isolate #12831

Merged
merged 10 commits on Nov 29, 2021

Conversation

andreubotella
Contributor

Due to a bug in V8, terminating an isolate while a module with top-level await is being evaluated would crash the process. This change makes it so calling worker.terminate() will signal the worker to terminate at the next iteration of the event loop, and it schedules a proper termination of the worker's isolate after 2 seconds.

Closes #12658
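The two-phase contract described above can be sketched with illustrative names (these are stand-ins, not Deno's actual API): `terminate()` first sets a cooperative signal flag that the worker's event loop checks on every turn, and only a later watchdog performs the hard isolate kill.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical sketch of the two-phase termination this PR introduces.
// Field and method names are illustrative only.
#[derive(Default)]
pub struct WorkerHandle {
    pub termination_signal: AtomicBool, // "please stop at the next turn"
    pub isolate_terminated: AtomicBool, // "the isolate was hard-killed"
}

impl WorkerHandle {
    // Phase 1: cooperative signal, observed at the next event-loop turn.
    pub fn signal_termination(&self) {
        self.termination_signal.store(true, Ordering::SeqCst);
    }

    // Polled by the worker's event loop on each iteration.
    pub fn should_exit(&self) -> bool {
        self.termination_signal.load(Ordering::SeqCst)
    }

    // Phase 2: hard kill, run by a watchdog after the 2-second grace
    // period (stand-in for terminating the V8 isolate).
    pub fn terminate_isolate(&self) {
        self.isolate_terminated.store(true, Ordering::SeqCst);
    }
}
```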

Andreu Botella added 3 commits November 21, 2021 22:55
Due to a bug in V8, terminating an isolate while a module with top-level
await is being evaluated would crash the process. This change makes it
so calling `worker.terminate()` will signal the worker to terminate at
the next iteration of the event loop, and it schedules a proper
termination of the worker's isolate after 2 seconds.

Closes denoland#12658
@bartlomieju bartlomieju added this to the 1.17.0 milestone Nov 21, 2021
Member

@bartlomieju bartlomieju left a comment


I think this is a sensible solution, especially if it fixes the problem at hand. @lucacasonato @bnoordhuis PTAL

@andreubotella
Contributor Author

`cargo test` was not failing on my machine, but it was failing on CI before 94bc4e4, so I'd like to run CI a couple more times to be confident that I'm not introducing flakes.

Contributor

@bnoordhuis bnoordhuis left a comment


How pressing is this bug? My natural inclination is to wait for the upstream fix (the V8 bug had some movement earlier today) rather than try to work around it.

Comment on lines +148 to 157
pub fn terminate_if_needed(&mut self) -> bool {
  let has_terminated = self.is_terminated();

  if !has_terminated && self.termination_signal.load(Ordering::SeqCst) {
    self.terminate();
    return true;
  }

  has_terminated
}
Contributor


This method can end up calling self.terminate() repeatedly, can't it? That seems bad.

Contributor Author


terminate_if_needed() will call self.terminate() at most once per invocation – there are no loops or recursion in this method. And if self.terminate() has previously been called, self.is_terminated() will return true, so self.terminate() won't be called again.

Contributor


Sorry, let me clarify: calling terminate_if_needed() twice can result in two calls to terminate().

That's okay if terminate() is idempotent, but I don't think it is – and even if it is, that's an implicit contract that's easy to miss when making changes later.

Contributor Author


self.terminate() will set self.has_terminated to true, which will keep any subsequent call to self.terminate_if_needed() from calling self.terminate() again.
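The guard being described can be sketched in isolation (illustrative names, modeled on the snippet under review; the call counter is added here only to make the at-most-once behavior observable):

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Sketch of the has_terminated guard: terminate() records that it ran,
// so a later terminate_if_needed() sees is_terminated() == true and
// never calls terminate() a second time.
#[derive(Default)]
pub struct Guard {
    pub termination_signal: AtomicBool,
    pub has_terminated: AtomicBool,
    pub terminate_calls: AtomicUsize, // instrumentation for this sketch
}

impl Guard {
    pub fn signal(&self) {
        self.termination_signal.store(true, Ordering::SeqCst);
    }

    fn is_terminated(&self) -> bool {
        self.has_terminated.load(Ordering::SeqCst)
    }

    fn terminate(&self) {
        self.terminate_calls.fetch_add(1, Ordering::SeqCst);
        self.has_terminated.store(true, Ordering::SeqCst);
    }

    pub fn terminate_if_needed(&self) -> bool {
        let has_terminated = self.is_terminated();
        if !has_terminated && self.termination_signal.load(Ordering::SeqCst) {
            self.terminate();
            return true;
        }
        has_terminated
    }
}
```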

Contributor


That's the implicit contract I mean. Something a little less prone to snafus would be appreciated.

A rule of thumb I follow is this: whenever more than one atomic variable pops up, try really hard to replace it with a mutex and a proper critical section.
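A minimal sketch of that suggestion, assuming the two atomic flags are folded into one Mutex-guarded struct so that checking and updating them is a single critical section (names are illustrative, not Deno's API):

```rust
use std::sync::Mutex;

// Both pieces of termination state live behind one lock, so the
// check-then-terminate sequence cannot interleave with another caller.
#[derive(Default)]
struct TermState {
    requested: bool,  // was termination signaled?
    terminated: bool, // has terminate() already run?
}

pub struct Handle {
    state: Mutex<TermState>,
}

impl Handle {
    pub fn new() -> Self {
        Handle { state: Mutex::new(TermState::default()) }
    }

    pub fn request_termination(&self) {
        self.state.lock().unwrap().requested = true;
    }

    // Returns true if the worker is (now) terminated; the guarded
    // section guarantees terminate() runs at most once overall.
    pub fn terminate_if_needed(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        if !s.terminated && s.requested {
            s.terminated = true; // the single, guarded terminate() site
            return true;
        }
        s.terminated
    }
}
```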

@andreubotella
Contributor Author

How pressing is this bug? My natural inclination is to wait for the upstream fix (the V8 bug had some movement earlier today) rather than try to work around it.

I guess it's not too pressing. But this isn't the only bug related to isolate termination that we've had (see #12263), and considering that this PR changes worker termination to work the same as Chrome, I'd say this might be better in the long run.

Member

@bartlomieju bartlomieju left a comment


LGTM

I discussed this PR offline with @bnoordhuis, who's not too keen on landing it, but since it fixes an actual bug we agreed to merge it.

@bartlomieju bartlomieju merged commit 4a13c32 into denoland:main Nov 29, 2021
bartlomieju pushed a commit to bartlomieju/deno that referenced this pull request Nov 29, 2021
…ate (denoland#12831)

@andreubotella andreubotella deleted the schedule-termination branch November 29, 2021 13:03
bnoordhuis pushed a commit that referenced this pull request Nov 29, 2021
…ate (#12831)


Co-authored-by: Andreu Botella <[email protected]>
@ry
Member

ry commented Dec 1, 2021

This PR resulted in many more threads being used in the "workers_large_message" benchmark:

[Screenshot: thread-count graph for the workers_large_message benchmark, Dec 1, 2021]

@andreubotella
Contributor Author

I decided to spawn a thread rather than start a sleep future in the tokio event loop, in case the host was terminated in the meantime; but if the increase in threads is a problem, we can revisit this.
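A sketch of that design choice with illustrative names: a dedicated OS thread sleeps through the grace period and hard-kills the isolate only if the event loop hasn't already exited, so the timer keeps working even if the host's async runtime is torn down in the meantime.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical watchdog: sleep for the grace period, then terminate
// the isolate only if the worker's event loop is still running.
// `isolate_killed` is a stand-in for the actual V8 isolate kill.
pub fn spawn_termination_watchdog(
    event_loop_done: Arc<AtomicBool>,
    isolate_killed: Arc<AtomicBool>,
    grace: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(grace);
        if !event_loop_done.load(Ordering::SeqCst) {
            isolate_killed.store(true, Ordering::SeqCst);
        }
    })
}
```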

andreubotella added a commit to andreubotella/deno that referenced this pull request Mar 14, 2022
Calling `worker.terminate()` used to kill the worker's isolate and
then block until the worker's thread finished. This blocks the calling
thread if the worker's event loop was blocked in a sync op (as with
`Deno.sleepSync`), which wasn't realized at the time, but since the
worker's isolate was killed at that moment, it would not block the
calling thread if the worker was in a JS endless loop.

However, in denoland#12831, in order to work around a V8 bug, worker
termination was changed to first set a signal to let the worker event
loop know that termination has been requested, and only kill the
isolate if the event loop has not finished after 2 seconds. However,
this change kept the blocking, which meant that JS endless loops in
the worker now blocked the parent for 2 seconds.

As it turns out, after denoland#12831 it is fine to signal termination and
even kill the worker's isolate without waiting for the thread to
finish, so this change does that. However, that might leave the async
ops that receive messages and control data from the worker pending
after `worker.terminate()`, which leads to odd results from the op
sanitizer. Therefore, we set up a `CancelHandler` to cancel those ops
when the worker is terminated.

Fixes denoland#13705.
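A rough std-only analogy for why that cancellation is needed: once terminate() no longer waits for the worker thread, the parent's pending message-receive op would otherwise hang (and trip the op sanitizer). Here, dropping a channel sender plays the role the commit describes for the CancelHandler, resolving the pending receive with an error instead of leaving it pending.

```rust
use std::sync::mpsc;
use std::thread;

// Sketch: the parent holds a receiver for worker messages. Tearing
// down the worker side (without joining its thread) makes the pending
// receive complete with an error rather than hang forever.
pub fn terminate_cancels_pending_recv() -> bool {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // terminate(): drop the worker-side sender from another thread.
    let t = thread::spawn(move || drop(tx));
    t.join().unwrap();
    // The pending "op" now resolves as cancelled instead of blocking.
    rx.recv().is_err()
}
```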
bartlomieju pushed a commit that referenced this pull request Apr 27, 2022
…13941)

crowlKats pushed a commit that referenced this pull request Apr 28, 2022
…13941)

Successfully merging this pull request may close these issues.

worker core dumped panic