Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ext/ffi): Avoid keeping JsRuntimeState RefCell borrowed for event loop middleware calls #15116

Merged

Conversation

aapoalas
Copy link
Collaborator

@aapoalas aapoalas commented Jul 7, 2022

JsRuntimeState was mistakenly kept borrowed during event loop middleware calls for the purpose of getting a clone of the Rc<RefCell<OpState>> even though the cloning removes the need for the borrow to be kept. With this change thread safe FFI callbacks can now safely call ops that themselves access the JsRuntimeState such as ops that create Promises etc.

At the same time, FFI callbacks are changed to set the Isolate's MicrotasksPolicy to Explicit. This is done in order to allow thread safe callbacks to return before any microtasks queued by it get executed.

As an example, imagine a thread safe callback using this JavaScript callback:

const callback = () => {
  console.log("Sync work");
  Promise.resolve().then(() => {
    console.log("Async work");
  });
};

If any JS code were to call the callback() and immediately log "Called 'callback'" on return, the output would be

Sync work
Called 'callback'
Async work

For an equivalent FFI library call to the C callback that is callback wrapped in an UnsafeCallback the order of the two last lines would be reversed. The reason this is that the callback gets called from an empty stack, and as it finishes Deno would automatically run any pending microtasks. This is not how we want callbacks, especially thread safe callbacks, to work.

Thread safe callbacks are already being polled from inside the event_loop_middleware, which probably (I honestly am not exactly sure) already run microtasks after it is done, or at least on the next event loop tick. Thus, having these callbacks themselves run microtasks is a bit redundant and most importantly it makes it hard for these callbacks to schedule asynchronous work without themselves being blocked by said "asynchronous work". We want thread safe callbacks to have the ability to call "synchronously" into Deno JS code and after that synchronous work is done to immediately return control (unblock) to the calling thread.

Setting the MicrotasksPolicy could also be done only for thread safe callbacks. However, interrupt handlers being called on the main thread also suffer this same issue. They also suffer from other occasionally panic-inducing issues which make them quite inadvisable, but still this change is relevant for those as well. Normal synchronous callbacks are only affected this in the sense that they're now doing an unnecessary check on the policy value. Their scopes are not built on an empty stack, and as such they would not run pending microtasks anyway.

@bartlomieju
Copy link
Member

Is there some kind of test you can add, the change to borrows of op state is fine, but adding change to microtask policy seems sus

@aapoalas
Copy link
Collaborator Author

aapoalas commented Jul 7, 2022

Is there some kind of test you can add, the change to borrows of op state is fine, but adding change to microtask policy seems sus

I could add a test that shows the inversion in logging without the policy change. Is that the kind of test you're looking for? I guess the same test could also be done for synchronous callbacks (arguably it already exists but an explicit one is always explicit) to show that this change does not affect them as claimed.

@bartlomieju
Copy link
Member

Is there some kind of test you can add, the change to borrows of op state is fine, but adding change to microtask policy seems sus

I could add a test that shows the inversion in logging without the policy change. Is that the kind of test you're looking for? I guess the same test could also be done for synchronous callbacks (arguably it already exists but an explicit one is always explicit) to show that this change does not affect them as claimed.

Sure, anything that exercises this code path is good 👍

@aapoalas
Copy link
Collaborator Author

aapoalas commented Jul 7, 2022

Sure, anything that exercises this code path is good +1

I added two tests that verify that, with the current changes, the order of logging is as one might expect.

Once the current round of tests is done I'll push a change to remove the microtask execution order to show that the order of logging becomes unexpected, and revert it later.

EDIT: Here's the unexpected logging order failure:

---- event_loop_integration stdout ----
ExitStatus(unix_wait_status(0))
thread 'event_loop_integration' panicked at 'assertion failed: `(left == right)`
  left: `"SYNCHRONOUS\nSync\nSTORED_FUNCTION called\nAsync\nTimeout\nTHREAD SAFE\nSync\nAsync\nSTORED_FUNCTION called\nTimeout\n"`,
 right: `"SYNCHRONOUS\nSync\nSTORED_FUNCTION called\nAsync\nTimeout\nTHREAD SAFE\nSync\nSTORED_FUNCTION called\nAsync\nTimeout\n"`', test_ffi/tests/integration_tests.rs:193:3
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    event_loop_integration

So, as can be teased out from the output, we expect (right) the order of logs to be:

THREAD SAFE
Sync
STORED_FUNCTION called
Async
Timeout

but it's actually (left)

THREAD SAFE
Sync
Async
STORED_FUNCTION called
Timeout

So we can indeed see that the Promise.resolve().then(() => console.log("Async")) call gets resolved before the callback returns. This can be quite unexpected.

@aapoalas
Copy link
Collaborator Author

aapoalas commented Jul 7, 2022

Hmmm... Seems like we get the unexpected logging order on ubuntu-20.04-xl... Gotta check tomorrow.

@aapoalas
Copy link
Collaborator Author

aapoalas commented Jul 8, 2022

Hmm... There's something odd with the release profile Linux build. I can reproduce the issue locally as well. It's as if the microtask policy setting wasn't working.

I'll remove that part from this PR and introduce a separate PR to enable the panic fix to move forward.

@aapoalas aapoalas force-pushed the fix-ext-ffi-jsruntimestate-double-borrow branch from 7528a37 to 1b3a91b Compare July 8, 2022 09:21
Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@bartlomieju bartlomieju merged commit 3da182b into denoland:main Jul 9, 2022
@aapoalas aapoalas deleted the fix-ext-ffi-jsruntimestate-double-borrow branch July 10, 2022 08:55
cjihrig pushed a commit that referenced this pull request Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants