Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm2c: Cleanup/separate code for heap and stack signal handlers #2354

Merged
merged 1 commit into from
Dec 15, 2023

Conversation

shravanrn
Copy link
Collaborator

@shravanrn shravanrn commented Dec 14, 2023

This PR focuses on code cleanup for the signal handler detecting stack overflows. This PR earlier included an option to disable stack count checking, this has now been split into its own PR.

@keithw
Copy link
Member

keithw commented Dec 14, 2023

Hmm, my understanding was that every (conforming) embedder needs either:

  1. stack-depth counting to detect stack exhaustion and trap, or
  2. some other way to detect and trap on stack exhaustion (e.g., a segfault signal handler and an altstack to run it on)

And I thought strategy 2 basically just wasn't implemented for Windows (yet?), e.g. in wasm-rt.h we have:

#ifndef WASM_RT_USE_STACK_DEPTH_COUNT
/* The signal handler on POSIX can detect call stack overflows. On windows, or
 * platforms without a signal handler, we use stack depth counting. */
#if WASM_RT_INSTALL_SIGNAL_HANDLER && !defined(_WIN32)
#define WASM_RT_USE_STACK_DEPTH_COUNT 0
#else
#define WASM_RT_USE_STACK_DEPTH_COUNT 1
#endif
#endif

@shravanrn
Copy link
Collaborator Author

shravanrn commented Dec 14, 2023

Hmm, my understanding was that every (conforming) embedder needs either:

1. stack-depth counting to detect stack exhaustion and trap, or

2. some other way to detect and trap on stack exhaustion (e.g., a segfault signal handler and an altstack to run it on)

And I thought strategy 2 basically just wasn't implemented for Windows (yet?).

Yup, in general this is right. Unfortunately, for the moment, RLBox picks option 3 --- it uses neither "stack signals" nor "stack depth counting" on Windows. This is because the performance cost (when we last measured) of stack depth counting was not justifiable, and we are comfortable with this security trade-off given the domain.

That said, I do want to keep this simple from the perspective of wasm2c: so here is the model I am going for. If the host says don't use the internal "stack signals" code AND don't use "stack depth counting", wasm2c can assume the host is conforming and has some way to detect stack overflows (perhaps through its a signal handler similar to the POSIX approach) and so wasm2c does not need to do anything further to protect from stack exhaustion

@shravanrn
Copy link
Collaborator Author

@keithw @sbc100 Let me know if you have any other questions. I am eager to land this soon, as the current wasm2c build breaks for RLBox on windows.

@keithw
Copy link
Member

keithw commented Dec 14, 2023

I guess it sounds like RLBox/Firefox has made a considered judgment to run wasm2c output on Windows in a way that trades off the ability to catch stack exhaustion for better performance. The reason the rlbox test failed in this PR initially is because the proposed config doesn't pass the tests. (And I guess the new commit doesn't test the proposed configuration? Which I guess means the rlbox CI test is different from the configuration RLBox uses in real life?)

This is the exact situation I was trying to avoid when I described the change at #2332 (comment) and was like "Hopefully that's acceptable to Firefox," but I don't think I was clear enough. :-( At this point, I think the discussion probably needs to be about "how can WABT accommodate this slightly-nonconforming use case," not like this is a bug fix.

@keithw
Copy link
Member

keithw commented Dec 14, 2023

Some options seem like:

  • emphasize that the embedder is free to implement the wasm_rt_is_initialized() function however it wants, including in nonconforming ways -- wasm2c leaves that as one of the "collection of function declarations that must be implemented by the embedder (i.e. you) before this C source can be used." The wasm-rt-impl.c file gives a conforming C implementation but it's not mandatory to use as-is.
  • we improve the implementation in wasm-rt-impl.c to use SEH on Windows to achieve zero-cost handling of stack exhaustion that passes the tests (similar to what happens on Mac or Linux)
  • wasm-rt-impl provides a preprocessor macro that's like WASM_RT_NONCONFORMING_something that explicitly signals that it's forcing a nonconforming mode, and we'd modify the "rlbox" CI test so that it tests this configuration but maybe on Windows it builds but skips some or all of the tests. There was a strongly expressed view in Add some optional defines to disable wasm2c trap checks #1432 that we shouldn't do this kind of thing, but obviously it's an option. If we do it, I'd prefer we do it explicitly.

@keithw
Copy link
Member

keithw commented Dec 14, 2023

(BTW, on the topic of Wasm-nonconforming hooks that RLBox/Firefox probably wants for performance reasons, I think you'd benefit hugely by eliminating FORCE_READ_INT / FORCE_READ_FLOAT / SIMD_FORCE_READ, as would we. Zach Yedidia has measured these as contributing a large hit across the SPEC CPU benchmarks. Maybe this is another opportunity for WASM_RT_NONCONFORMING_something if we're opening that door.)

@shravanrn
Copy link
Collaborator Author

shravanrn commented Dec 14, 2023

We can definitely make this more explicit. I was thinking I could add a third stack handling macro WASM_RT_STACK_HOST_CHECKS or something similar to signify that this is the hosts problem more clearly. Thoughts?

At this point, I think the discussion probably needs to be about "how can WABT accommodate this slightly-nonconforming use case," not like this is a bug fix.

I am wary of making this more complicated than it is. I think the path forward is simple: wasm2c can assume the host has done something like use SEH on Windows to achieve zero-cost handling of stack exhaustion but doesn't have to test this.

If we do really want to test the rlbox use case as part of CI, I can write a CI test to pull rlbox, and check if wasm2c passes all the rlbox test suite in the wasm2c CI. I don't have a strong opinion about this one way or the other, but am happy to do this if it makes it simpler for all.

I do want to get away from the idea that testing wasm2c in the exact configuration used by RLBox would pass all of Wasm's tests --- I think this is setting us up for doing a bunch of work, that no one is asking for, and will not used by anyone.

think you'd benefit hugely by eliminating FORCE_READ_INT / FORCE_READ_FLOAT / SIMD_FORCE_READ, as would we. Zach Yedidia has measured these as contributing a large hit across the SPEC CPU benchmarks.

Interesting. I think I saw something similar in my tests about a year ago. Any chance you know if this was despite Wasm being compiled with O3, and what the ballpark perf improvement is? One reason I haven't already done this though, is that in one of the times I removed FORCE_READ* (along with a number of other changes) and played with SPEC performance, I ran into some weird undefined behavior. I'm not sure if this was due to removing FORCE_READ or due to other unrelated changes I was testing, but I've left things as is, out of an abundance of caution. I'll definitely revisit this in the upcoming weeks.

@sbc100
Copy link
Member

sbc100 commented Dec 14, 2023

Can we just test that the rlbox configuration builds (but not actually check that it passes all the test).

@shravanrn
Copy link
Collaborator Author

Can we just test that the rlbox configuration builds (but not actually check that it passes all the test).

@sbc100 Yup! I was already working on a wasm2c_rlbox_windows build to the CI, and was also going to run the test suite with some exclusions for tests we know will fail. To clarify my earlier message, I just wanted emphasize that we will definitely need exclusions. If you think I should remove the running of tests altogether in the wasm2c_rlbox builds (in case the CI is taking too long), let me know, and I can do this.

@shravanrn
Copy link
Collaborator Author

@sbc100 Done!

@sbc100 @keithw Let me know if the latest commits look good

@keithw
Copy link
Member

keithw commented Dec 14, 2023

We can definitely make this more explicit. I was thinking I could add a third stack handling macro WASM_RT_STACK_HOST_CHECKS or something similar to signify that this is the hosts problem more clearly. Thoughts?

I am wary of making this more complicated than it is. I think the path forward is simple: wasm2c can assume the host has done something like use SEH on Windows to achieve zero-cost handling of stack exhaustion but doesn't have to test this.

I guess I think the reasonable options are what I tried to outline.

  • wasm2c (the transpiler and its generated code) doesn't assume anything about what the host will do if WASM_RT_USE_STACK_DEPTH_COUNT if off. The embedder can do whatever it wants, including nonconforming things.

  • wasm-rt-impl (our non-mandatory implementation of the embedder-side runtime) basically has two modes -- one where the host is responsible for catching signals (WASM_RT_SKIP_SIGNAL_RECOVERY), and one where it catches the signal. Both assume conforming behavior, which means that wasm_rt_is_initialized (called only in debug code) has a safety check that the runtime is ready to catch a stack exhaustion segfault (unless stack-depth counting is enabled).

  • If Firefox/RLBox wants to add a hook to wasm-rt-impl that skips this safety check on purpose, I'm not as philosophically opposed as others in Add some optional defines to disable wasm2c trap checks #1432, but it should be explicitly marked "nonconforming" so we know what we're getting into and why it's there. WASM_RT_STACK_HOST_CHECKS just sounds like yet another confusing mode but it's not clear why it's there.

  • I think a lot of my confusion over the last while probably stems from the fact that I don't understand why Firefox needs/wants to use wasm-rt-impl unchanged from upstream WABT. Firefox has a lot of unique needs here (as befitting a gigantic and sophisticated application), and there's an abstraction boundary for a reason, so... why not take advantage of it? Even our tiny research project has its own implementation of the runtime. :-)

@shravanrn
Copy link
Collaborator Author

shravanrn commented Dec 14, 2023

I think a lot of my confusion over the last while probably stems from the fact that I don't understand why Firefox needs/wants to use wasm-rt-impl unchanged from upstream WABT. Firefox has a lot of unique needs here (as befitting a gigantic and sophisticated application), and there's an abstraction boundary for a reason, so... why not take advantage of it? Even our tiny research project has its own implementation of the runtime.

@keithw: I am tired of relitigating this again and again and again and again, for every small change. Putting a smiley face is not an excuse to constantly bring this up. I am not asking you to understand the Firefox use case, nor am I asking you to do any work apart from stopping this active gatekeeping which is doing nobody any favors.

You asked for tests, I've added it. You asked for an explicit macro, I've added it (which we can rename/bikeshed). But, I am not making any further changes unless they are justified. If you have issues, I am happy to fork this repo once and for all.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Dec 14, 2023

@keithw counter-proposal:

move wasm-rt-impl into tests.

(or at least into examples)

@keithw
Copy link
Member

keithw commented Dec 15, 2023

I'm not looking for a fight. I'm sorry we didn't head this off a few weeks when I tried to flag these changes; this is what I was worried about. If Firefox wants wasm-rt-impl to add a hook that skips the safety check in wasm_rt_is_initialized() to permit nonconforming behavior (which is part of this PR along with other refactoring), okay. (I'll probably follow up with another nonconforming hook to omit FORCE_READ_x.) But then it should be in a discrete PR and marked NONCONFORMING or something; afaik it'll be the first time we've done something like this.

@keithw
Copy link
Member

keithw commented Dec 15, 2023

think you'd benefit hugely by eliminating FORCE_READ_INT / FORCE_READ_FLOAT / SIMD_FORCE_READ, as would we. Zach Yedidia has measured these as contributing a large hit across the SPEC CPU benchmarks.

Interesting. I think I saw something similar in my tests about a year ago. Any chance you know if this was despite Wasm being compiled with O3, and what the ballpark perf improvement is?

Let me get more detail... I don't know the exact compiler settings, although I know he used binaryen's wasm-opt over the Wasm before feeding it to wasm2c and had simd128 enabled. I assume he used at least -O2 (both for the initial clang into Wasm and then for compiling the wasm2c output) but don't know for sure. The numbers were a 16% geomean slowdown over native compilation when FORCE_READ_x is removed (with wide variation across individual benchmarks) vs. a 29% slowdown using wasm2c as-is.

@shravanrn
Copy link
Collaborator Author

@keithw Sure, I have pushed a commit removing the non conforming bits from this PR. I will submit a follow up PR once this has landed.

@shravanrn shravanrn force-pushed the osaltstack_winfix branch 2 times, most recently from fc980cf to f80fa38 Compare December 15, 2023 02:10
wasm2c/wasm-rt.h Outdated Show resolved Hide resolved
Copy link
Member

@keithw keithw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment

@shravanrn shravanrn merged commit 80b9752 into WebAssembly:main Dec 15, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants