Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm2c: Allow use of x86 segment register for linear memory access #2395

Merged
merged 1 commit into from
Jun 26, 2024

Conversation

shravanrn
Copy link
Collaborator

@shravanrn shravanrn commented Feb 19, 2024

This PR implements the "Segue" performance optimization in Wasm2c. Segue uses the x86 segment register to perform memory accesses to Wasm's linear heap. In modern OSes, only one of the two segment registers are used (to support thread-local storage), leaving the other free for our use here. In our earlier work, we found that this speeds up wasm2c by something like 5% to 11% although there are workloads that seem to benefit far more.

Full writeup from last year here. More detailed writeup in the works.

Since that initial writeup, the Wamr team has implemented a limited form of Segue (doesn't take full advantage of all the increased flexibility that comes with segment based encodings) in their compiler. See bytecodealliance/wasm-micro-runtime#2230. In direct conversations, they have also reported seeing performance benefits between 5% to 8%. I'll note that it is far easier for Wasm2c to take full advantage of Segue than Wamr (an artifact of Wasm2c's C-to-native pipeline).

Implementation
This is fairly straightforward in Wasm2c.

  1. Set the segment register via a userspace instruction (exposed by a compiler intrinsic) at the beginning of exported functions, and reset the register (in the off-chance that someone is actually using this register) prior to exit of the function. Also update the segment register in case of memory growth as they memory may have moved.
  2. Modify the parameter to the memcpy in DEFINE_LOAD and DEFINE_STORE to pass in a pointer whose type is a segment pointer (supported via named address namespaces)

The rest of the changes are just boilerplate on checking for support and correctly falling back if not supported.

Testing
I've added this to the testing for the rlbox version of the CI

Restrictions

  1. It seemed prudent to have this feature off by default since its brand new, and the performance benefits are workload dependent (Some pathological workloads do show minor performance regressions). Users must explicitly opt into this via a macro.

  2. For simplicity, Segue will only be enabled on Wasm modules that use a single (imported or internal) memory that is not shared (i.e., does not have multiple threads accessing it). Implementing Segue on shared memories would require atomic versions of segmented memory access which may require assembly code.

  3. Although, it is potentially possible to support this for all x86_64 compilers, I've restricted this to clang only for now due to some complexities (listed below) and lack of an immediate use case for other compilers (Firefox for now will probably enable this only on clang builds). Complexities for other compilers:

  • GCC: does support named address spaces, but its support is buggy. Further it doesn't support memcpy with a segment pointer. Thus GCC support would require the use of inline assembly for full support.
  • MSVC: supports segmented pointers but via different intrinsics which can give us limited support for Segue (similar to Wamr) and MAY require some inline assembly as well (I'm not yet sure)
  1. It is also potentially possible to support this for x86_32, however setting the segment register here requires a system call rather than a userspace instruction, making this slower/harder to efficiently implement within wasm2c.

EDIT:

  1. It is also possible to support this in Windows, however this is disabled for now since Windows clobbers the unused segment registers during syscalls and possibly also during context switches. Windows support can be investigated in a separate PR.

@sbc100
Copy link
Member

sbc100 commented Feb 20, 2024

Nice! Thanks for working on this.

I've never heard this called "Segue" optimization, is that a well known term? Perhaps the title would be better understood if it said something like "Use x86 segment register for memory access"?

I seem to remember from the NaCl days that we couldn't use the x86 segment register on x86_64... but maybe that was only for running x86_32 binaries on x86_64 OS? Is that right @dschuff?

@shravanrn
Copy link
Collaborator Author

shravanrn commented Feb 20, 2024

I've never heard this called "Segue" optimization, is that a well known term? Perhaps the title would be better understood if it said something like "Use x86 segment register for memory access"?

Ah this is a name we came up with as part of our writeup as an easy shorthand for this. I'm happy to update the title of the commit/PR to just say "x86 segments".

I seem to remember from the NaCl days that we couldn't use the x86 segment register on x86_64... but maybe that was only for running x86_32 binaries on x86_64 OS?

Sharing some details from my exploration below in case it helps, while we wait to hear from dschuff

It seems that a subset of the x86 segment register survived to x86_64. x86_64 got rid of the code segment cs and got rid of data segments ds, es, ss, but kept data segments fs, gs. x86_64 also got rid of the bounds checks on all of the segments' accesses.

From playing with NaCl's code base, I believe NaCl32 used

  • the code segment cs and its bounds checks to restrict executable code ranges and
  • the data segment ds and its bounds checks to restrict memory accesses
  • the stack segment ss and its bounds checks to restrict stack accesses

So I believe the exact approach of NaCl32 was no longer compatible in x86_64 since the segments and bounds checks NaCl32 used went away. (Not relevant here, but I have tested that NaCl32 works fine if you run a x86_32 binary in an x86_64 OS as CPUs and OSes remain backward compatible.)

NaCl64 in contrast uses an alternate approach relying on a combination of (1) emitting separate instructions to perform the memory-base addition (2) using guard pages (3) moving the stack & heap in a single contiguous memory region (4) code-pointer masking to implement a coarse-grain CFI to enforce sandboxing.

@shravanrn shravanrn changed the title wasm2c: Segue optimization for modules with a single unshared memory wasm2c: Use x86 segment register for linear memory for modules with a single unshared memory Feb 20, 2024
@shravanrn shravanrn changed the title wasm2c: Use x86 segment register for linear memory for modules with a single unshared memory wasm2c: Allow use of x86 segment register for linear memory access Feb 20, 2024
@dschuff
Copy link
Member

dschuff commented Feb 20, 2024

Yeah, my memory is that the original plan was just to use NaCl32 all the time even on x86-64 systems with 64-bit OS (until we started needing >4G of memory), but IIRC even that didn't work on Win64 because the OS was clobbering segment registers on context switches. So we (really, "they" - this was before my time in NaCl) had to design the x86-64 variant of NaCl sooner than they would have otherwise. And as you noted, no segment registers (although in some cases IIRC it actually does use fs or gs for TLS).

@shravanrn
Copy link
Collaborator Author

shravanrn commented Feb 22, 2024

@dschuff Ah gotcha, yup, i did a quick test and it looks like the unused segment register on windows gets clobbered during system calls --- which the tests in wasm2c don't really reveal, since these are mostly pure compute. I think there maybe a solution to reset the segment register after any call to external functions (like wasi-calls which may invoke syscalls). However, if this happens when the process is context switched out as you say, maybe this won't be sufficient either. In the interrim while i figure this out, i'll update this patch to be Linux only for now.

@shravanrn shravanrn marked this pull request as draft February 22, 2024 23:05
@shravanrn shravanrn marked this pull request as ready for review February 23, 2024 00:40
@shravanrn
Copy link
Collaborator Author

@sbc100 I've made the above change. Let me know your thoughts.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks surprisingly non-intrusive!

wasm2c/wasm-rt.h Show resolved Hide resolved
src/c-writer.cc Outdated Show resolved Hide resolved
src/template/wasm2c.declarations.c Show resolved Hide resolved
@keithw
Copy link
Member

keithw commented Feb 23, 2024

This looks like a cool optimization! Is there an available reference/documentation from Linux and glibc that we can point to that says the register is protected for use by userspace processes (notwithstanding scheduler interrupts, syscalls, signals, the threading implementation, etc.)?

I'm obviously a bit nervous about the maintenance and support burden of carrying a clang-only, Linux-only experimental feature upstream if we have a way to factor things out. Would you have interest in splitting this into the c-writer changes to enable this (which would go upstream), and creating an extensibility mechanism (maybe a command-line flag for a custom wasm2c.declarations.c?) to enable experimental features like these at runtime?

@shravanrn
Copy link
Collaborator Author

shravanrn commented Feb 23, 2024

(@sbc100 @keithw Oh one extra note from above... I was still validating this during my last post, so I didn't share earlier... but we do seem to be seeing one workload in Firefox eliminate 75% of the Wasm overhead due to this, which makes this particularly interesting to me. This speedup tends to be somewhat workload dependent though.)

! Is there an available reference/documentation ... that says the register is protected for use by userspace processes

Let me look into this. In practice, we have seen that this is never used even in large applications, but i'll see if I can find any documented reference.

Would you have interest in splitting this into the c-writer changes to enable this (which would go upstream), and creating an extensibility mechanism (maybe a command-line flag for a custom wasm2c.declarations.c?) to enable experimental features like these at runtime?

@keithw Sure. While my first choice would be to just have this supported behind suitable optional/experimental flags, I am happy as long as there is a practical way I can enable this. The only caveat being --- I really don't want to have to maintain any custom patches, as this would require me to keep this in sync with in-tree.

Happy to try out any technique you have in mind. Could you describe it in more detail? Right now I'm not sure how I can restrict the changes to c-writer.cc only as i have to modify the DEFINE_STORE and DEFINE_LOAD as well.

Alternately, if you want to create an --enable-experimental flag on wasm2c to enable this feature, I'm happy to do this.

I'm obviously a bit nervous about the maintenance and support burden of carrying a clang-only, Linux-only experimental feature upstream

Some additional context from my side: Fwiw, I am investigating how we can support other compilers and OS as well. I believe it should be possible to widen the support in future PRs.

@sbc100
Copy link
Member

sbc100 commented Feb 23, 2024

(@sbc100 @keithw Oh one extra note from above... I was still validating this during my last post, so I didn't share earlier... but we do seem to be seeing one workload in Firefox eliminate 75% of the Wasm overhead due to this, which makes this particularly interesting to me. This speedup tends to be somewhat workload dependent though.)

Wow! That is really awesome.

@shravanrn
Copy link
Collaborator Author

but i'll see if I can find any documented reference.

Quoting from https://www.kernel.org/doc/html/next/x86/x86_64/fsgs.html

The GS segment has no common use and can be used freely by applications. 
GCC and Clang support GS based addressing via address space identifiers.

@keithw Just wanted to bump the discussion so we can continue to find a path to land this. As mentioned above, we would really like to take advantage of the 75% overhead elimination soon.

@keithw
Copy link
Member

keithw commented Mar 6, 2024

but i'll see if I can find any documented reference.

Quoting from https://www.kernel.org/doc/html/next/x86/x86_64/fsgs.html

The GS segment has no common use and can be used freely by applications. 
GCC and Clang support GS based addressing via address space identifiers.

Cool, that's helpful!

@keithw Just wanted to bump the discussion so we can continue to find a path to land this. As mentioned above, we would really like to take advantage of the 75% overhead elimination soon.

I think I see a few good paths. Path one, this is a Linux+clang+Ivy-Bridge-and-later-only experiment that we're not supporting or documenting (yet) or worrying (yet) about nailing the corner cases (e.g. saving the register across calls to imported functions). If you want the experiment to be CI-tested upstream to avoid the burden of maintaining it out-of-tree, understandable, but it would be nice not to do that by adding more preprocessor macros and modes to the current declarations and having them live there indefinitely, because that also creates a burden. Maybe a middle ground is to declare this a time-limited experiment for n months? Or, another middle ground, the wasm2c tool take a command-line flag for an external declarations file (and c-writer gets minimally changed), you ship that file in an experiments directory, and it runs in the WABT CI to prevent bitrot?

Path two is, we're talking about adding OS/compiler/CPU-gen-specific intrinsics to the wasm2c output in a supported configuration (e.g. we're willing to have a CVE if this goes awry). The performance win sounds awesome, but I think we'd have to figure out the corner cases before we can document/support this. Also: if we're doing this for performance reasons, I think we'd want the benchmark results to be ones anybody can see in full form and reproduce -- you want the flexibility to keep improving the approach over time if somebody gets an idea (and you want new contributors to be on equal footing about whether their perf-motivated PRs are wins). Whatever that benchmark is, it would be nice if somebody can figure out how to match the performance in pure C. I realize GS is powerful because it can use segment-relative addressing, but maybe putting the mem0 address in a function parameter gives almost the same performance in practice, or maybe some form of LTO/PGO (suggested the last time this came up in #1805) narrows the gaps, etc. If LLVM or V8 took every optimization proposal where the proposer said it was a win for them in a particular workload, I think they would have painted themselves in a corner. :-/ Hope that gives a better sense of where I'm coming from.

Path three is, sell @sbc100 on this PR and don't worry about me.

@SoniEx2
Copy link
Contributor

SoniEx2 commented Mar 6, 2024

if we may chip in, we'd love to see support for external declarations!

@shravanrn
Copy link
Collaborator Author

Sorry for the delay. Got tied up with some other work.

@keithw Sure, I can investigate some of those paths, but I should provide some more context for some low-level details before proceeding

worrying (yet) about nailing the corner cases (e.g. saving the register across calls to imported functions).

I believe this should already work. The imported function is invoked via its external interface vs. the module's internal implementation of the function in a static function. The external interface of all functions save the current value of %gs, load the current module's memory into %gs, and then restore the original %gs once complete. This bakes in the assumption, that each module has its own memory. If a module has more than one memory, segue is not used for that module and %gs is left undisturbed. I'm hoping that this should work fine in all scenarios: when one module calls another and one of the two modules, or both modules, have a single or multiple memories, and the tests do pass. Let me know if you believe I had missed something.

I realize GS is powerful because it can use segment-relative addressing, but maybe putting the mem0 address in a function parameter gives almost the same performance in practice

Unfortunately, this does not seem to be the case. While it is true that putting the mem0 address in a function parameter may help, this would in practice only reduce one memory dereference per function, which is not really going to add up to much. Rather the performance of this optimization comes from a few different places

  • This has the effect of pinning a register for the mem0 address, which is otherwise difficult to do in today's C compilers without modifying them.
  • Additionally, even if we could somehow pin a GPR vs. the segment register (i.e., we put mem0 in a function parameter and somehow instruct the compiler to never spill that function parameter register), this would take a register away from the GPR pool which would reduce performance on compute heavy code. The advantage here is that we are using an otherwise unused register %gs.
  • Segment relative addressing actually frees up an additional operand slot normally occupied by the memory base which makes instruction generation more compact as a whole. See Figure 1c here for some examples. But in short, the backend can now reduce the number of instructions needed to compute an effective address, can perform 32-bit truncation during the addition of a 64-bit base (using operand override prefix) etc.
  • This optimization also seems to reduce the size of the binaries produced by wasm2c in the SPEC 06 benchmark by 6% to 7% which has an indirect perf affect via a smaller icache footprint.

Match the performance in pure C ... maybe some form of LTO/PGO

The 75% number comes from a Wasm sandboxed graphite font rendering workload in Firefox, and LTO is enabled in Firefox

If LLVM or V8 took every optimization proposal where the proposer said it was a win for them in a particular workload, I think they would have painted themselves in a corner

Let me push back on this a bit. I think this is entirely reasonable argument to make this argument for a 1% to 2% optimization (indeed I made this argument here when discussing only pinned registers). However, I think the substantial performance boost here changes the equation. More concretely, if elimination of 75% of Wasm's overhead is not good enough to make the cut in compilers, then I think LLVM/V8 would be a hundredth of their current size, if not even smaller. I think the reason to consider this optimization is precisely the outsized impact it has on certain workloads on a large proportion of CPUs actually in use today. Additionally the baseline performance boost of about 5% to 11% (this translates to roughly eliminating 44.7% of overhead on SPEC 06) is pretty substantial even by itself.

the wasm2c tool take a command-line flag for an external declarations file

I think we'd want the benchmark results to be ones anybody can see in full form and reproduce

Sure, let me investigate these options further.

@keithw
Copy link
Member

keithw commented Mar 19, 2024

I believe this should already work. The imported function is invoked via its external interface vs. the module's internal implementation of the function in a static function. The external interface of all functions save the current value of %gs, load the current module's memory into %gs, and then restore the original %gs once complete. This bakes in the assumption, that each module has its own memory. If a module has more than one memory, segue is not used for that module and %gs is left undisturbed. I'm hoping that this should work fine in all scenarios: when one module calls another and one of the two modules, or both modules, have a single or multiple memories, and the tests do pass. Let me know if you believe I had missed something.

That works for imports from other wasm2c-generated modules, but what should we do about imports from the host?

  • Option A seems like "prominently document that all host functions have to be wrapped in something that saves and restores this register, modify our examples to do this, and use asserts in the wasm2c-generated code to enforce (in debug builds) that calls to imports leave the register unchanged."
  • Option B might be "enforce the new calling convention by prepending wasm2c_seque_ to the name of every import and export, and require that if a module OR the host wants to advertise that its export honors this calling convention, it has to explicitly name their export that way, or else the wasm2c-generated module synthesizes a weak wrapper that does the register saving for it (similar to what happens for tail calls)."

Both of these seem like they have some downsides; maybe there's a better option C?

@shravanrn
Copy link
Collaborator Author

Sorry for the long delay here. I had to deal with some production bugs in Firefox in March and got distracted with paper submissions and reviews. I am now back looking at this. Will go through the above thread to recapture all the context and will follow up shortly.

@shravanrn
Copy link
Collaborator Author

shravanrn commented May 13, 2024

Option A seems like "prominently document that all host functions have to be wrapped in something that saves and restores this register, modify our examples to do this, and use asserts in the wasm2c-generated code to enforce (in debug builds) that calls to imports leave the register unchanged."

@keithw For this option, I think we can document it as you mention, but we should require no other change in the host code, since no examples would use %gs (it is an unused register and gcc and clang do not emit code that use it when compiling normal C). Wamr has also been shipping this as the default option for a while and doesn't seem to have run into any issues. I am happy to include some asserts on debug builds if this helps build confidence.

Option B might be "enforce the new calling convention by prepending wasm2c_seque_ to the name of every import and export

This seems like overkill to me, especially given that this is not a feature that would be on-by-default.

I propose we look at Option A, and I will include documentation and asserts for the same.

@shravanrn
Copy link
Collaborator Author

@keithw I have pushed a version addressing the comments above. It includes documentation, an extra mode with more sanity checks which we use in testing, and a benchmark (Dhrystone) to measure the performance difference (On my machine the use of segue makes dhrystone run in about 60% of the time as without segue) .

@keithw @sbc100 Could you have a look when you get the chance?

@shravanrn shravanrn force-pushed the segue branch 2 times, most recently from dcf0db3 to 65ca36b Compare May 16, 2024 03:14
@shravanrn
Copy link
Collaborator Author

@keithw @sbc100 Just bumping this in your inbox

@shravanrn shravanrn requested a review from sbc100 May 23, 2024 16:07
@ivanmalikov394
Copy link

This PR implements the "Segue" performance optimization in Wasm2c. Segue uses the x86 segment register to perform memory accesses to Wasm's linear heap. In modern OSes, only one of the two segment registers are used (to support thread-local storage), leaving the other free for our use here. In our earlier work, we found that this speeds up wasm2c by something like 5% to 11% although there are workloads that seem to benefit far more.

Full writeup from last year here. More detailed writeup in the works.

Since that initial writeup, the Wamr team has implemented a limited form of Segue (doesn't take full advantage of all the increased flexibility that comes with segment based encodings) in their compiler. See bytecodealliance/wasm-micro-runtime#2230. In direct conversations, they have also reported seeing performance benefits between 5% to 8%. I'll note that it is far easier for Wasm2c to take full advantage of Segue than Wamr (an artifact of Wasm2c's C-to-native pipeline).

Implementation This is fairly straightforward in Wasm2c.

  1. Set the segment register via a userspace instruction (exposed by a compiler intrinsic) at the beginning of exported functions, and reset the register (in the off-chance that someone is actually using this register) prior to exit of the function. Also update the segment register in case of memory growth as they memory may have moved.
  2. Modify the parameter to the memcpy in DEFINE_LOAD and DEFINE_STORE to pass in a pointer whose type is a segment pointer (supported via named address namespaces)

The rest of the changes are just boilerplate on checking for support and correctly falling back if not supported.

Testing I've added this to the testing for the rlbox version of the CI

Restrictions

  1. It seemed prudent to have this feature off by default since its brand new, and the performance benefits are workload dependent (Some pathological workloads do show minor performance regressions). Users must explicitly opt into this via a macro.
  2. For simplicity, Segue will only be enabled on Wasm modules that use a single (imported or internal) memory that is not shared (i.e., does not have multiple threads accessing it). Implementing Segue on shared memories would require atomic versions of segmented memory access which may require assembly code.
  3. Although, it is potentially possible to support this for all x86_64 compilers, I've restricted this to clang only for now due to some complexities (listed below) and lack of an immediate use case for other compilers (Firefox for now will probably enable this only on clang builds). Complexities for other compilers:
  • GCC: does support named address spaces, but its support is buggy. Further it doesn't support memcpy with a segment pointer. Thus GCC support would require the use of inline assembly for full support.
  • MSVC: supports segmented pointers but via different intrinsics which can give us limited support for Segue (similar to Wamr) and MAY require some inline assembly as well (I'm not yet sure)
  1. It is also potentially possible to support this for x86_32, however setting the segment register here requires a system call rather than a userspace instruction, making this slower/harder to efficiently implement within wasm2c.

EDIT:

  1. It is also possible to support this in Windows, however this is disabled for now since Windows clobbers the unused segment registers during syscalls and possibly also during context switches. Windows support can be investigated in a separate PR.

thaks

@shravanrn
Copy link
Collaborator Author

@keithw @sbc100 Just bumping this in your inbox

Bumping this again. @sbc100 @keithw any chance you can code review + sign off? We are hoping to land this so that we can adopt it in Firefox.

@shravanrn
Copy link
Collaborator Author

@sbc100 @keithw Bumping this again. Please have a look when you have a chance.

@sbc100
Copy link
Member

sbc100 commented Jun 18, 2024

Apologies @shravanrn, I was away at the CG meeting and I'm out sick this week. Lets try to get this landed somehow next week.

@shravanrn
Copy link
Collaborator Author

shravanrn commented Jun 19, 2024

Apologies @shravanrn, I was away at the CG meeting and I'm out sick this week. Lets try to get this landed somehow next week.

@sbc100 ah gotcha, next week sounds good! Hope you feel better soon :)

@shravanrn
Copy link
Collaborator Author

@sbc100 Following up on the last message. Let me know how we can go forward here.

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % a few comments

@@ -3725,6 +3782,9 @@ void CWriter::Write(const ExprList& exprs) {
Write(StackVar(0), " = ", func, "(",
ExternalInstancePtr(ModuleFieldType::Memory, memory->name), ", ",
StackVar(0), ");", Newline());
if (IsSingleUnsharedMemory()) {
InstallSegueBase(module_->memories[0], false /* save_old_value */);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if its worth coming up with a descriptive name for this? Segue seems a little bit like a "TM" name. I'm not sure I can come up with better name though. Maybe this is fine as is?

Copy link
Collaborator Author

@shravanrn shravanrn Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong view on this.

Wamr seems to have shipped it with the Segue name https://github.com/bytecodealliance/wasm-micro-runtime/releases/tag/WAMR-1.2.3

I'm leaning towards leaving it as is for consistency, but can change this to Segment if you prefer.

src/c-writer.cc Show resolved Hide resolved
wasm2c/examples/fac/fac.c Show resolved Hide resolved
wasm2c/examples/segue/dhrystone_src/dhry.h Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to checkin this binary file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the pattern in the wasm2c/examples folder which seems to have checked in a wat file. I figured checking in the wasm file will allow anyone without a wasi-clang compiler to still run tests. Happy to remove it if you prefer.

wasm2c/examples/segue/dhrystone_src/dhry.h Outdated Show resolved Hide resolved
wasm2c/examples/segue/main.c Outdated Show resolved Hide resolved
Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have WASM_RT_SANITY_CHECKS be linked the NDEBUG macro by default? That way it would have sensible default and most folks would never need to set it explicitly?

@shravanrn
Copy link
Collaborator Author

shravanrn commented Jun 26, 2024

Should we have WASM_RT_SANITY_CHECKS be linked the NDEBUG macro by default? That way it would have sensible default and most folks would never need to set it explicitly?

@sbc100 I considered this and unfortunately this doesn't work. Here is my reasoning

  • The sanity checks are quite slow and the spirit of these checks are more similar to ASAN/MSAN checks, than runtime asserts that we would keep in release builds.
  • Most compilers do not specify NDEBUG on release builds, and thus checks that execute under the guard #ifndef NDEBUG are usually enabled-by-default including on release builds. Thus if a user enables Segue, but forgets to add -DNDEBUG would actually be slowing down their code, which I think would be confusing.
  • Given this, I've opted to put this under a separate sanity check.

So I think we may have to leave this as is. Please let me know if you have an alternate suggestion.

@shravanrn
Copy link
Collaborator Author

@sbc100 I have pushed a fix to address the comments, modulo a couple of comments where I need some extra input. If it looks good, I can land it once you approve the PR.

@sbc100 sbc100 merged commit 0e871af into WebAssembly:main Jun 26, 2024
18 checks passed
@shravanrn shravanrn deleted the segue branch June 26, 2024 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants