Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code unexpectedly goes out of bounds with any level of optimisation #419

Open
infinitesnow opened this issue May 13, 2024 · 8 comments
Open

Comments

@infinitesnow
Copy link

infinitesnow commented May 13, 2024

I compile the following code:

#include <vector>
#include <stdint.h>

__attribute__((import_module("env"), import_name("jstest")))  void jstest(int32_t);

extern "C" {
    __attribute__((export_name("edit"))) void edit(char* arr, const uint8_t SIZE) {
        for (uint32_t i = 0; i < SIZE; i++) {
            arr[i] = 5;
        }

        std::vector<unsigned> myvec = {0xDEADBEEF,SIZE,222};
        myvec.resize(SIZE); // Any value >3, its current size -> Array out of bound access in any browser I tried

        jstest(myvec[0]);
        jstest(myvec[1]);
    }

}

From the browser:


let jstest = function(a:Number) { 
  console.log(`Got: ${a}`)
}

async function getWasm() {
  return WebAssembly.instantiateStreaming(fetch("wasm/main.wasm"), { env: { jstest: jstest}  } ).then(
    ({module, instance}) =>  instance.exports
  );
}

window.addEventListener("load", () => {
    getWasm().then( exports => {
        const MAX_SIZE = 10
        let data =  new Uint8Array((exports.memory as WebAssembly.Memory).buffer,0,MAX_SIZE)

        let edit = exports.edit as CallableFunction;

        console.log("before",data)

        for ( let i=0; i<MAX_SIZE; i++ ) {
            data[i] = i
        }
        edit(data,5);

        console.log("after",data)
      })

Stack reported by Chrome:

main.wasm:0x84d Uncaught (in promise) RuntimeError: memory access out of bounds
    at main.wasm.dlmalloc (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[23]:0x84d)
    at main.wasm.malloc (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[22]:0x5d6)
    at main.wasm.operator new(unsigned long) (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[20]:0x58a)
    at main.wasm.std::__2::allocator<unsigned int>::allocate[abi:nn180100](unsigned long) (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[8]:0x340)
    at main.wasm.std::__2::vector<unsigned int, std::__2::allocator<unsigned int>>::__vallocate[abi:nn180100](unsigned long) (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[5]:0x244)
    at main.wasm.std::__2::vector<unsigned int, std::__2::allocator<unsigned int>>::vector[abi:nn180100](std::initializer_list<unsigned int>) (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[2]:0x180)
    at main.wasm.edit (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[1]:0x103)
    at main.wasm.edit.command_export (http:https://localhost:5500/www/wasm/main.wasm:wasm-function[32]:0x2b0c)
    at http:https://localhost:5500/www/js/main.js:110:13

I build with:

/opt/wasi-sdk/bin/clang++ --sysroot /opt/wasi-sdk/share/wasi-sysroot -I/opt/wasi-sdk/share/wasi-sysroot/include/wasm32-wasi/c++/v1 -I/opt/wasi-sdk/share/wasi-sysroot/include/wasm32-wasi -L/opt/wasi-sdk/share/wasi-sysroot/lib/wasm32-wasi -o www/wasm/main.wasm main.cpp -fno-exceptions -Wl,--no-entry -Oz -nostartfiles -Wl,--max-memory=13107200

Changing -O* to -O0 fixes the issue.

I built the SDK myself (on a WSL and a macOS host) with a small LLVM patch to allow baremetal code to run in the browser without runtime / imports:

diff --git a/libcxx/include/__availability b/libcxx/include/__availability
index b8b2da9bb122..a1fbf8cfe3f7 100644
--- a/libcxx/include/__availability
+++ b/libcxx/include/__availability
@@ -128,7 +128,7 @@
 // This controls whether the library claims to provide a default verbose
 // termination function, and consequently whether the headers will try
 // to use it when the mechanism isn't overriden at compile-time.
-#  define _LIBCPP_AVAILABILITY_HAS_VERBOSE_ABORT 1
+#  define _LIBCPP_AVAILABILITY_HAS_VERBOSE_ABORT 0
 #  define _LIBCPP_AVAILABILITY_VERBOSE_ABORT

 // This controls the availability of the C++17 std::pmr library,
diff --git a/libcxxabi/src/abort_message.cpp b/libcxxabi/src/abort_message.cpp
index 859a5031b93f..49a9cbb3e71e 100644
--- a/libcxxabi/src/abort_message.cpp
+++ b/libcxxabi/src/abort_message.cpp
@@ -31,7 +31,7 @@ void abort_message(const char* format, ...)
     // Write message to stderr. We do this before formatting into a
     // variable-size buffer so that we still get some information if
     // formatting into the variable-sized buffer fails.
-#if !defined(NDEBUG) || !defined(LIBCXXABI_BAREMETAL)
+#if 0
     {
         fprintf(stderr, "libc++abi: ");
         va_list list;

Attaching reproducers (good -O0/bad -Oz)
reproducers.tar.gz

Might be an LLVM issue, or I might be doing something wrong. Thought I'd start from reporting here.

@infinitesnow infinitesnow changed the title Code unexpectedly goes out of bound with any level of optimisation Code unexpectedly goes out of bounds with any level of optimisation May 13, 2024
@sbc100
Copy link
Member

sbc100 commented May 13, 2024

Your JS code looks like it is setting the first 10 bytes of memory, which is almost certainly not what you want to be doing.

You need to find a region of memory that is available to write to. The easiest way to do that is to call the malloc export and write into the resulting buffer.

Furthermost you cannot pass data to to the edit functions since data is a JS object (a typed array) and edit takes a pointer (i.e. a Number).

Do you would want to do something like this:

var ptr = exports.malloc(MAX_SIZE);
for ( let i=0; i<MAX_SIZE; i++ ) {
  data[ptr + i] = ...
}
exports.edit(ptr,5);
exports.free(ptr);

@infinitesnow
Copy link
Author

infinitesnow commented May 13, 2024 via email

@sbc100
Copy link
Member

sbc100 commented May 13, 2024

Yes, you should call malloc to get your memory region. It doesn't matter if the memory is imported or exported, both sides (Wasm and JS) still need to agree on the memory layout.

I'm not quite sure what you mean by the second half of your question, but all static data needs to be resolved by the linker, so it can do that memory layout. You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

It is possible to export a static char buffer is that works for you. e.g:

char myBuf[1024];

Then use -Wl,--export=myBuf to export it and access that address from exports.myBuf. That would avoid calling malloc.

@infinitesnow
Copy link
Author

infinitesnow commented May 13, 2024

You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

If that's the case, then I don't understand how memory can be imported. My understanding is that an imported function is dynamically loaded. I was under the impression that -import-memory would give me a relocatable symbol that I could access in code?

That sounded reasonable, I'd just create a few object of variable size in JS, pass them at instantiation time, then from C++ I could just treat those as normal .data memory. Is that not what imported memory is?

@sbc100
Copy link
Member

sbc100 commented May 13, 2024

You can't have static data that the linker does know about (at least not without using dyanmic linking which is probably not what you want).

If that's the case, then I don't understand how memory can be imported. My understanding is that an imported function is dynamically loaded. I was under the impression that -import-memory would give me a relocatable symbol that I could access in code?

That sounded reasonable, I'd just create a few object of variable size in JS, pass them at instantiation time, then from C++ I could just treat those as normal .data memory. Is that not what imported memory is?

In the C/C++ there is just one memory. You can import it or export, but its layout will be determined statically at link time by wasm-ld. The layout of the memory is something like |zero page|static data|stack|heap...|. With static linking you cannot alter this layout after linking except the grow the heap on the right hand side as the memory itself grows.

@infinitesnow
Copy link
Author

infinitesnow commented May 13, 2024

In C/C++ there is just one memory.

Yes, this would be my assumption. But if that's the case, I don't understand, what's the difference between exporting and importing memory in wasm? Wouldn't that effectively accomplish the same thing since it's one and the same block, regardless of which direction it is exchanged?

Maybe my use case is too simple, is there a use case where clearly one approach is better?

@sbc100
Copy link
Member

sbc100 commented May 13, 2024

In C/C++ there is just one memory.

Yes, this would be my assumption. But if that's the case, I don't understand, what's the difference between exporting and importing memory in wasm? Wouldn't that effectively be the same thing?

Not a huge amount is different.

In one scenario, the memory (and its size limits) are statically baked into the wasm file. In the other scenario you must call new WebAssembly.Memory from JS. This gives you a little more flexibility at the cost of an extra couple of lines of JS code. I recommend sticking with the exported memory to keep the JS code simpler/smaller.

@infinitesnow
Copy link
Author

infinitesnow commented May 13, 2024

Cool, thanks for the help.

I reckon importing would also allow you to save / restore the state of a module. I'll try the malloc implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants