Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LTO failing to prune WASI fd_* imports #401

Open
ggoodman opened this issue Mar 22, 2024 · 8 comments
Open

LTO failing to prune WASI fd_* imports #401

ggoodman opened this issue Mar 22, 2024 · 8 comments

Comments

@ggoodman
Copy link

ggoodman commented Mar 22, 2024

In this repo, I've shown two scenarios where code that is functionally identical produces very different WASM binaries.

Setup

The two scenarios I'm testing are for a rust cdylib whose code looks like this:

include!(concat!(env!("OUT_DIR"), "/bindings.rs"));

#[export_name = "fallible_func"]
pub extern "C" fn fallible_func(val: i32) -> i32 {
    unsafe { Fallible_func(val) }
}

#[cfg(feature = "print")]
#[export_name = "print"]
pub extern "C" fn print(str: *const i8) {
    unsafe {
        // Call the FFI Print function with the a pointer to the C string `cstr`.
        Print(str as *mut i8);
    };
}

Importantly, the print function is guarded by the print feature that is off by default.

The c code we're binding looks like this:

#include <stdio.h>
#include <assert.h>

int Fallible_func(int value)
{
  assert(value > 0);
  return value;
}

void Print(char *str)
{
  printf("%s\n", str);
}

If we run the cargo build with NO_PRINTF=1, then the c code instead looks like this:

#include <stdio.h>
#include <assert.h>

int Fallible_func(int value)
{
  assert(value > 0);
  return value;
}

void Print(char *str)
{
  // printf("%s\n", str);
}

Scenario 1:

Note: This is all imports and exports.

  (import "wasi_snapshot_preview1" "fd_close" (func $__imported_wasi_snapshot_preview1_fd_close (;0;) (type 2)))
  (import "wasi_snapshot_preview1" "fd_fdstat_get" (func $__imported_wasi_snapshot_preview1_fd_fdstat_get (;1;) (type 3)))
  (import "wasi_snapshot_preview1" "fd_seek" (func $__imported_wasi_snapshot_preview1_fd_seek (;2;) (type 4)))
  (import "wasi_snapshot_preview1" "fd_write" (func $__imported_wasi_snapshot_preview1_fd_write (;3;) (type 5)))
  (export "memory" (memory 0))
  (export "fallible_func" (func $fallible_func.command_export))

WASM binary size: 2.4K

Scenario 2: NO_PRINTF=1

Note: This is all imports and exports. No imports at all here.

  (export "memory" (memory 0))
  (export "fallible_func" (func $fallible_func.command_export))

WASM binary size: 459B

Conclusion

This might be totally expected given the rust compilation pipeline and the way wasi-libc is authored and structured, but is quite surprising to me.

I was expecting that link-time optimizations would be 'smart' enough to notice that printf was not referenced by any exported functions in the rust library and recursively pruned. I've put together the minimal repro in hopes that it helps illustrate the observations in a reproducible way and could be helpful if this behaviour is not as intended.

@sbc100
Copy link
Member

sbc100 commented Mar 22, 2024

The term LTO normally refers to a specific thing which -flto which means compilation/optimization at link time.

I believe what you are talking about here is normal linker DCE (dead code elimination) that we support via --gc-sections (which is on by default). There are some limitations in the current DCE that the linker does and potenially some ways we could improve it. I believe this kind of problem can sometimes stem from the fact that link DCE is performed in two phases.

  1. Resolve all the symbols.
  2. Walk the symbol dependency tree marking what is live (https://github.com/llvm/llvm-project/blob/main/lld/wasm/MarkLive.cpp) then don't include anything that is not live in the output.

Note: (2) only happens if --gc-sections is enabled.

One limitation that we currently have is that (1) is performed first. This is when we pull any needed object files from archives. Once we decide we need and object file its not possible to fully undo this decision, if we find out later (in 2) that we didn't need it after all.

@sunfishcode has tried to work around this limitation in some cases: https://reviews.llvm.org/D85062

@ggoodman
Copy link
Author

ggoodman commented Mar 23, 2024

Thanks for taking a look. There's a good chance I'm using the wrong terms everywhere! I'm not familiar with this whole toolchain at all. I used lto only because that was the term in the Cargo.toml I was using to opt into this behaviour.

I think the reference (or at least one of them) that is preventing the fd_* host bindings from being stripped away is in __stdio_exit, called by __wasm_call_dtors.

  (func $__wasm_call_dtors (;630;) (type 72)
    call $#func629<dummy>
    call $__stdio_exit
  )
  (func $_initialize.command_export (;631;) (type 31) (result i32)
    call $_initialize
    call $__wasm_call_dtors
  )

I think I can use a tool like walrus to perform strategic mutations to cut out this cruft and then use its gc to get what I want.

Having spent some time looking the .wat generated for this project and others, I notice that the tables become opaque barriers to understanding the call graph. I'm wondering if the way things are encoded into the big (elem ...) table is making static analysis very hard. Are there some incantations that deprioritize this way of encoding functions? I don't think there should be dynamic dispatch type scenarios here but--again--I'm very new to this and have a super loose understanding of the moving parts. I'm hoping that with a less optimized binary, I could perform some surgery on it and then later run wasm-opt against something with better visibility into control flow and reachability.

@sbc100
Copy link
Member

sbc100 commented Mar 23, 2024

Thanks for taking a look. There's a good chance I'm using the wrong terms everywhere! I'm not familiar with this whole toolchain at all. I used lto only because that was the term in the Cargo.toml I was using to opt into this behaviour.

The --gc-sections behaviour is the default and should not require any opt in. If you are using the term lto in your build system I would guess (?) that is referring to the -flto compiler/link flag which I think is probably not related to this issue (unless I'm misunderstanding).

I would hope that the kind of DCE you are trying achieve here would work with or without LTO enabled.

I think the reference (or at least one of them) that is preventing the fd_* host bindings from being stripped away is in __stdio_exit, called by __wasm_call_dtors.

  (func $__wasm_call_dtors (;630;) (type 72)
    call $#func629<dummy>
    call $__stdio_exit
  )
  (func $_initialize.command_export (;631;) (type 31) (result i32)
    call $_initialize
    call $__wasm_call_dtors
  )

I think I can use a tool like walrus to perform strategic mutations to cut out this cruft and then use its gc to get what I want.

Having spent some time looking the .wat generated for this project and others, I notice that the tables become opaque barriers to understanding the call graph. I'm wondering if the way things are encoded into the big (elem ...) table is making static analysis very hard.

Indeed any time you have function pointer floating around (such as vtables, or in this case I believe musl using a kind of vtable type thing for std file handles) then DCE becomes hard since the linker cannot easily reason about the dependency graph. This is one reason why de-virtualization in C++ is useful when it works.

Are there some incantations that deprioritize this way of encoding functions? I don't think there should be dynamic dispatch type scenarios here but--again--I'm very new to this and have a super loose understanding of the moving parts. I'm hoping that with a less optimized binary, I could perform some surgery on it and then later run wasm-opt against something with better visibility into control flow and reachability.

I think one of the problem is that stdio in musl is using dyanmic dispatch. See https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/src/internal/stdio_impl.h#L28-L30

The dispatch tables then include pointer to the function you want to avoid including: https://github.com/bminor/musl/blob/7ada6dde6f9dc6a2836c3d92c2f762d35fd229e0/src/stdio/stdin.c#L11-L13

@sbc100
Copy link
Member

sbc100 commented Mar 23, 2024

For __stdio_exit it looks like there is a dummy version in exit.c and the real version should only be included if __stdio_exit_needed is referenced.

The problem is that I imagine what is happening is that __stdio_exit_needed is found to be needed during the first round of symbol resolution in the linker, is that right?

@ggoodman
Copy link
Author

@sbc100 as I noted, I'm a bit over my head here but that matches my intuition. Here's what I think is happening, again from intuition only:

I've got a dummy c library that has a strong reference to printf and when it is compiled, those strong references cause the *_needed to statefully toggle. Now my .a file for the library bakes in that state.

Even though the consuming rust project only consumes a symbol from the library that has no references, they are statefully baked in and obfuscated through that dynamic dispatch mechanism.

@ggoodman
Copy link
Author

ggoodman commented Apr 9, 2024

I wonder if there's some trick that might allow us to avoid the virtualization of these FILE methods in a way that still allows for dead-code elimination at linking time. My naive take on this would be that the FILE methods point to real functions that do the branching in their function bodies. The branching in the function body would need to be done in a way that compilers / linkers are able to statically analyze it out.

Does that sound plausible?

@sbc100
Copy link
Member

sbc100 commented Apr 9, 2024

If you can find a way to do what you describe that would be great. I am not aware of any way to achieve that though.

@infinitesnow
Copy link

infinitesnow commented May 12, 2024

I am not sure whether what I have found is relevant for you, as it is for C++ and not Rust, but I was struggling with something similar (undesired fd_* imports) and I managed to remove them. The fix is simple but it required me to rebuild the SDK.

I haven't dug into productising this solution for various configurations and it's possibly not an ideal default, but I'd argue it should be taken under consideration, as as of now there is no way to natively run stdlib containers in simple bare metal applications in the browser without relying on a WASI runtime or recompiling the whole SDK (which takes a while as you surely are aware of).

I have pushed a PR with more details here: #418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants