Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from Wasmer to Wasmtime #3349

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

bjorn3
Copy link

@bjorn3 bjorn3 commented May 14, 2024

Zellij is currently unable to update Wasmer as the currently used version is the last one without mandatory wasix support which Zellij doesn't want to adopt for various reasons. In the mean time a CVE has been found in the Cranelift version used by this Wasmer version. While it has been worked around in #2830, future CVE's may not be possible to workaround without updating Cranelift and thus Wasmer. This PR switches from Wasmer to Wasmtime to avoid all these problems.

Each individual commit builds. All commits except for the last two commits keep using Wasmer and thus could land without the actual switch to Wasmtime if preferred.

.appender("logFile")
.build("wasmer_compiler_cranelift", LevelFilter::Warn),
.appender("logPlugin")
.build("wasmtime_wasi", LevelFilter::Warn),
Copy link
Author

@bjorn3 bjorn3 May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this go to logFile or logPlugin? On the one hand these are logs in response to plugin actions, on the other hand they are not logs directly produced by the plugin itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter that much, they go into the same file in the end. But might indeed be nicer to change it to logPlugin since then we'll see the plugin id too.

@bjorn3
Copy link
Author

bjorn3 commented May 15, 2024

In Wasmer trying to create a file with a relative path creates it in the /host directory. In Wasmtime it will result in an error that there is no "pre-opened file descriptor" through which the file can be opened. As WASI doesn't have any such concept as a current working directory, the Wasmtime behavior is more natural. It does break the current released version of zjstatus however. dj95/zjstatus@25e3a56 works around this. I tried to work around the difference, but while it fixed this specific issue, it broke a lot more things. As such I would prefer documenting the difference rather than making Wasmtime behave the same way as Wasmer.

@bjorn3
Copy link
Author

bjorn3 commented May 15, 2024

CI should be fixed now.

@bjorn3
Copy link
Author

bjorn3 commented May 19, 2024

(Rebased. Will likely keep doing this as soon as conflicts arise to avoid accumulating conflicts.)

@syrusakbary
Copy link

Given that this PR can break many behaviors for plugins on Zellij, we can commit to update Cranelift in Wasmer very soon, so it doesn't become a problem for the project

@imsnif
Copy link
Member

imsnif commented May 20, 2024

Given that this PR can break many behaviors for plugins on Zellij, we can commit to update Cranelift in Wasmer very soon, so it doesn't become a problem for the project

Much appreciated @syrusakbary !! But since I don't want you to perform extra work, I will emphasize something I mentioned elsewhere: we will not consider any upgrade of wasmer that includes wasix or any sort of extra dependency that comes with wasix.

Given that I understood this is a big ask for your project (totally legit ofc, every project and application has different requirements), I gave @bjorn3 the go-ahead for this work. If all checks out with it (I haven't gone over it yet) and we manage to iron out the differences with plugins you rightfully mentioned (some of them have already been brought up as points of order by @bjorn3 ) - we will go through with this change.

@syrusakbary
Copy link

We will not consider any upgrade of wasmer that includes wasix or any sort of extra dependency that comes with wasix. Given that I understood this is a big ask for your project

Is not a big ask at all! Because WASIX is a superset of WASI, is actually feasible to only ship the base calls of WASI –for example– via a feature flag in Cargo. We can do that for you guys so you can upgrade Wasmer easily and without the burden of changing runtimes and plugins!

I've created this issue so we can follow up on that quickly: wasmerio/wasmer#4722

@imsnif
Copy link
Member

imsnif commented May 20, 2024

Thanks @syrusakbary ! As mentioned here and elsewhere, a feature flag will probably not suffice. We also need things like dependencies that are only used in wasix not to be included at all (for our packagers). This would likely have to be an entirely different crate (though I of course do not know about your internals).

But as I said - the work has already started here, so I intend to see it through. If we hit a road block that prevents us from migrating, or feel the migration is too risky - I'll be sure to check out the work in the linked issue to see if it would be a better path forward for us.

Thanks for your efforts and I am sorry things have to be this way, but we've sounded this alarm before a few times and we have to think of our application first.

@syrusakbary
Copy link

syrusakbary commented May 20, 2024

@imsnif of course, no worries! Do what you think is best for the project.

Just a minor clarification: on the Wasmer side, we can turn off the WASIX syscalls, dependencies and code paths incredibly easily via a feature flag, so a new crate shall not be needed if you would like to stick with Wasmer :)

Copy link
Member

@imsnif imsnif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bjorn3 - first, thanks so much for your thorough work on this. This is a major infrastructure change for us and I do not want to take it lightly - so I hope you don't mind being patient with my review cycle.

I'm going on a short vacation tomorrow and didn't want to leave this hanging without a response while I'm away. So I added some comments about things that immediately jumped out at me to get us started. I will give this a more thorough shake-up and review when I get back around the beginning of June.

In Wasmer trying to create a file with a relative path creates it in the /host directory. In Wasmtime it will result in an error that there is no "pre-opened file descriptor" through which the file can be opened. As WASI doesn't have any such concept as a current working directory, the Wasmtime behavior is more natural. It does break the current released version of zjstatus however. dj95/zjstatus@25e3a56 works around this. I tried to work around the difference, but while it fixed this specific issue, it broke a lot more things. As such I would prefer documenting the difference rather than making Wasmtime behave the same way as Wasmer.

I've been thinking about this specific issue and while I hear you about the difficulty with the behavior change, I'm not enthusiastic about it. zjstatus is just one example that we happened to catch, I'm sure there are more. Backwards compatibility of already-compiled plugins is a strong value for our ecosystem.

I'm not saying this is a deal-breaker, but I'd like to understand the trade-offs before going ahead with this. What other stuff breaks if we try to work around this problem? How big of a hack is it? Will it affect upgrades? What else?

@@ -84,7 +84,7 @@ fn start_zellij(channel: &mut ssh2::Channel) {
)
.unwrap();
channel.flush().unwrap();
std::thread::sleep(std::time::Duration::from_secs(1)); // wait until Zellij stops parsing startup ANSI codes from the terminal STDIN
std::thread::sleep(std::time::Duration::from_secs(3)); // wait until Zellij stops parsing startup ANSI codes from the terminal STDIN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea why we need the increase here? I don't mind increasing it in in this case of the e2e tests if we need it, but I'd prefer to understand what happened.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without it, the first snapshot for several tests seem to be taken while still compiling wasm, which suggests that compiling is slower. This is surprising to me as locally it actually seemed to be faster. Maybe the singlepass compiler of Wasmtime is slower than the one of Wasmer and only the optimizing compiler is faster?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Winch is about 2-6 times slower compiling than Wasmer's Singlepass for non-trivial applications (>2Mb), you can check more examples such as time to compile Spidermonkey or ffmpeg on each runtime on this article:

https://wasmi-labs.github.io/blog/posts/wasmi-v0.32/

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All builtin plugins together add up to 2MB, but point taken. I agree Winch being a fair bit slower at compiling is likely the cause here. Caching compiled wasm modules between tests rather than recompiling every single time would likely help a lot. I haven't looked into how hard that would be, but Zellij already caches compiled wasm modules under normal operation. To be honest I'm surprised it isn't already cached between tests. Or maybe it is already getting cached, but would just need a warmup with a higher timeout before any tests run?

zellij-server/Cargo.toml Outdated Show resolved Hide resolved
@bjorn3
Copy link
Author

bjorn3 commented May 27, 2024

I've been thinking about this specific issue and while I hear you about the difficulty with the behavior change, I'm not enthusiastic about it. zjstatus is just one example that we happened to catch, I'm sure there are more. Backwards compatibility of already-compiled plugins is a strong value for our ecosystem.

I just reproduced this behavior in standalone Wasmer. Turns out it is completely disregarding the intention of the WASI specification:

fn main() {
    println!("{:?}", std::fs::write("foo", ""));
    println!("{:?}", std::fs::read("foo"));
    println!("{:?}", std::fs::read_dir("/").map(|dir| dir.collect::<Vec<_>>()));
    println!("{:?}", std::fs::read_dir(".").map(|dir| dir.collect::<Vec<_>>()));
    println!("{:?}", std::fs::read_dir("/tmp/foo/x").map(|dir| dir.collect::<Vec<_>>()));
}

when run in wasmtime with a /tmp/foo/x directory (containing my_file) passed through to the guest will print:

Err(Custom { kind: Uncategorized, error: "failed to find a pre-opened file descriptor through which \"foo\" could be opened" })
Err(Custom { kind: Uncategorized, error: "failed to find a pre-opened file descriptor through which \"foo\" could be opened" })
Err(Custom { kind: Uncategorized, error: "failed to find a pre-opened file descriptor through which \"/\" could be opened" })
Err(Custom { kind: Uncategorized, error: "failed to find a pre-opened file descriptor through which \".\" could be opened" })
Ok([Ok(DirEntry("/tmp/foo/x/my_file"))])

as I would have expected while wasmer prints

Ok(())
Ok([])
Ok([Ok(DirEntry("/.app")), Ok(DirEntry("/.private")), Ok(DirEntry("/bin")), Ok(DirEntry("/dev")), Ok(DirEntry("/etc")), Ok(DirEntry("/foo")), Ok(DirEntry("/tmp"))])
Ok([Ok(DirEntry("./.app")), Ok(DirEntry("./.private")), Ok(DirEntry("./bin")), Ok(DirEntry("./dev")), Ok(DirEntry("./etc")), Ok(DirEntry("./foo")), Ok(DirEntry("./tmp"))])
Ok([Ok(DirEntry("/tmp/foo/x/my_file"))])

In other words rather than passing through the directories the user specified as pre-opened directories as seems to be intended by the WASI specification, it creates an in-memory virtual filesystem whose contents are discarded upon exit and inside this VFS it mounts the user specified directories. The pre-opened directories it passes are / (fd 4), / (fd 5) and . (fd 6). )Yes, it passes / twice for whatever reason. I can imagine that some guest languages will get upset about that.) Wasmtime directly passes /tmp/foo/x as a pre-opened directory instead.

$ RUST_LOG=wasmer_wasix::syscalls::wasi::fd_prestat_dir_name=trace wasmer run target/wasm32-wasi/debug/foo.wasm
2024-05-27T17:00:01.380160Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: return=Errno::success fd=3 path="/"
2024-05-27T17:00:01.380179Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: close time.busy=31.2µs time.idle=8.81µs fd=3 path="/"
2024-05-27T17:00:01.380201Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: return=Errno::success fd=4 path="/"
2024-05-27T17:00:01.380205Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: close time.busy=4.83µs time.idle=448ns fd=4 path="/"
2024-05-27T17:00:01.380213Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: return=Errno::success fd=5 path="."
2024-05-27T17:00:01.380217Z TRACE ThreadId(01) fd_prestat_dir_name: wasmer_wasix::syscalls::wasi::fd_prestat_dir_name: close time.busy=4.09µs time.idle=326ns fd=5 path="."
[...]

It would probably be possible to emulate this with wasmtime-wasi, but I think it would only hide bugs inside plugins. From what I can tell before dj95/zjstatus@25e3a56 the .zjstatus.log file would have been discarded rather than saved on the host as was intended, however based on https://discord.com/channels/771367133715628073/1240239515071942727/1240246058970517556 it seems like it somehow gets stored in the path mounted at /host anyway. If you do still want me to implement support for emulating the Wasmer behavior, I will do so. Maybe it could be a future compatibility warning when it seems like the plugin depends on the Wasmer behavior.

@imsnif
Copy link
Member

imsnif commented Jun 7, 2024

Hey @bjorn3 - I took this for a spin and am seeing some weird behavior. Could it be that a plugin's state is being reset when a new instance is loaded for some reason?

To reproduce:

  1. Clone this project: https://github.com/zellij-org/rust-plugin-example
  2. Start Zellij with the layout in this project (zellij -l plugin-dev-workspace.kdl)
  3. Once the plugin has rendered and you've given it permissions, change the input-mode a few times (eg. move to Tab then to Pane) - see that the mode counts have been logged in the plugin's UI
  4. Open a new tab (an identical tab will open, because we've loaded the plugin-dev-workspace.kdl layout globally, so another instance of the plugin will be loaded in the new tab)
  5. Go back to the first tab and see that the counts have been reset (in zellij 0.40.1 they are not reset).

@bjorn3
Copy link
Author

bjorn3 commented Jun 7, 2024

I can't reproduce this behavior locally. For me the counts are preserved as I switch between tabs.

Peek.2024-06-07.17-36.webm

@imsnif
Copy link
Member

imsnif commented Jun 7, 2024

Also when the second plugin instance is initially launched? That's when it gets erased for me.

@bjorn3
Copy link
Author

bjorn3 commented Jun 7, 2024

Yes, opening a new tab preserves the state too:

Peek.2024-06-07.17-36.webm

@imsnif
Copy link
Member

imsnif commented Jun 7, 2024

Alright, I'll do some troubleshooting next week and find out what's up.

@imsnif
Copy link
Member

imsnif commented Jun 14, 2024

Hey, so first - I did some testing and this was a configuration error on my part. My apologies!

Otherwise, I'm still testing this PR. I'm sorry for not getting to do a full review this week, it was on my schedule and got pushed off. I fully plan to do so next week. Thank you for your patience @bjorn3 !

This will enable PluginEnv to be the Store context when migrating to
Wasmtime.
This will allow removing the Clone impl from PluginEnv when migrating to
Wasmtime as required by the missing Clone impl on Wasmtime's WasiCtx.
Wasmtime requires storing the read/write end of the pipe outside of the
WasiCtx. Passing PluginEnv to these functions allows storing them in the
PluginEnv.
Copy link
Member

@imsnif imsnif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bjorn3 - I've given this a thorough shakedown and gone over the code and all in all this looks great.

Going over the code, I looked for areas that can be potentially problematic, are not explicitly covered by the e2e tests and might have been accidentally broken by someone who is not intimately familiar with the application. All of them work as expected, but I'd like to list what I've done here for the record:

Manually tested (all pass)

  • Rendering / logging from within a plugin (both STDOUT and STDERR work as expected)
  • Plugin pipes
    • rendering/not rendering after a pipe
    • blocking/not blocking CLI pipe with/without render
    • multiple simultaneous concurrent CLI pipes
    • piping data between plugins
  • Filesystem locations bound to plugins behave as they should
    • /host (CWD of last focused pane)
    • /data (specific cache folder, shared between plugin instances, deleted on plugin exit)
    • /tmp (temporary folder)

Regarding the CWD question

Your tests have convinced me that reimplementing this behavior is likely more trouble than it's worth - especially in the long run. Only thing is: is it possible to make the error clearer? Instead of failed to find a pre-opened file descriptor through which \"foo\" could be opened, could we make it Zellij Plugins can't use relative paths, please use an absolute path. For more info, see: https://zellij.dev/documentation/plugin-api-file-system (bonus points if it can also include a suggestion based on the user's supplied relative path: Zellij Plugins can't use relative paths, please use an absolute path, for example /host/<relative-path>. For more info, see: https://zellij.dev/documentation/plugin-api-file-system.)

Final thoughts

Otherwise, I've asked our Go SDK author to do a quick smoke test of this just to be 100% certain there isn't any implicit behavior the Go SDK is relying on that we're overriding here.

One last thing is the dependency count. This seems to add ~50 packages to Zellij. I don't know if this is possible, but I do want to ask: is it an option to somehow weed down the transient dependencies here? Not a deal breaker, but would certainly be nice.

All in all, this looks great and once we sort out these last few things I think we can merge this. Fantastic work @bjorn3 !! Thank you.

.appender("logFile")
.build("wasmer_compiler_cranelift", LevelFilter::Warn),
.appender("logPlugin")
.build("wasmtime_wasi", LevelFilter::Warn),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't matter that much, they go into the same file in the end. But might indeed be nicer to change it to logPlugin since then we'll see the plugin id too.

@bjorn3
Copy link
Author

bjorn3 commented Jun 20, 2024

Only thing is: is it possible to make the error clearer? Instead of failed to find a pre-opened file descriptor through which "foo" could be opened, could we make it Zellij Plugins can't use relative paths, please use an absolute path. For more info, see: https://zellij.dev/documentation/plugin-api-file-system (bonus points if it can also include a suggestion based on the user's supplied relative path: Zellij Plugins can't use relative paths, please use an absolute path, for example /host/. For more info, see: https://zellij.dev/documentation/plugin-api-file-system.)

I don't think that is possible unfortunately. This error is produced by the copy of the rust standard library linked into the plugin itself (https://github.com/rust-lang/rust/blob/1aaab8b9f8dc488cadc4f083b3a11fb11b45cb77/library/std/src/sys/pal/wasi/fs.rs#L722-L726), not by wasmtime-wasi.

One last thing is the dependency count. This seems to add ~50 packages to Zellij. I don't know if this is possible, but I do want to ask: is it an option to somehow weed down the transient dependencies here? Not a deal breaker, but would certainly be nice.

I can disable profiling support to cut the dependency count down a bit. It needs manual configuration to enable at runtime anyway, which this PR doesn't do. Other than that I think most new dependencies are coming from tokio, which wasmtime-wasi has a hard dependency on.

@imsnif
Copy link
Member

imsnif commented Jun 20, 2024

I don't think that is possible unfortunately. This error is produced by the copy of the rust standard library linked into the plugin itself (https://github.com/rust-lang/rust/blob/1aaab8b9f8dc488cadc4f083b3a11fb11b45cb77/library/std/src/sys/pal/wasi/fs.rs#L722-L726), not by wasmtime-wasi.

Too bad. Will document this for the next release then.

I can disable profiling support to cut the dependency count down a bit. It needs manual configuration to enable at runtime anyway, which this PR doesn't do. Other than that I think most new dependencies are coming from tokio, which wasmtime-wasi has a hard dependency on.

Alright, let's do that.

@bjorn3
Copy link
Author

bjorn3 commented Jun 20, 2024

Alright, let's do that.

Done and disabled a couple more features too.

By the way since I opened this PR, Wasmtime 21.0 has been released. Do you want me to update to it before or after merging this PR?

@imsnif
Copy link
Member

imsnif commented Jun 20, 2024

By the way since I opened this PR, Wasmtime 21.0 has been released. Do you want me to update to it before or after merging this PR?

Let's do it before if we're already here. Do you think there are significant changes that will affect us?

@bjorn3
Copy link
Author

bjorn3 commented Jun 20, 2024

Updated. No api changes that affects us. Only had to ensure the new std feature of wasmtime is enabled.

@bjorn3
Copy link
Author

bjorn3 commented Jun 20, 2024

Done and disabled a couple more features too.

By the way, I forgot to mention that I disabled the "pooling allocator" too. It is a performance optimization for when you quickly drop and recreate instances of a wasm module. For example creating an instance for each request in an http server. Are plugin instances meant to be quickly destroyed again after invoking a command inside them or do they live for a while in Zellij?

@imsnif
Copy link
Member

imsnif commented Jun 20, 2024

By the way, I forgot to mention that I disabled the "pooling allocator" too. It is a performance optimization for when you quickly drop and recreate instances of a wasm module. For example creating an instance for each request in an http server. Are plugin instances meant to be quickly destroyed again after invoking a command inside them or do they live for a while in Zellij?

They definitely live for a while. There can be cases where a plugin is spun up to deal with a request (eg. through a CLI pipe) and then kills itself, but I don't think there's a use-case for it then spinning itself up again multiple times.

@imsnif
Copy link
Member

imsnif commented Jun 20, 2024

So where things are now: I'm going to give stacab a few days to see if he can test this with some Go plugins, and if all is well or he doesn't have time, I'll give it one more shakeup with the new version and merge if all is well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants