[Feature request]: Use VirGL or something similar to utilize host mesa drivers #5793

Open · RossComputerGuy opened this issue Apr 29, 2024 · 15 comments

@RossComputerGuy

Checklist

  • I agree to follow the Code of Conduct that this project adheres to.
  • I have searched the issue tracker for a feature request that matches the one I want to file, without success.

Suggestion

With this feature, the mesa build in Flatpak runtimes could be trimmed down. It would also allow using the right driver on systems which require a specific mesa version or build (example: Asahi Linux) without updating or changing the runtimes. Using VirGL or something similar, Flatpak would pass GPU commands and related data through the sandbox to the host system. This may be difficult to implement, but it would be worth the benefits.

@alyssarosenzweig

virgl can be slow

@chergert (Contributor)

I'm just going to leave a technical comment about implementation, not whether or not this is something we'd want in Flatpak.

I talked to @airlied about this a few months back, primarily from the standpoint of how many GL bugs I tend to get upstream in GTK-based GNOME applications (GTK, Builder, Ptyxis, Text Editor, VTE, etc.) with Flatpak. They almost always take the form of version skew, or of the GL extensions simply not having all the patches the distro has (Asahi, for example).

What I was told (assuming I understood it correctly) is that we would need a user-space component to do some of the shuffling back-and-forth that QEMU is doing now. There are some tests in the Mesa repository which do this, which might serve as an example in the form of a virpipe mode over a unix socket.

If using DMA-BUF, we may have additional work to do here, but that could potentially work well with the DMA-BUF work happening in Mutter/GTK/etc.
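
Whatever the transport ends up looking like, the sandbox-crossing primitive underneath is the same: file descriptors (a vtest-style socket, or a DMA-BUF handle) passed over a unix socket with SCM_RIGHTS. A minimal sketch of that primitive, purely illustrative (the `send_fd` helper is hypothetical, not a Mesa or Flatpak API):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one fd (e.g. a DMA-BUF) to a peer over an AF_UNIX socket.
 * Illustrative only: a real proxy would carry protocol messages
 * alongside the fd. Returns 0 on success. */
int send_fd(int sock, int fd)
{
    char token = 'F';                      /* must send at least 1 byte */
    struct iovec iov = { .iov_base = &token, .iov_len = 1 };

    union {                                /* correctly aligned cmsg buffer */
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* kernel duplicates the fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}
```

The receiver does the mirror-image recvmsg and gets a fresh descriptor referring to the same kernel object, which is how DMA-BUF handles already travel between Wayland clients and compositors.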

@RossComputerGuy (Author)

@chergert That certainly is interesting. This idea of mine came from when I was at SCALE/NixCon NA. Being able to pass the mesa driver through from the host to each Flatpak process would likely fix a lot of issues on unique systems. I certainly have had unique problems with things like Discord and OBS: Discord's fonts don't render correctly because it falls back to software rendering, and OBS crashes with a segfault or some issue in the mesa inside the Flatpak runtime.

There are some tests in the Mesa repository which do this, which might serve as an example in the form of a virpipe mode over a unix socket.

This sounds like a good way to load the driver on the host side. Would we have a sandboxed driver that we force-load, or would we compile mesa with only that driver? If it were possible to override the socket to the driver, so you could still use a Flatpak-shipped Mesa, that would be awesome for a use case I have.

@chergert (Contributor)

No matter what you choose, there is non-trivial code to write. And even if you implement it, there is no guarantee it will work everywhere or with every host/guest driver configuration. I also can't speak to the security of such a system, or whether there are any benefits from that standpoint (which tends to be the reason most people want it "sandboxed").

It's just the path of least resistance IMO.

@swick (Contributor)

swick commented May 22, 2024

There are at least two more approaches to making it possible to use the host mesa drivers in flatpak:

  1. r-o map host /usr into flatpak and use libcapsule to load mesa libs into their own linker namespace (steam solution; see the dlmopen sketch below)
  2. create mesa drivers with only glibc as a dependency by statically linking everything and use them on the host and in flatpak; add a glibc flatpak extension to make sure old runtimes keep working with the new driver

The first solution is kind of fragile; the second one seems more robust, but requires distributions to ship a statically compiled mesa driver, which can be huge (on the other hand, reusing it for flatpak means no additional storage cost).
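
For a concrete picture of the linker-namespace idea in approach 1: libcapsule builds on glibc's dlmopen, which loads a library and its whole dependency chain into a fresh namespace. A minimal sketch (the host driver path is illustrative, assuming host files are visible at /run/host):

```c
/* Sketch of the linker-namespace primitive behind approach 1.
 * dlmopen(LM_ID_NEWLM, ...) loads the host driver and its dependency
 * chain into a namespace separate from the application's own libraries. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* LM_ID_NEWLM: create a new, empty link-map list for this load. */
    void *handle = dlmopen(LM_ID_NEWLM,
                           "/run/host/usr/lib/x86_64-linux-gnu/libGL.so.1",
                           RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlmopen failed: %s\n", dlerror());
        return 1;
    }
    /* Symbols are then resolved per-handle with dlsym(handle, ...).
     * As smcv explains below, glibc gets duplicated into the new
     * namespace, which is exactly where the shared-state trouble starts. */
    return 0;
}
```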

@alyssarosenzweig

create mesa drivers with only glibc as a dependency by statically linking everything and use them on the host and in flatpak

This seems doable except for the LLVM runtime dependency that a few Mesa drivers have. Notably, llvmpipe and - for now - radeonsi. The latter is growing away from LLVM but for now that's a biggy :/

@RossComputerGuy (Author)

create mesa drivers with only glibc as a dependency by statically linking everything and use them on the host and in flatpak; add a glibc flatpak extension to make sure old runtimes keep working with the new driver

I don't see how that would help. I think the proxying-over-a-socket approach is the least likely to be difficult. It could also make it simpler to offload things like games to a dedicated GPU instead of the integrated GPU.

This seems doable except for the LLVM runtime dependency that a few Mesa drivers have. Notably, llvmpipe and - for now - radeonsi. The latter is growing away from LLVM but for now that's a biggy :/

Yeah, LLVM is very hard to statically link (I know that all too well).

@smcv (Collaborator)

smcv commented May 24, 2024

r-o map host /usr into flatpak and use libcapsule to load mesa libs into their own linker namespace (steam solution)

In Steam's "pressure-vessel" container tool, we don't actually use libcapsule to load Mesa into its own linker namespace. We wanted to do that (and that's why libcapsule was written), but it doesn't actually work: if I remember correctly, the first game we tried with libcapsule coincidentally worked, but the second one deadlocked. I don't fully understand the details, but I think it's something like each namespace getting its own instance of libc.so.6, with some internal state ending up shared between them (heap allocation and mutexes, I think?) and each instance assuming that the other one doesn't exist, leading to crashes and deadlocks when they both try to access shared resources.

Part of the problem here is that in both Flatpak and Steam, we need to support both ways round: a modern app/game with a 2027 Flatpak runtime (or Steam Runtime) on a 2021 operating system (perhaps something like Debian LTS or RHEL), or a legacy app/game with a 2021 runtime on a 2027 operating system (perhaps a rolling release like Arch).

What pressure-vessel now does is to go through the recursive dependencies of Mesa and, for each library, ask which is newer, host or container; if the answer is the host, it removes the container's library and loads the host one instead. This can only work because we operate on a temporary copy of the runtime from which we can delete unwanted files (unlike Flatpak, which uses a read-only tree), and it's hideously complicated, especially on host OSes that break our assumptions: the Debian/Ubuntu family (multiarch) are fine, as are the Red Hat/SUSE family (FHS) and the Arch family (FHS variant), but for example ClearLinux, Exherbo and NixOS all broke our assumptions in various ways and needed OS-specific code.
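
As a toy illustration of that which-is-newer decision (not pressure-vessel's actual algorithm, which is considerably more careful about symbols and versioning), glibc's strverscmp compares strings in natural version order; the library names here are illustrative:

```c
/* Toy sketch of the "which is newer, host or container?" decision. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* Pick between e.g. "libdrm.so.2.4.110" (host) and "libdrm.so.2.4.99"
 * (container) using glibc's natural-order version comparison. */
static const char *pick_newer(const char *host, const char *container)
{
    return strverscmp(host, container) >= 0 ? host : container;
}

int main(void)
{
    /* Prints "libdrm.so.2.4.110": digit runs compare numerically. */
    printf("%s\n", pick_newer("libdrm.so.2.4.110", "libdrm.so.2.4.99"));
    return 0;
}
```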

We can mostly make this work in pressure-vessel because at least we have the simplifying assumption that it isn't a security boundary, but Flatpak does want to be a security boundary, which means it's necessarily spending its "complexity tokens" in different places.

The upstream plan for how to make libcapsule achievable is to mark libc.so.6 and a few other key libraries as "unshareable", so that instead of having one copy in each dlopen namespace, we have one instance in memory and it exists in both dlopen namespaces simultaneously - and then we would still have to use the same approach as pressure-vessel to identify which libc.so.6 is newer, and exclusively use that one, rejecting the other.

Unfortunately, some of the components that come with glibc, notably ld.so, are tightly-coupled to libc.so.6 but exist at paths that need to be hard-coded as part of the platform ABI - and some distributions even patch their glibc to have different paths for core things like ld.so.cache. This means additional complexity every time we find that we need to use the host rather than runtime glibc. Steam's pressure-vessel manages to work around this, but in some cases only by taking advantage of the fact that the same people control pressure-vessel and the Steam Runtime, and we can make changes on either side if we need to. Flatpak and its decentralized runtimes don't really have that luxury.

add a glibc flatpak extension

This sounds a lot simpler than it actually is! Responsibility for loading dynamic libraries is divided between ld.so and the former libdl.so.2 (now part of libc.so.6 if using a modern glibc version), and as a result libdl.so.2, libc.so.6 and ld.so use each other's internal data structures, so we see crashes if they're mismatched.

@swick (Contributor)

swick commented May 24, 2024

What pressure-vessel now does is to go through the recursive dependencies of Mesa and, for each library, ask which is newer, host or container; if the answer is the host, it removes the container's library and loads the host one instead

Thanks for explaining.

This sounds a lot simpler than it actually is! Responsibility for loading dynamic libraries is divided between ld.so and the former libdl.so.2 (now part of libc.so.6 if using a modern glibc version), and as a result libdl.so.2, libc.so.6 and ld.so use each other's internal data structures, so we see crashes if they're mismatched.

I'm aware that those libraries all need to match, but that doesn't make a glibc flatpak extension any harder, does it? It would include all the glibc libraries instead of just libc.so. Am I missing something here?

This seems doable except for the LLVM runtime dependency that a few Mesa drivers have. Notably, llvmpipe and - for now - radeonsi. The latter is growing away from LLVM but for now that's a biggy :/

Is it impossible to link LLVM statically or is this just not possible in the build system because no sane person would want the size of LLVM in their mesa build?

@RossComputerGuy (Author)

Is it impossible to link LLVM statically or is this just not possible in the build system because no sane person would want the size of LLVM in their mesa build?

In my experience, I could never statically link LLVM on any of my systems. Even with 48 GB of RAM and 16 GB of swap, the build would crash with an OOM.

@smcv (Collaborator)

smcv commented May 24, 2024

I'm aware that those libraries all need to match, but that doesn't make a glibc flatpak extension any harder, does it? It would include all the glibc libraries instead of just libc.so. Am I missing something here?

I mentioned ld.so. Specifically, the absolute path /lib64/ld-linux-x86-64.so.2 (or its equivalent for non-x86_64) is hard-coded into every ELF executable and library as the ELF interpreter that will be used by the kernel to load the executable and libraries. The ld.so executable that is the target of that symlink needs to be in lockstep with the libc.so.6 that will be used by the executable, and by every library in its process space. This means that an extension containing a newer/backported glibc would need to be able to overwrite that symlink in the filesystem "owned" by the top-level runtime, which Flatpak extensions currently have no way to do.
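
To see that hard-coding for yourself, `readelf -l /usr/bin/ls | grep interpreter` prints it, or you can read the PT_INTERP segment directly. A minimal sketch, 64-bit ELF only and with no error-path polish:

```c
/* Print the hard-coded ELF interpreter (PT_INTERP) of a 64-bit
 * executable, i.e. the /lib64/ld-linux-x86-64.so.2 path discussed above. */
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s ELF-FILE\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0) {
        fprintf(stderr, "not an ELF file\n");
        return 1;
    }

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, eh.e_phoff + (long)i * eh.e_phentsize, SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1)
            break;
        if (ph.p_type == PT_INTERP) {
            /* The segment contains the NUL-terminated interpreter path. */
            char *interp = calloc(1, ph.p_filesz + 1);
            fseek(f, ph.p_offset, SEEK_SET);
            if (fread(interp, 1, ph.p_filesz, f) == ph.p_filesz)
                printf("%s\n", interp);  /* e.g. /lib64/ld-linux-x86-64.so.2 */
            free(interp);
        }
    }
    fclose(f);
    return 0;
}
```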

Worse, if some executables in the sandbox use LD_LIBRARY_PATH to find the newer glibc, while others overwrite the LD_LIBRARY_PATH because they think they know better and therefore accidentally end up with the older glibc, there is no way that they can both see the /lib64/ld-linux-x86-64.so.2 that matches their glibc - at least one of them must get the wrong ld.so and crash. I'm sure you're about to say "but surely nobody would do that", but I regret to inform you that, for example, multiple Steam games do overwrite the LD_LIBRARY_PATH, instead of prepending or appending like they probably should.
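
For reference, prepending rather than overwriting looks like this in a launcher. A minimal sketch; the directory is illustrative:

```c
/* Sketch of "prepend, don't overwrite" LD_LIBRARY_PATH handling. */
#include <stdio.h>
#include <stdlib.h>

#define NEW_DIR "/path/to/game/libs"   /* illustrative */

int main(void)
{
    const char *old = getenv("LD_LIBRARY_PATH");
    char buf[4096];

    if (old && *old)
        snprintf(buf, sizeof(buf), "%s:%s", NEW_DIR, old);  /* prepend */
    else
        snprintf(buf, sizeof(buf), "%s", NEW_DIR);

    /* The broken pattern is setenv(..., NEW_DIR, 1) unconditionally,
     * which discards whatever the runtime/launcher had set up. */
    setenv("LD_LIBRARY_PATH", buf, 1);
    printf("LD_LIBRARY_PATH=%s\n", buf);
    return 0;
}
```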

Whatever route is chosen for making use of host GL drivers, this is likely going to be "one does not simply walk into Mordor" territory, and we should assume that it'll need to be someone's full-time job for a significant period, plus an ongoing maintenance time-sink.

@swick (Contributor)

swick commented May 24, 2024

Right, the libc extension would need special handling and all the shared objects would have to be mounted over the existing ones, but that doesn't sound too hard to implement, and it's already done to some degree for /etc. I'll think about this a bit more in the coming days, but nothing you mentioned makes me think this is a horrible idea.

@airlied

airlied commented May 24, 2024

Just on linking LLVM statically, it's possible, and I think AMD ship their drivers like that when you download them directly. Now the question is what a distro should optimise for. We could build mesa for the distro and have an optional -flatpak rebuild in a separate location: the same drivers, just statically linked down to glibc (LLVM, libstdc++).

@RossComputerGuy (Author)

Just on linking LLVM statically, it's possible

Anything is possible, but it's a huge pain to link because the libraries get so big that linking OOMs any of my systems when I try.

@airlied

airlied commented May 24, 2024

OOMs during linking are often just a matter of not configuring the build properly. For meson, I think -Dbackend_max_links=1 (which limits the number of concurrent link jobs) can help.
