-
-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add an all-syscalls feature which disables seccomp filtering #5224
base: main
Are you sure you want to change the base?
Conversation
I'm curious what is the security impact of this permission? |
Well, it does exactly what it says on the tin: it allows all system calls, with no seccomp filter. This is a functionality vs. risk trade-off, with several security impacts, depending on the syscall. For syscalls that allow manipulating the mount table (mount, umount, etc.) or the view of the filesystem (pivot_root, chroot) or entering a new user namespace (clone, clone3, etc.), the worst case scenario is that the app can remap the filesystem to trick trusted components like xdg-desktop-portal into thinking it is a non-sandboxed app on the host system, and that would be a sandbox escape. That's why this PR depends on #5084: my hope is that the app cannot manipulate the filesystem because bubblewrap has dropped For syscalls that manipulate non-namespaced kernel objects, notably the kernel keyring, we don't yet have a solution: this permission will allow manipulating those things as though the app was running non-sandboxed. I think we will need an answer to that, analogous to #5084, before this PR can be merged. Perhaps it would be enough to have a child process do the equivalent of For the parts of the seccomp filter that are optional (like As far as I understand it, the rest of the seccomp filter is about reducing kernel attack surface, by denying apps' ability to do things that in principle are safe (because the kernel denies unsafe uses of them), but in practice are particularly scary or have had kernel security vulnerabilities in the past. If those syscalls turn out to be unsafe, then that would be considered to be a vuln (CVE) in the kernel, which should be fixed in the kernel; but Flatpak normally mitigates such vulnerabilities, and |
* Improve error message if seccomp is disabled in kernel config * Add --disable-userns option (needed for flatpak#5084) * Add --assert-userns-disabled option (needed for flatpak#5084) Signed-off-by: Simon McVittie <[email protected]>
This lets us use its new features unconditionally. Signed-off-by: Simon McVittie <[email protected]>
This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <[email protected]> Signed-off-by: Simon McVittie <[email protected]>
Our seccomp filtering necessarily adds overhead to each system call, which is undesirable for syscall-heavy workloads like graphically intensive games. This is currently incomplete. It depends on flatpak#5084, but also needs solutions to: - preventing ioctl TIOCSTI (CVE-2017-5226): at the moment this is done in a relatively crude way via bwrap --new-session - preventing access to the kernel keyring (see also flatpak#4281): at the moment this is unsolved Resolves: flatpak#4187 Signed-off-by: Simon McVittie <[email protected]>
A simple |
Can you propose an implementation? (I suspect that if you try, you'll find that a nice UX is impossible, because the only way the desired functionality can be implemented is to make the decision during app startup; but I'd be happy to be proved wrong on this.) By the time it reaches a mergeable state, this feature is not actually as dangerous as you might think. We already have some permissions that are a complete sandbox escape (arbitrary code execution on the host system), used by applications that genuinely need that, like Flatseal and GNOME Builder. Compared with those, this one is actually relatively minor. This feature is not intended to be a complete sandbox escape, because it depends on #5084, which is intended to remove the "impersonate another Flatpak or a non-Flatpak" exploit route without needing syscall filtering. Preventing access to the kernel keyring is currently unsolved, and will need solving before this can be merged. Contributions welcome from people who understand the kernel keyring better than I do (which is almost anyone, I've never used that API). The rest of what this does is removing mitigations (hardening) against things that would already be treated as a kernel vulnerability, which is obviously not wonderful from a security point of view - but if people consider our current sandbox to have an unacceptable impact on game performance, then the choice is between "run Flatpak Steam in a sandbox with imperfect hardening" (via this feature) and "run Steam from the .deb with no sandboxing at all", from which I'd certainly prefer the first one. |
I wonder if seccomp performance can be improved some way. |
@smcv on the flatpak side of things, some major warnings when installing an app using this permission would be nice. And during runtime, if an app has this permission, the CLI would state it when starting up. Flathub would apply the same limits to this as it does access to flatpak-spawn. I was under the impression this just removed all syscall-related mitigations. Though I agree with @Erick555 and think seccomp could be improved a bit too. |
I see little point in this. Flatpak doesn't warn about much more dangerous permissions, like home directory or unrestricted dbus access. If anything, it probably should be marked as unsafe in store frontends, but this is out of scope of this PR. |
Implementations gratefully received! I suspect it isn't possible to make syscall-heavy workloads much faster than the current status, because the major time cost is the kernel intercepting syscalls to check them against our filter - but I'd be delighted to be proved wrong by a tested pull request. |
The seccomp performance in kernel may differ regarding to how filters are created/stacked. The performance may also depend on libseccomp version. The solution may even lie outside of flatpak. I think all current flatpak options are used to fix something that doesn't work without them. Introducing option that just promise to make something faster (from 0 to double digit %) may be very vague in terms of who and when should use it. Most users won't run any benchmarks and placebo usually sells strong. |
Depends on #5084, must not be merged as-is.
Even after #5084 lands, this is WIP, as noted in the commit message.
WIP: Add an all-syscalls feature which disables seccomp filtering
Our seccomp filtering necessarily adds overhead to each system call,
which is undesirable for syscall-heavy workloads like graphically
intensive games.
This is currently incomplete. It depends on Use new --disable-userns bubblewrap feature when possible #5084, but also needs
solutions to:
preventing ioctl TIOCSTI (CVE-2017-5226): at the moment this is done
in a relatively crude way via bwrap --new-session
preventing access to the kernel keyring (see also Allow sandboxed applications to access the kernel keyrings #4281): at the moment
this is unsolved
Resolves: Performance impact of seccomp filter in games #4187