WIP: Add an all-syscalls feature which disables seccomp filtering #5224

smcv · 2022-12-16T19:08:50Z

Depends on #5084, must not be merged as-is.

Even after #5084 lands, this is WIP, as noted in the commit message.

WIP: Add an all-syscalls feature which disables seccomp filtering

Our seccomp filtering necessarily adds overhead to each system call,
which is undesirable for syscall-heavy workloads like graphically
intensive games.

This is currently incomplete. It depends on Use new --disable-userns bubblewrap feature when possible #5084, but also needs
solutions to:
- preventing ioctl TIOCSTI (CVE-2017-5226): at the moment this is done
  in a relatively crude way via bwrap --new-session
- preventing access to the kernel keyring (see also Allow sandboxed applications to access the kernel keyrings #4281): at the moment
  this is unsolved
Resolves: Performance impact of seccomp filter in games #4187

TingPing · 2022-12-18T01:38:17Z

I'm curious what is the security impact of this permission?

smcv · 2022-12-19T19:40:18Z

I'm curious what is the security impact of this permission?

Well, it does exactly what it says on the tin: it allows all system calls, with no seccomp filter. This is a functionality vs. risk trade-off, with several security impacts, depending on the syscall.

For syscalls that allow manipulating the mount table (mount, umount, etc.) or the view of the filesystem (pivot_root, chroot) or entering a new user namespace (clone, clone3, etc.), the worst case scenario is that the app can remap the filesystem to trick trusted components like xdg-desktop-portal into thinking it is a non-sandboxed app on the host system, and that would be a sandbox escape. That's why this PR depends on #5084: my hope is that the app cannot manipulate the filesystem because bubblewrap has dropped CAP_SYS_ADMIN, and #5084 directly prevents entering a new user namespace and therefore regaining CAP_SYS_ADMIN. Obviously someone who is not me should review this carefully before accepting this PR.

For syscalls that manipulate non-namespaced kernel objects, notably the kernel keyring, we don't yet have a solution: this permission will allow manipulating those things as though the app was running non-sandboxed. I think we will need an answer to that, analogous to #5084, before this PR can be merged. Perhaps it would be enough to have a child process do the equivalent of keyctl new_session to revoke our access to the current session keyring?

For the parts of the seccomp filter that are optional (like --allow=bluetooth), I specifically documented this permission as bypassing those finer-grained permissions. I don't think there's any way around that: if we drop the seccomp filter for performance reasons, there's probably no other way to apply equivalent restrictions.

As far as I understand it, the rest of the seccomp filter is about reducing kernel attack surface, by denying apps' ability to do things that in principle are safe (because the kernel denies unsafe uses of them), but in practice are particularly scary or have had kernel security vulnerabilities in the past. If those syscalls turn out to be unsafe, then that would be considered to be a vuln (CVE) in the kernel, which should be fixed in the kernel; but Flatpak normally mitigates such vulnerabilities, and --allow=all-syscalls would be dropping that mitigation, increasing risk.

* Improve error message if seccomp is disabled in kernel config * Add --disable-userns option (needed for flatpak#5084) * Add --assert-userns-disabled option (needed for flatpak#5084) Signed-off-by: Simon McVittie <[email protected]>

This lets us use its new features unconditionally. Signed-off-by: Simon McVittie <[email protected]>

This feature (added in containers/bubblewrap#488) allows us to improve the guarantees of disallowing the sandbox to use recursive user namespaces (which is a security risk) compared to the existing limits that use seccomp. [smcv: Move this to flatpak_run_setup_base_argv() so it will apply equally in apply_extra_data() and `flatpak build`; make the compile-time check for a setuid bwrap into a runtime check] Co-authored-by: Simon McVittie <[email protected]> Signed-off-by: Simon McVittie <[email protected]>

Our seccomp filtering necessarily adds overhead to each system call, which is undesirable for syscall-heavy workloads like graphically intensive games. This is currently incomplete. It depends on flatpak#5084, but also needs solutions to: - preventing ioctl TIOCSTI (CVE-2017-5226): at the moment this is done in a relatively crude way via bwrap --new-session - preventing access to the kernel keyring (see also flatpak#4281): at the moment this is unsolved Resolves: flatpak#4187 Signed-off-by: Simon McVittie <[email protected]>

orowith2os · 2023-04-02T10:08:52Z

A simple --allow option seems like too little for a feature this large, especially with how dangerous it is. Wouldn't it be better to need some confirmation, and a warning, possibly stated everywhere it's used?

smcv · 2023-04-03T17:20:48Z

Wouldn't it be better to need some confirmation, and a warning, possibly stated everywhere it's used?

Can you propose an implementation? (I suspect that if you try, you'll find that a nice UX is impossible, because the only way the desired functionality can be implemented is to make the decision during app startup; but I'd be happy to be proved wrong on this.)

By the time it reaches a mergeable state, this feature is not actually as dangerous as you might think. We already have some permissions that are a complete sandbox escape (arbitrary code execution on the host system), used by applications that genuinely need that, like Flatseal and GNOME Builder. Compared with those, this one is actually relatively minor.

This feature is not intended to be a complete sandbox escape, because it depends on #5084, which is intended to remove the "impersonate another Flatpak or a non-Flatpak" exploit route without needing syscall filtering.

Preventing access to the kernel keyring is currently unsolved, and will need solving before this can be merged. Contributions welcome from people who understand the kernel keyring better than I do (which is almost anyone, I've never used that API).

The rest of what this does is removing mitigations (hardening) against things that would already be treated as a kernel vulnerability, which is obviously not wonderful from a security point of view - but if people consider our current sandbox to have an unacceptable impact on game performance, then the choice is between "run Flatpak Steam in a sandbox with imperfect hardening" (via this feature) and "run Steam from the .deb with no sandboxing at all", from which I'd certainly prefer the first one.

Erick555 · 2023-04-03T17:33:49Z

I wonder if seccomp performance can be improved some way.

orowith2os · 2023-04-03T20:24:28Z

@smcv on the flatpak side of things, some major warnings when installing an app using this permission would be nice. And during runtime, if an app has this permission, the CLI would state it when starting up.

Flathub would apply the same limits to this as it does access to flatpak-spawn.

I was under the impression this just removed all syscall-related mitigations.

Though I agree with @Erick555 and think seccomp could be improved a bit too.

gasinvein · 2023-04-03T21:16:49Z

some major warnings when installing an app using this permission would be nice

I see little point in this. Flatpak doesn't warn about much more dangerous permissions, like home directory or unrestricted dbus access. If anything, it probably should be marked as unsafe in store frontends, but this is out of scope of this PR.

smcv · 2023-04-04T13:42:47Z

I wonder if seccomp performance can be improved some way.

Though I agree with @Erick555 and think seccomp could be improved a bit too.

Implementations gratefully received! I suspect it isn't possible to make syscall-heavy workloads much faster than the current status, because the major time cost is the kernel intercepting syscalls to check them against our filter - but I'd be delighted to be proved wrong by a tested pull request.

Erick555 · 2023-04-04T14:22:55Z

The seccomp performance in kernel may differ regarding to how filters are created/stacked. The performance may also depend on libseccomp version. The solution may even lie outside of flatpak.

I think all current flatpak options are used to fix something that doesn't work without them. Introducing option that just promise to make something faster (from 0 to double digit %) may be very vague in terms of who and when should use it. Most users won't run any benchmarks and placebo usually sells strong.

smcv added enhancement WIP sandbox issue related to sandbox setup needs work labels Dec 16, 2022

This was referenced Dec 23, 2022

Install Steam as a Flatpak LukeShortCloud/winesapOS#328

Closed

Sandboxing portal flatpak/xdg-desktop-portal#924

Closed

smcv and others added 4 commits February 27, 2023 14:11

Update submodule: bubblewrap 0.8.0

be544d5

* Improve error message if seccomp is disabled in kernel config * Add --disable-userns option (needed for flatpak#5084) * Add --assert-userns-disabled option (needed for flatpak#5084) Signed-off-by: Simon McVittie <[email protected]>

build: Require bubblewrap 0.8.0

3277399

This lets us use its new features unconditionally. Signed-off-by: Simon McVittie <[email protected]>

smcv force-pushed the all-syscalls branch from 6c6e6b1 to 6685f77 Compare February 27, 2023 14:20

smcv mentioned this pull request Jun 1, 2023

Handle syscalls via allowlist instead of denylist #4462

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Add an all-syscalls feature which disables seccomp filtering #5224

WIP: Add an all-syscalls feature which disables seccomp filtering #5224

smcv commented Dec 16, 2022

TingPing commented Dec 18, 2022

smcv commented Dec 19, 2022

orowith2os commented Apr 2, 2023

smcv commented Apr 3, 2023

Erick555 commented Apr 3, 2023

orowith2os commented Apr 3, 2023

gasinvein commented Apr 3, 2023

smcv commented Apr 4, 2023

Erick555 commented Apr 4, 2023

WIP: Add an all-syscalls feature which disables seccomp filtering #5224

Are you sure you want to change the base?

WIP: Add an all-syscalls feature which disables seccomp filtering #5224

Conversation

smcv commented Dec 16, 2022

TingPing commented Dec 18, 2022

smcv commented Dec 19, 2022

orowith2os commented Apr 2, 2023

smcv commented Apr 3, 2023

Erick555 commented Apr 3, 2023

orowith2os commented Apr 3, 2023

gasinvein commented Apr 3, 2023

smcv commented Apr 4, 2023

Erick555 commented Apr 4, 2023