Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add an all-syscalls feature which disables seccomp filtering #5224

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

smcv
Copy link
Collaborator

@smcv smcv commented Dec 16, 2022

Depends on #5084, must not be merged as-is.

Even after #5084 lands, this is WIP, as noted in the commit message.


@smcv smcv added enhancement WIP sandbox issue related to sandbox setup needs work labels Dec 16, 2022
@TingPing
Copy link
Member

I'm curious what is the security impact of this permission?

@smcv
Copy link
Collaborator Author

smcv commented Dec 19, 2022

I'm curious what is the security impact of this permission?

Well, it does exactly what it says on the tin: it allows all system calls, with no seccomp filter. This is a functionality vs. risk trade-off, with several security impacts, depending on the syscall.

For syscalls that allow manipulating the mount table (mount, umount, etc.) or the view of the filesystem (pivot_root, chroot) or entering a new user namespace (clone, clone3, etc.), the worst case scenario is that the app can remap the filesystem to trick trusted components like xdg-desktop-portal into thinking it is a non-sandboxed app on the host system, and that would be a sandbox escape. That's why this PR depends on #5084: my hope is that the app cannot manipulate the filesystem because bubblewrap has dropped CAP_SYS_ADMIN, and #5084 directly prevents entering a new user namespace and therefore regaining CAP_SYS_ADMIN. Obviously someone who is not me should review this carefully before accepting this PR.

For syscalls that manipulate non-namespaced kernel objects, notably the kernel keyring, we don't yet have a solution: this permission will allow manipulating those things as though the app was running non-sandboxed. I think we will need an answer to that, analogous to #5084, before this PR can be merged. Perhaps it would be enough to have a child process do the equivalent of keyctl new_session to revoke our access to the current session keyring?

For the parts of the seccomp filter that are optional (like --allow=bluetooth), I specifically documented this permission as bypassing those finer-grained permissions. I don't think there's any way around that: if we drop the seccomp filter for performance reasons, there's probably no other way to apply equivalent restrictions.

As far as I understand it, the rest of the seccomp filter is about reducing kernel attack surface, by denying apps' ability to do things that in principle are safe (because the kernel denies unsafe uses of them), but in practice are particularly scary or have had kernel security vulnerabilities in the past. If those syscalls turn out to be unsafe, then that would be considered to be a vuln (CVE) in the kernel, which should be fixed in the kernel; but Flatpak normally mitigates such vulnerabilities, and --allow=all-syscalls would be dropping that mitigation, increasing risk.

smcv and others added 4 commits February 27, 2023 14:11
* Improve error message if seccomp is disabled in kernel config
* Add --disable-userns option (needed for flatpak#5084)
* Add --assert-userns-disabled option (needed for flatpak#5084)

Signed-off-by: Simon McVittie <[email protected]>
This lets us use its new features unconditionally.

Signed-off-by: Simon McVittie <[email protected]>
This feature (added in containers/bubblewrap#488)
allows us to improve the guarantees of disallowing the sandbox to use
recursive user namespaces (which is a security risk) compared to the
existing limits that use seccomp.

[smcv: Move this to flatpak_run_setup_base_argv() so it will apply
equally in apply_extra_data() and `flatpak build`; make the compile-time
check for a setuid bwrap into a runtime check]

Co-authored-by: Simon McVittie <[email protected]>
Signed-off-by: Simon McVittie <[email protected]>
Our seccomp filtering necessarily adds overhead to each system call,
which is undesirable for syscall-heavy workloads like graphically
intensive games.

This is currently incomplete. It depends on flatpak#5084, but also needs
solutions to:

- preventing ioctl TIOCSTI (CVE-2017-5226): at the moment this is done
  in a relatively crude way via bwrap --new-session

- preventing access to the kernel keyring (see also flatpak#4281): at the moment
  this is unsolved

Resolves: flatpak#4187
Signed-off-by: Simon McVittie <[email protected]>
@orowith2os
Copy link

A simple --allow option seems like too little for a feature this large, especially with how dangerous it is. Wouldn't it be better to need some confirmation, and a warning, possibly stated everywhere it's used?

@smcv
Copy link
Collaborator Author

smcv commented Apr 3, 2023

Wouldn't it be better to need some confirmation, and a warning, possibly stated everywhere it's used?

Can you propose an implementation? (I suspect that if you try, you'll find that a nice UX is impossible, because the only way the desired functionality can be implemented is to make the decision during app startup; but I'd be happy to be proved wrong on this.)

By the time it reaches a mergeable state, this feature is not actually as dangerous as you might think. We already have some permissions that are a complete sandbox escape (arbitrary code execution on the host system), used by applications that genuinely need that, like Flatseal and GNOME Builder. Compared with those, this one is actually relatively minor.

This feature is not intended to be a complete sandbox escape, because it depends on #5084, which is intended to remove the "impersonate another Flatpak or a non-Flatpak" exploit route without needing syscall filtering.

Preventing access to the kernel keyring is currently unsolved, and will need solving before this can be merged. Contributions welcome from people who understand the kernel keyring better than I do (which is almost anyone, I've never used that API).

The rest of what this does is removing mitigations (hardening) against things that would already be treated as a kernel vulnerability, which is obviously not wonderful from a security point of view - but if people consider our current sandbox to have an unacceptable impact on game performance, then the choice is between "run Flatpak Steam in a sandbox with imperfect hardening" (via this feature) and "run Steam from the .deb with no sandboxing at all", from which I'd certainly prefer the first one.

@Erick555
Copy link
Contributor

Erick555 commented Apr 3, 2023

I wonder if seccomp performance can be improved some way.

@orowith2os
Copy link

@smcv on the flatpak side of things, some major warnings when installing an app using this permission would be nice. And during runtime, if an app has this permission, the CLI would state it when starting up.

Flathub would apply the same limits to this as it does access to flatpak-spawn.

I was under the impression this just removed all syscall-related mitigations.

Though I agree with @Erick555 and think seccomp could be improved a bit too.

@gasinvein
Copy link
Member

some major warnings when installing an app using this permission would be nice

I see little point in this. Flatpak doesn't warn about much more dangerous permissions, like home directory or unrestricted dbus access. If anything, it probably should be marked as unsafe in store frontends, but this is out of scope of this PR.

@smcv
Copy link
Collaborator Author

smcv commented Apr 4, 2023

I wonder if seccomp performance can be improved some way.

Though I agree with @Erick555 and think seccomp could be improved a bit too.

Implementations gratefully received! I suspect it isn't possible to make syscall-heavy workloads much faster than the current status, because the major time cost is the kernel intercepting syscalls to check them against our filter - but I'd be delighted to be proved wrong by a tested pull request.

@Erick555
Copy link
Contributor

Erick555 commented Apr 4, 2023

The seccomp performance in kernel may differ regarding to how filters are created/stacked. The performance may also depend on libseccomp version. The solution may even lie outside of flatpak.

I think all current flatpak options are used to fix something that doesn't work without them. Introducing option that just promise to make something faster (from 0 to double digit %) may be very vague in terms of who and when should use it. Most users won't run any benchmarks and placebo usually sells strong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement needs work sandbox issue related to sandbox setup WIP
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance impact of seccomp filter in games
6 participants