Crashes with an AMD GPU with Mesa >= 19.3.4 and seccomp #3219

Closed
creideiki opened this issue Feb 11, 2020 · 32 comments · Fixed by #4375
Comments

@creideiki
Contributor

Since not long ago (I unfortunately don't have exact dates or versions for when it happened, but I think it started with Firefox 72.0.2), Firefox hangs at startup under Firejail. This happens on two machines with AMD GPUs, but not on three others with Intel GPUs. All five systems are running up-to-date Gentoo Linux unstable.

Trying on a completely empty profile directory, Firefox gets a little bit through its startup:

Reading profile /etc/firejail/firefox.profile
Reading profile /etc/firejail/whitelist-usr-share-common.inc
Reading profile /etc/firejail/firefox-common.profile
Reading profile /etc/firejail/disable-common.inc
Reading profile /etc/firejail/disable-devel.inc
Reading profile /etc/firejail/disable-exec.inc
Reading profile /etc/firejail/disable-interpreters.inc
Reading profile /etc/firejail/disable-programs.inc
Reading profile /etc/firejail/whitelist-common.inc
Reading profile /etc/firejail/whitelist-var-common.inc
Warning: noroot option is not available
Parent pid 12102, child pid 12103
Warning: An abstract unix socket for session D-BUS might still be available. Use --net or remove unix from --protocol set.
Post-exec seccomp protector enabled
Seccomp list in: !chroot, check list: @default-keep, prelist: unknown,
Child process initialized in 117.95 ms
1581450106377   [email protected]     WARN    Loading extension '[email protected]': Reading manifest: Invalid extension permission: networkStatus

And then hangs. The process tree in the sandbox looks like this:

 ~ $ ps -A --forest -o pid,comm
  PID COMMAND
    1 firejail
    9 firefox
   57  \_ GPU Process

And all threads in the GPU process are hung:

 ~ # strace -f -p 12160
strace: Process 12160 attached with 3 threads
[pid 12163] futex(0x7f54e41feb78, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12162] epoll_wait(6,  <unfinished ...>
[pid 12160] restart_syscall(<... resuming interrupted read ...>^Cstrace: Process 12160 detached

As are the ones in the main Firefox process:

 ~ # strace -f -p 12112
strace: Process 12112 attached with 41 threads
[pid 12178] futex(0x7f6e45c41df8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12177] futex(0x7f6e45c41df8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12176] futex(0x7f6e45c41df8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12175] futex(0x7f6e45c41df8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12174] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12173] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12172] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12171] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12170] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12169] futex(0x7f6e45c416f0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12168] futex(0x7f6e3ed7e228, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12167] futex(0x7f6e3ed7e228, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12166] futex(0x7f6e3ed7e228, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12165] futex(0x7f6e3ed7e228, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12164] futex(0x7f6e3df29cf8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12159] futex(0x7f6e3f0f522c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12158] futex(0x7f6e3f0f5188, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12155] futex(0x7f6e3f0f41e8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12154] futex(0x7f6e3f0f4148, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12153] futex(0x7f6e3f9f9e08, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12152] futex(0x7f6e3f9f922c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12151] futex(0x7f6e3f9f8fa8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12150] futex(0x7f6e45b4f04c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12149] futex(0x7f6e3f9f8b48, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12148] futex(0x7f6e45c8d90c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12147] futex(0x7f6e45d6e6d8, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12129] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12128] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12127] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12126] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12125] futex(0x7f6e41a04648, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12124] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12123] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12122] futex(0x7f6e41a0464c, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12121] futex(0x7f6e41856a70, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12119] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 12118] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 12117] futex(0x7f6e518560e0, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12116] epoll_wait(9,  <unfinished ...>
[pid 12115] restart_syscall(<... resuming interrupted read ...> <unfinished ...>
[pid 12112] futex(0x7f6e3ea42140, FUTEX_WAIT_PRIVATE, 0, NULL^Cstrace: Process 12112 detached
 <detached ...>
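
For longer traces like the one above, a small script can summarise which syscalls the threads are blocked in. This is only a sketch: the helper name `blocked_syscalls` is hypothetical, and it assumes the `[pid N] name(` line format that `strace -f -p PID` printed here.

```python
import re
from collections import Counter

# Matches lines such as "[pid 12163] futex(0x7f54e41feb78, ..." as printed above.
LINE_RE = re.compile(r"^\[pid\s+(\d+)\]\s+(\w+)\(")


def blocked_syscalls(strace_output):
    """Count how many traced threads are sitting in each syscall."""
    counts = Counter()
    for line in strace_output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    return counts


# Sample taken from the GPU process trace in this report.
sample = """\
strace: Process 12160 attached with 3 threads
[pid 12163] futex(0x7f54e41feb78, FUTEX_WAIT_PRIVATE, 0, NULL <unfinished ...>
[pid 12162] epoll_wait(6,  <unfinished ...>
[pid 12160] restart_syscall(<... resuming interrupted read ...>^Cstrace: Process 12160 detached
"""

summary = blocked_syscalls(sample)
for name, count in summary.most_common():
    print(f"{name}: {count}")
```

A trace where nearly every thread sits in futex() usually only says that the threads are waiting on each other; it does not show which call triggered the hang.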

I'm not going to be able to do any deeper debugging for the next couple of days, but if nobody else can reproduce it I'll start looking at older versions of Firefox and removing Firejail profile options this weekend.

firejail version 0.9.62

Compile time support:
        - AppArmor support is disabled
        - AppImage support is enabled
        - chroot support is enabled
        - file and directory whitelisting support is enabled
        - file transfer support is enabled
        - firetunnel support is disabled
        - networking support is enabled
        - overlayfs support is enabled
        - private-home support is enabled
        - seccomp-bpf support is enabled
        - user namespace support is enabled
        - X11 sandboxing support is disabled

sys-apps/firejail-0.9.62::gentoo was built with the following:
USE="-apparmor chroot -contrib -debug file-transfer globalcfg network overlayfs private-home seccomp suid -test userns -vim-syntax whitelist -x11" ABI_X86="(64)"
@creideiki
Contributor Author

firejail --noprofile firefox works.

@creideiki
Contributor Author

Firefox 73.0 hangs in the same way.

@Vincent43
Collaborator

Please try with various --ignore flags like firejail --ignore=seccomp firefox, firejail --ignore=nogroups firefox, firejail --ignore=nonewprivs firefox and so on. You may also try multiple --ignore at once.
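
One way to work through the combinations systematically is to generate the command lines first. The sketch below is illustrative only: `ignore_commands` and `CANDIDATES` are hypothetical names, and the option list contains just the three options named above, so it should be extended with whatever the profile actually sets.

```python
from itertools import combinations

# Options named in the suggestion above; add others from the profile as needed.
CANDIDATES = ["seccomp", "nogroups", "nonewprivs"]


def ignore_commands(program, options, max_at_once=2):
    """Yield firejail argument lists that ignore 1..max_at_once options."""
    for size in range(1, max_at_once + 1):
        for combo in combinations(options, size):
            yield ["firejail", *(f"--ignore={opt}" for opt in combo), program]


commands = list(ignore_commands("firefox", CANDIDATES))
for argv in commands:
    print(" ".join(argv))
```

Each printed line can be run by hand; the first one that starts Firefox normally points at the option (or pair of options) responsible.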

@leogx9r

leogx9r commented Feb 12, 2020

Please try with various --ignore flags like firejail --ignore=seccomp firefox, firejail --ignore=nogroups firefox, firejail --ignore=nonewprivs firefox and so on. You may also try multiple --ignore at once.

I've confirmed that disabling seccomp via firejail --ignore=seccomp firefox successfully restores the old (intended) behavior.

Edit: Tested this with v73.0

@Vincent43
Collaborator

@leogx9r did you have the same issue as OP, related to an AMD GPU on Gentoo?

@leogx9r

leogx9r commented Feb 13, 2020

@leogx9r did you have the same issue as OP, related to an AMD GPU on Gentoo?

Same issue, different OS. I've experienced this on Arch Linux.

It works fine with NVIDIA GPUs, so I'd imagine it may be a kernel bug or an updated package causing it, as it only started happening recently, within the past week or two.

Additionally, with seccomp enabled, startup takes around 10-15 seconds on an SSD instead of just under 2 seconds, and GPU compositing fails, falling back to basic CPU rendering.

You can check this via about:support -> Graphics -> Features -> Compositing: "Basic" instead of using WebRender.

@Ropid

Ropid commented Feb 14, 2020

This behavior showed up for me in Arch right now with Mesa 19.3.4. It works fine if I downgrade Mesa packages to 19.3.3, so I'm thinking the problem is related to a change in Mesa 19.3.4.

@creideiki
Contributor Author

creideiki commented Feb 14, 2020 via email

@rusty-snake
Collaborator

Yes, it is blocked:
https://github.com/netblue30/firejail/blob/master/etc/templates/syscalls.txt#L36

If you start Firefox with firejail --ignore=seccomp '--seccomp=!kcmp,!chroot' firefox?

I don't think that this works, firejail '--seccomp=!kcmp' firefox should be enough to add the exception.

@creideiki
Contributor Author

creideiki commented Feb 14, 2020 via email

@leogx9r

leogx9r commented Feb 14, 2020

firejail --ignore=seccomp '--seccomp=!kcmp,!chroot' firefox?

This indeed fixes the issue with the latest mesa version.

@creideiki
Contributor Author

I can confirm that firejail --ignore=seccomp '--seccomp=!kcmp,!chroot' firefox fixes my original problem as well, on Mesa 20.0.0_rc2.

@creideiki changed the title from "Firefox on AMD GPU hangs on start" to "Firefox on AMD GPU hangs on start with Mesa >= 19.3.4" on Feb 14, 2020
creideiki added a commit to creideiki/portage that referenced this issue Feb 15, 2020
Mesa 19.3.4 on AMDGPU started using kcmp(), which is blocked by
the default sandbox. Explicitly allow it here until Firejail
decides on a permanent solution.

Upstream bug: netblue30/firejail#3219
@Vincent43 added the bug label on Feb 15, 2020
@creideiki
Contributor Author

Looking at the Mesa source code, only the AMDGPU code calls kcmp() as of version 20.0.0. I'm not sure under what circumstances, though - I've tried some OpenGL games and applications, and the only one (besides Firefox) I've seen call kcmp() is VLC.

Would the best way forward be inserting seccomp !kcmp in any profiles where it is an actual problem, or removing kcmp from the default list of blocked syscalls?

@rusty-snake
Collaborator

Possibly all profiles without no3d are affected?

find no no3d
# Copyright © 2020 rusty-snake
#
# Permission to use, copy, modify, and distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

from os import listdir, readlink
from os.path import islink

for prg in listdir("/usr/local/bin"):
    path = "/usr/local/bin/" + prg
    # Only symlinks pointing at firejail are of interest;
    # readlink() raises OSError on anything that is not a symlink.
    if not islink(path) or readlink(path) != "/usr/bin/firejail":
        continue
    with open(f"/etc/firejail/{prg}.profile") as prf:
        profile = list(prf)
    if "no3d\n" in profile:
        continue
    elif "# Redirect\n" in profile:
        # A redirect profile is expected to end with "include <target>".
        if profile[-1][:7] != "include":
            print(f"WARN: could not find included profile for {prg}.profile")
            continue
        with open("/etc/firejail/" + profile[-1][8:-1]) as fd:
            no3d = any(line == "no3d\n" for line in fd)
        if not no3d:
            print(f"no no3d in {prg}")
    else:
        print(f"no no3d in {prg}")

@rusty-snake
Collaborator

FYI: #3267

@SkewedZeppelin
Collaborator

I can reproduce this with many profiles under Fedora 32, which ships Mesa 20.0.1.

j39m added a commit to j39m/katowice that referenced this issue Mar 22, 2020
This change
*   reinstates firejail usage for Mozilla products and
*   changes the firejail invocation to use ``--ignore=seccomp.''

Not depicted in this change was the insertion of ``seccomp !kcmp'' in
firefox-common.profile, which by itself appears insufficient (allows the
window to draw and nothing else).

Issue in question is
netblue30/firejail#3219
@SkewedZeppelin
Collaborator

Here is a hacky patch to use in the meantime
https://gist.github.com/SkewedZeppelin/300447ea70be8aef106b8d8602881134
A proper solution will need to be put in place
@smitsohu
@topimiettinen

@topimiettinen
Collaborator

Instead of allowing kcmp(), would it work to make it return ENOSYS (or EPERM) instead? Manual page mentions that kcmp() is not always available (needs CONFIG_CHECKPOINT_RESTORE), so the drivers should handle that case.

Though if kcmp() is considered safe (comparison of resources of two processes owned by the same user does not seem very dangerous), I wouldn't mind if it was removed.
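
To make the ENOSYS/EPERM idea concrete, here is a rough sketch of how a caller could treat a denied kcmp() as "cannot tell" rather than as an answer. It is not Mesa's code. The names `classify_kcmp` and `probe_kcmp` are hypothetical, and the syscall number 312 is an assumption that only holds on x86_64. The return-value meanings (0 for equal, 1 to 3 for not equal, -1 with errno on failure) follow the kcmp(2) manual page.

```python
import ctypes
import errno
import os

SYS_KCMP_X86_64 = 312  # assumption: x86_64 only; other architectures use other numbers
KCMP_FILE = 0          # value of KCMP_FILE in <linux/kcmp.h>


def classify_kcmp(ret, err=0):
    """Map a raw kcmp(KCMP_FILE) result to 'same', 'different' or 'unknown'."""
    if ret == 0:
        return "same"
    if ret > 0:
        return "different"
    # The call failed: a missing or filtered syscall means we cannot tell.
    if err in (errno.ENOSYS, errno.EPERM):
        return "unknown"
    raise OSError(err, os.strerror(err))


def probe_kcmp(fd1, fd2):
    """Ask the kernel whether two descriptors share one open file description."""
    libc = ctypes.CDLL(None, use_errno=True)
    pid = os.getpid()
    ret = libc.syscall(
        ctypes.c_long(SYS_KCMP_X86_64),
        ctypes.c_long(pid), ctypes.c_long(pid),
        ctypes.c_long(KCMP_FILE),
        ctypes.c_long(fd1), ctypes.c_long(fd2),
    )
    return classify_kcmp(ret, ctypes.get_errno())


# Simulated results: equal, not equal, and a call rejected with EPERM.
for ret, err in [(0, 0), (1, 0), (-1, errno.EPERM)]:
    print(ret, classify_kcmp(ret, err))
```

The point of the third state is that the caller has to pick a fallback for "unknown". A seccomp action of KILL never gives it that chance, while an errno return does.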

@Vincent43
Collaborator

Vincent43 commented Mar 27, 2020

Instead of allowing kcmp(), would it work to make it return ENOSYS (or EPERM) instead?

It would be great to change seccomp filter to use EPERM/ENOSYS globally. I think KILL was proven unsustainable at this point and security difference is quite negligible. Moreover if we're going to allow syscalls that cause issues then KILL is less secure in the end.

@creideiki
Contributor Author

Instead of allowing kcmp(), would it work to make it return ENOSYS (or EPERM) instead?

The problem with that is that the call was introduced to fix a memory leak: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3202

Manual page mentions that kcmp() is not always available (needs CONFIG_CHECKPOINT_RESTORE), so the drivers should handle that case.

Yes, this makes Mesa's use of it very weird, but I haven't had the time to raise that issue with them.

@topimiettinen
Collaborator

It would be great to change seccomp filter to use EPERM/ENOSYS globally. I think KILL was proven unsustainable at this point and security difference is quite negligible. Moreover if we're going to allow syscalls that cause issues then KILL is less secure in the end.

Agreed, also systemd has made the change. I'll make a PR.

@topimiettinen
Collaborator

See #3301.

glitsj16 added a commit that referenced this issue Jun 5, 2020
@SkewedZeppelin
Collaborator

Even with EPERM this is not fixed.
Vanilla firejail at 821dd6c on Fedora 32 using AMDGPU graphics breaks many programs.
Firefox, Evolution, etc.

I am using https://gist.github.com/SkewedZeppelin/300447ea70be8aef106b8d8602881134 on my personal builds

@rusty-snake mentioned this issue on Sep 2, 2020 (4 tasks)
@reinerh added this to the 0.9.64 milestone on Sep 4, 2020
@rusty-snake changed the title from "Firefox on AMD GPU hangs on start with Mesa >= 19.3.4" to "Crashes with an AMD GPU with Mesa >= 19.3.4 and seccomp" on Oct 20, 2020
@kmk3
Collaborator

kmk3 commented Oct 28, 2020

Even with EPERM this is not fixed. Vanilla firejail at 821dd6c on Fedora 32
using AMDGPU graphics breaks many programs. Firefox, Evolution, etc.

I am using
https://gist.github.com/SkewedZeppelin/300447ea70be8aef106b8d8602881134 on my
personal builds

I can confirm. Firejail 0.9.64 on Artix using AMDGPU breaks Steam (see #3267)
unless I override the default syscall whitelist with the syscall blacklist
suggested by @rusty-snake:

--seccomp.drop=@clock,@cpu-emulation,@debug,@module,@obsolete,@raw-io,@reboot,@swap,open_by_handle_at,name_to_handle_at,ioprio_set,ni_syscall,syslog,fanotify_init,add_key,request_key,mbind,migrate_pages,move_pages,keyctl,io_setup,io_destroy,io_getevents,io_submit,io_cancel,remap_file_pages,set_mempolicy,vmsplice,umount,userfaultfd,acct,bpf,chroot,mount,nfsservctl,pivot_root,setdomainname,sethostname,umount2,vhangup

Good catch! I don't have one of my failing systems with me at the moment, but
glancing at the Mesa Git repo between 19.3.3 and 19.3.4 shows me
https://gitlab.freedesktop.org/mesa/mesa/commit/ed271a9c2f40f8ec881bf3e4568d35dbfcd9cf70
which introduced a call to `kcmp`

For reference, this is what it looks like on 19.3.4 (it hasn't changed too much
as of 20.2.1):

$ git checkout mesa-19.3.4
HEAD is now at 7a3190eb918 VERSION: bump version for 19.3.4
$ grep -Fnr kcmp src/
src/util/os_file.c:37:#include <linux/kcmp.h>
src/util/os_file.c:140:   return syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2) == 0;
$ cat src/util/os_file.c
// ...
#if defined(__linux__)
// ...
bool
os_same_file_description(int fd1, int fd2)
{
   pid_t pid = getpid();

   return syscall(SYS_kcmp, pid, pid, KCMP_FILE, fd1, fd2) == 0;
}

#else
// ...

Looking at the Mesa source code, only the AMDGPU code calls kcmp() as of
version 20.0.0.

$ git checkout mesa-20.0.0
HEAD is now at 9abde3412d3 VERSION: bump for 20.0.0 release
$ grep -Fnr os_same_file_description src/
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:383:         if (os_same_file_description(sws_iter->fd, ws->fd)) {
src/util/os_file.h:39:os_same_file_description(int fd1, int fd2);
src/util/os_file.c:136:os_same_file_description(int fd1, int fd2)
src/util/os_file.c:155:os_same_file_description(int fd1, int fd2)

Indeed, but since 20.1.1 it seems that other drivers might also be affected:

$ git checkout mesa-20.1.1
HEAD is now at 127c2be9c53 VERSION: bump to 20.1.1 release
$ grep -Fnr os_same_file_description src/
src/gallium/drivers/iris/iris_bufmgr.c:1539:   int ret = os_same_file_description(drm_fd, bufmgr->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:375:         r = os_same_file_description(sws_iter->fd, ws->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:388:               os_log_message("amdgpu: os_same_file_description couldn't "
src/util/os_file.h:45:os_same_file_description(int fd1, int fd2);
src/util/os_file.c:140:os_same_file_description(int fd1, int fd2)
src/util/os_file.c:163:os_same_file_description(int fd1, int fd2)
src/mesa/drivers/dri/i965/brw_bufmgr.c:1642:   int ret = os_same_file_description(drm_fd, bufmgr->fd);

And the list appears to be increasing...

$ git checkout master
Already on 'master'
Your branch is up to date with 'origin/master'.
$ git log --oneline --no-decorate -n 1
483657de323 aco: use mubuf helper in select_gs_copy_shader
$ grep -Fnr os_same_file_description src/
src/gallium/drivers/iris/iris_bufmgr.c:1552:   int ret = os_same_file_description(drm_fd, bufmgr->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:379:         r = os_same_file_description(sws_iter->fd, ws->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:392:               os_log_message("amdgpu: os_same_file_description couldn't "
src/gallium/winsys/etnaviv/drm/etnaviv_drm_winsys.c:66:   ret = os_same_file_description(fd1, fd2);
src/gallium/winsys/etnaviv/drm/etnaviv_drm_winsys.c:73:         fprintf(stderr, "os_same_file_description couldn't determine if "
src/util/os_file.h:51:os_same_file_description(int fd1, int fd2);
src/util/os_file.c:189:os_same_file_description(int fd1, int fd2)
src/util/os_file.c:212:os_same_file_description(int fd1, int fd2)
src/mesa/drivers/dri/i965/brw_bufmgr.c:1638:   int ret = os_same_file_description(drm_fd, bufmgr->fd);

Has anyone tested seccomp on i965/iris with Mesa >= 20.1.1?
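
The grep listings above can be reduced to a list of affected source files with a few lines of Python. This is only a sketch: the helper name `callers` is hypothetical, and it matches on the function name the same way grep does, so a hit may be a log message rather than an actual call (as with the second amdgpu line).

```python
UTIL_PREFIX = "src/util/"  # where os_same_file_description itself is declared and defined


def callers(grep_output):
    """Return the unique files outside src/util/ that mention the function."""
    files = []
    for line in grep_output.splitlines():
        path, sep, _rest = line.partition(":")
        if not sep or path.startswith(UTIL_PREFIX):
            continue
        if path not in files:
            files.append(path)
    return files


# `grep -Fnr os_same_file_description src/` output for mesa-20.1.1, copied from above.
sample = """\
src/gallium/drivers/iris/iris_bufmgr.c:1539:   int ret = os_same_file_description(drm_fd, bufmgr->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:375:         r = os_same_file_description(sws_iter->fd, ws->fd);
src/gallium/winsys/amdgpu/drm/amdgpu_winsys.c:388:               os_log_message("amdgpu: os_same_file_description couldn't "
src/util/os_file.h:45:os_same_file_description(int fd1, int fd2);
src/util/os_file.c:140:os_same_file_description(int fd1, int fd2)
src/util/os_file.c:163:os_same_file_description(int fd1, int fd2)
src/mesa/drivers/dri/i965/brw_bufmgr.c:1642:   int ret = os_same_file_description(drm_fd, bufmgr->fd);
"""

affected = callers(sample)
for path in affected:
    print(path)
```

Running the same helper over the output for each tag makes it easy to see when a new driver starts depending on kcmp().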

@rusty-snake
Collaborator

What about adding !kcmp to seccomp if arg_no3d is not set and an AMD GPU is detected? no3d comes before seccomp in profiles.

@topimiettinen
Collaborator

Wouldn't it be simpler to skip detecting AMD GPU and allow kcmp if there's no no3d, or just always allow kcmp? It can be added manually to profiles for extra hardening.

@rusty-snake
Collaborator

just always allow kcmp?

Since I know that kcmp is used in Chromium's Ozone backend (Wayland), what would be the drawback of this?

@smitsohu
Collaborator

smitsohu commented Jan 6, 2021

Maybe there could be an option in firejail.config to automatically append syscalls to the default seccomp filter?

This way people could easily return to the current behaviour.

@rusty-snake
Collaborator

Over a year and we don't even have a hotfix ...
