seccomp filter breaks latest glibc (in fedora rawhide) by blocking clone3 with EPERM #42680
Comments
berrange
added a commit
to berrange/moby
that referenced
this issue
Jul 27, 2021
If no seccomp policy is requested, then the built-in default policy in dockerd applies. This has no rule for "clone3" defined, nor any default errno defined. So when runc receives the config it attempts to determine a default errno, using logic defined in its commit: opencontainers/runc@7a8d716

As explained in the above commit message, runc uses a heuristic to decide which errno to return by default:

[quote] The solution applied here is to prepend a "stub" filter which returns -ENOSYS if the requested syscall has a larger syscall number than any syscall mentioned in the filter. The reason for this specific rule is that syscall numbers are (roughly) allocated sequentially and thus newer syscalls will (usually) have a larger syscall number -- thus causing our filters to produce -ENOSYS if the filter was written before the syscall existed. [/quote]

Unfortunately clone3 appears to be one of the edge cases that does not result in use of ENOSYS, instead ending up with the historical EPERM errno.

Latest glibc (2.33.9000, in Fedora 35 rawhide) will attempt to use clone3 by default. If it sees ENOSYS then it will automatically fall back to using clone. Any other errno is treated as a fatal error. Thus when the docker seccomp policy triggers EPERM from clone3, no fallback occurs and programs are unable to spawn threads.

The clone3 syscall is much more complicated than clone; most notably, its flags are no longer exposed directly as an argument. Instead they are hidden inside a struct. This means that seccomp filters are unable to apply policy based on values seen in flags. Thus we can't directly replicate the current "clone" filtering for "clone3". We can at least ensure "clone3" returns the ENOSYS errno, to trigger fallback to "clone", at which point we can filter on flags.

Fixes: moby#42680
Signed-off-by: Daniel P. Berrangé <[email protected]>
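To illustrate why the flags are out of reach for a seccomp filter, here is a minimal Python sketch that mirrors the kernel's `struct clone_args` with `ctypes`. The class name `CloneArgs` and the variable names are illustrative only; the field layout follows the kernel UAPI header as I understand it, and nothing here actually invokes the syscall.

```python
import ctypes


class CloneArgs(ctypes.Structure):
    """Illustrative mirror of the kernel's struct clone_args (all 64-bit fields)."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("pidfd", ctypes.c_uint64),
        ("child_tid", ctypes.c_uint64),
        ("parent_tid", ctypes.c_uint64),
        ("exit_signal", ctypes.c_uint64),
        ("stack", ctypes.c_uint64),
        ("stack_size", ctypes.c_uint64),
        ("tls", ctypes.c_uint64),
        ("set_tid", ctypes.c_uint64),
        ("set_tid_size", ctypes.c_uint64),
        ("cgroup", ctypes.c_uint64),
    ]


CLONE_NEWUSER = 0x10000000  # value from the kernel headers

args = CloneArgs(flags=CLONE_NEWUSER)

# clone3 receives only (pointer to struct, size of struct). A seccomp
# filter can compare these two raw values, but it cannot dereference the
# pointer, so the flags stored inside the struct stay invisible to it.
syscall_args = (ctypes.addressof(args), ctypes.sizeof(args))
print("size passed to clone3:", syscall_args[1])
```

With plain `clone`, by contrast, the flags are passed directly as a syscall argument, which is what lets the existing filter inspect them.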
docker-jenkins
pushed a commit
to docker-archive/docker-ce
that referenced
this issue
Jul 30, 2021
(Same commit message as the berrange/moby commit above, with the following trailers.)
Fixes: moby/moby#42680
Signed-off-by: Daniel P. Berrangé <[email protected]>
Upstream-commit: 9f6b562dd12ef7b1f9e2f8e6f2ab6477790a6594
Component: engine
mrc0mmand
added a commit
to mrc0mmand/restraint
that referenced
this issue
Aug 25, 2021
The current Docker version on Ubuntu 20.04 used by GH Actions suffers from an incompatibility with newer glibc [0] used by Fedora Rawhide, causing Rawhide containers in CI to fail with:

```
Errors during downloading metadata for repository 'fedora-cisco-openh264':
  - Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=fedora-cisco-openh264-rawhide&arch=x86_64 [getaddrinfo() thread failed to start]
```

glibc 2.34 and later tries to use the clone3 syscall (for hardware-assisted security hardening on x86_64), and falls back to clone2 on ENOSYS. However, with the current seccomp profile Docker returns EPERM instead, which is considered a "hard" fail.

A fix [1] has been merged upstream, but until then let's run the CI Docker containers without any seccomp profiles to allow Rawhide jobs to do their job.

(I tried to disable seccomp only for the Rawhide jobs, but I couldn't procure any solution which wouldn't make my eyes bleed...)

[0] moby/moby#42680
[1] moby/moby#42681
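As a rough sketch of the workaround described in this commit message, the snippet below builds a `docker run` command line that disables seccomp filtering via `--security-opt seccomp=unconfined`. The helper name `docker_run_unconfined` is hypothetical, and the image and command are borrowed from the issue description rather than from this commit's CI configuration. The snippet only prints the command; it does not run Docker.

```python
import shlex


def docker_run_unconfined(image, command):
    """Build an argv list for `docker run` with seccomp filtering disabled."""
    return ["docker", "run", "--security-opt", "seccomp=unconfined", image, *command]


argv = docker_run_unconfined(
    "registry.fedoraproject.org/fedora:rawhide",  # image taken from the issue text
    ["curl", "google.com"],
)
print(shlex.join(argv))
```

Disabling seccomp removes the whole filter, not just the clone3 handling, so this trades container hardening for compatibility and is best kept temporary.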
UncombedCoconut
added a commit
to naev/naev-infrastructure
that referenced
this issue
Aug 29, 2021
tonistiigi
pushed a commit
to tonistiigi/docker
that referenced
this issue
Sep 28, 2021
(Same commit message as the berrange/moby commit above, with the following trailers.)
Fixes: moby#42680
Signed-off-by: Daniel P. Berrangé <[email protected]>
(cherry picked from commit 9f6b562)
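The commit message proposes making "clone3" return ENOSYS so that glibc falls back to "clone". A minimal sketch of what such a rule could look like in a Docker-style seccomp profile follows. The helper `with_clone3_rule` and the skeleton profile are hypothetical, and this is not presented as the exact change that was merged.

```python
import json

ENOSYS = 38  # errno value on Linux; hard-coded because profiles store the number

# Rule shape assumed from the Docker seccomp profile format:
# a list of syscall names, an action, and an optional errno to return.
clone3_rule = {
    "names": ["clone3"],
    "action": "SCMP_ACT_ERRNO",
    "errnoRet": ENOSYS,
}


def with_clone3_rule(profile):
    """Return a copy of a profile dict with the clone3 rule appended."""
    updated = dict(profile)
    updated["syscalls"] = list(profile.get("syscalls", [])) + [clone3_rule]
    return updated


# Hypothetical skeleton profile, used only to show where the rule goes.
profile = with_clone3_rule({"defaultAction": "SCMP_ACT_ERRNO", "syscalls": []})
print(json.dumps(profile, indent=2))
```

An explicit rule like this sidesteps the runc heuristic entirely, since the errno for clone3 no longer depends on which other syscalls the profile happens to mention.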
ssssam
referenced
this issue
in flatpak/flatpak
Oct 9, 2021
clone3() can be used to implement clone() with CLONE_NEWUSER, allowing a sandboxed process to get CAP_SYS_ADMIN in a new namespace and manipulate its root directory. We need to block this so that AF_UNIX-based socket servers (X11, Wayland, etc.) can rely on /proc/PID/root/.flatpak-info existing for all Flatpak-sandboxed apps. Partially fixes GHSA-67h7-w3jq-vh4q. Thanks: an anonymous reporter Signed-off-by: Simon McVittie <[email protected]>
This was referenced Oct 15, 2021
fishilico
added a commit
to fishilico/shared
that referenced
this issue
Oct 31, 2021
Recently, glibc broke with seccomp again: syscall "clone3" is used by glibc 2.34 (moby/moby#42680). This was fixed in Docker 20.10.10 (moby/moby#42836), which was packaged in Arch Linux in October 2021.
This was referenced Jan 13, 2022
lrtfm
added a commit
to lrtfm/firedrake-dockerfile
that referenced
this issue
May 11, 2023
domq
pushed a commit
to epfl-si/wp-ops
that referenced
this issue
Jun 14, 2023
- Because [some morons don't know the difference between `EPERM` and `ENOSYS`](moby/moby#42680), we have to change our `wp-receptor` build strategy from (no pun intended) building on top of the `receptor` Docker image, to using `ghcr.io/ansible/awx`, as the latter at least has git already installed
- To add the `receptor` binary (self-contained, thanks Golang!) on top, use `{{ shellmacro_poor_mans_curl }}` twice in a shell pipeline, yow! to query the GitHub API, find the suitable binary release of `ansible/receptor`, download and untar it

As a further upside of this change, we are no longer pinned to v1.3.0 for reasons of Docker image format.
domq
pushed a commit
to epfl-si/wp-ops
that referenced
this issue
Jun 16, 2023
domq
pushed a commit
to epfl-si/wp-ops
that referenced
this issue
Jun 17, 2023
- Because [some morons don't know the difference between `EPERM` and `ENOSYS`](moby/moby#42680), we have to change our `wp-receptor` build strategy from (no pun intended) building on top of the `receptor` Docker image, to using `ghcr.io/ansible/awx`, as the latter at least has git already installed.
- To add the `receptor` binary (self-contained, thanks Golang!) on top, introduce `{{ shellmacro_poor_mans_curl }}` and use it twice in a shell pipeline, yow! to query the GitHub API, find the suitable binary release of `ansible/receptor`, download and untar it.

As a further upside of this change, we are no longer pinned to receptor v1.3.0 for reasons of Docker image format.
$ docker run -it registry.fedoraproject.org/fedora:rawhide curl google.com
Description
I have a docker built with seccomp running on Fedora 34 host. Attempting to run commands inside a container with the registry.fedoraproject.org/fedora:rawhide image results in programs failing to fork processes.
eg
Tracing the container "curl" process I can see
The latest glibc now attempts to use "clone3" by default. For backwards compatibility it checks for the ENOSYS errno and falls back to "clone". The EPERM errno, meanwhile, is treated as a fatal error.
The default seccomp filter installed by docker is causing EPERM and so this breaks the glibc fallback.
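The fallback behaviour described above can be modelled in a few lines of Python. This is a simplified illustration, not glibc's code: `spawn` and `blocked_with` are hypothetical names, and the two callables stand in for the real syscalls.

```python
import errno
import os


def spawn(clone3, clone):
    """Try clone3 first; fall back to clone only when clone3 reports ENOSYS.

    Both arguments are callables that return a child id or raise OSError.
    """
    try:
        return clone3()
    except OSError as exc:
        if exc.errno != errno.ENOSYS:
            raise  # any other errno (for example EPERM) is fatal: no fallback
    return clone()


def blocked_with(code):
    """Build a stand-in syscall that always fails with the given errno."""
    def call():
        raise OSError(code, os.strerror(code))
    return call


# ENOSYS from clone3: the caller falls back to clone and succeeds.
print(spawn(blocked_with(errno.ENOSYS), lambda: 1234))

# EPERM from clone3: the error propagates and nothing is spawned.
try:
    spawn(blocked_with(errno.EPERM), lambda: 1234)
except OSError as exc:
    print("fatal:", errno.errorcode[exc.errno])
```

This is why the choice of errno matters more than whether the syscall is allowed: a filter that denies clone3 with ENOSYS is transparent to the caller, while one that denies it with EPERM is not.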
Explicitly passing the default seccomp profile config makes it work, despite not allowing clone3
Tracing again shows clone3 now returns ENOSYS
I expect this difference in behaviour is as a result of the heuristics implemented for choosing EPERM vs ENOSYS in runc with opencontainers/runc@7a8d716
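For readers unfamiliar with that heuristic, here is a deliberately simplified Python model of the rule as the runc commit message describes it: return ENOSYS only when the syscall number is larger than every syscall number mentioned in the filter. The function `default_errno` and the example filters are hypothetical; the real implementation is generated as BPF per architecture and differs in detail. The syscall numbers are the x86_64 ones.

```python
import errno

# x86_64 syscall numbers for the handful of syscalls used in this sketch.
SYSCALL_NR = {"clone": 56, "clone3": 435, "openat2": 437}


def default_errno(syscall, filter_syscalls, legacy_errno=errno.EPERM):
    """Pick the errno for a syscall that no rule in the filter matches."""
    highest = max(SYSCALL_NR[name] for name in filter_syscalls)
    if SYSCALL_NR[syscall] > highest:
        # Looks newer than anything the filter author knew about.
        return errno.ENOSYS
    return legacy_errno


# A filter that mentions a syscall numbered above clone3 makes clone3 look "known".
print(errno.errorcode[default_errno("clone3", ["clone", "openat2"])])

# A filter whose highest syscall is below clone3 yields ENOSYS instead.
print(errno.errorcode[default_errno("clone3", ["clone"])])
```

Under this model, an unlisted syscall lands on EPERM as soon as the filter mentions any higher-numbered syscall, which matches the edge case described for clone3.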
Also, it is impossible to run `docker build`, and seccomp can't be overridden to make it work.
Steps to reproduce the issue:
Describe the results you received:
curl: (6) getaddrinfo() thread failed to start
Describe the results you expected:
Dump of google.com
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.):
Virtual machine running Fedora 35. Also seen in GitLab CI when using 'docker:dind' for builds.