Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libstore/local-derivation-goal: prohibit creating setuid/setgid binaries #10501

Merged
merged 2 commits into from
Apr 18, 2024

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Apr 14, 2024

Motivation

With Linux kernel >=6.6 & glibc 2.39 a fchmodat2(2) is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
which slips through

/* Prevent builders from creating setuid/setgid binaries. */
for (int perm : { S_ISUID, S_ISGID }) {
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(chmod), 1,
SCMP_A1(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmod), 1,
SCMP_A1(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EPERM), SCMP_SYS(fchmodat), 1,
SCMP_A2(SCMP_CMP_MASKED_EQ, (scmp_datum_t) perm, (scmp_datum_t) perm)) != 0)
throw SysError("unable to add seccomp rule");
}
.
The fixupPhase then uses fchmodat, which fails.
With older kernel or glibc, setting the suid bit fails in the
install phase, which is not treated as fatal, and then the
fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the fchmodat2
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via __NR_fchmodat2 / __SNR_fchmodat2, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

$ rg --pcre2 'define __NR_fchmodat2 (?!452)'
sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
58:#define __NR_fchmodat2 1073742276

sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
67:#define __NR_fchmodat2 6452

sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
62:#define __NR_fchmodat2 5452

sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
70:#define __NR_fchmodat2 4452

sysdeps/unix/sysv/linux/alpha/arch-syscall.h
59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
pkgs/tools/package-management/nix/common.nix & then built a VM from
the following config using my dirty nixpkgs master:

{
  vm = { pkgs, ... }: {
    virtualisation.writableStore = true;
    virtualisation.memorySize = 8192;
    virtualisation.diskSize = 12 * 1024;
    nix.package = pkgs.nixVersions.nix_2_21;
  };
}

The original issue can be triggered via

nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
  --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes #10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)

cc @NixOS/nix-team
cc @lf- (we'll probably want a follow-up ticket for the open(2) topic btw, right?)

Context

Priorities and Process

This patch should be backported to all Nix versions that are considered supported.

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@Ma27
Copy link
Member Author

Ma27 commented Apr 14, 2024

fwiw hacked together a dirty, but quicker way to test this:

with import (builtins.fetchTarball https://github.com/nixos/nixpkgs/archive/nixos-unstable.tar.gz) {};
let
  builder = runCommand "fnord" { } ''
    ${pkgs.gcc}/bin/gcc ${writeText "foo.c" ''
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/stat.h>
      #include <sys/syscall.h>
      #include <errno.h>

      int main(void) {
          char *name = getenv("OUT_FROM_ENV");
          FILE *fd = fopen(name, "w");
          fprintf(fd, "hello");
          fclose(fd);
          long rs = syscall(__NR_fchmodat2, NULL, name, S_ISUID, 0);
          printf("result: %ld, errno: %ld\n", rs, errno);
      }
    ''} -O0 -g -o $out
  '';
in
runCommand "foobar" { } ''
  OUT_FROM_ENV=$out ${builder}
''

nix-build on a VM with this patch gives me result: -1, errno: 1 again (in contrast to errno being 0 w/o tthis patch).

src/libstore/build/local-derivation-goal.cc Outdated Show resolved Hide resolved
@Ma27 Ma27 force-pushed the seccomp-fchmodat2 branch 2 times, most recently from 2fb7677 to e41aeb4 Compare April 15, 2024 07:31
@Ma27
Copy link
Member Author

Ma27 commented Apr 15, 2024

@Ericson2314 that's what I tried first 😅

@edolstra edolstra added the backport 2.21-maintenance Automatically creates a PR against the branch label Apr 17, 2024
@edolstra
Copy link
Member

The build is currently failing with

       > src/libstore/build/local-derivation-goal.cc:41:11: fatal error: fchmodat2-compat.hh: No such file or directory
       >    41 | # include "fchmodat2-compat.hh"

libstore/linux should be added to the include path somewhere.

@Ericson2314
Copy link
Member

src/libstore/local.mk may need to be updated for it.

Ma27 and others added 2 commits April 18, 2024 12:20
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
and seccomp 2.5.5 are needed to have the correct syscall number available
via `__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I tested the change by adding the diff below as patch to
`pkgs/tools/package-management/nix/common.nix` & then built a VM from
the following config using my dirty nixpkgs master:

    {
      vm = { pkgs, ... }: {
        virtualisation.writableStore = true;
        virtualisation.memorySize = 8192;
        virtualisation.diskSize = 12 * 1024;
        nix.package = pkgs.nixVersions.nix_2_21;
      };
    }

The original issue can be triggered via

    nix build -L github:nixos/nixpkgs/d6dc19adbda4fd92fe9a332327a8113eaa843894#lxc \
      --extra-experimental-features 'nix-command flakes'

however the problem disappears with this patch applied.

Closes NixOS#10424

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
The linux dirs are conditionally added to the `-I` path.
@Ma27
Copy link
Member Author

Ma27 commented Apr 18, 2024

Rebased onto latest master. cc @Ericson2314

@Ericson2314
Copy link
Member

Oh good, I forgot to say I did the thing that was needed for this, but you found it :)

@Ericson2314 Ericson2314 merged commit b2b776d into NixOS:master Apr 18, 2024
9 checks passed
Copy link

Backport failed for 2.21-maintenance, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.21-maintenance
git worktree add -d .worktree/backport-10501-to-2.21-maintenance origin/2.21-maintenance
cd .worktree/backport-10501-to-2.21-maintenance
git switch --create backport-10501-to-2.21-maintenance
git cherry-pick -x ba6804518772e6afb403dd55478365d4b863c854 fb9f4208ed371afe23307f9b6cb9ada618bada9b

@Ma27 Ma27 deleted the seccomp-fchmodat2 branch April 18, 2024 14:14
@thufschmitt
Copy link
Member

@Ericson2314 @Ma27 will either of you handle the backport to 2.21 (and 2.18) since the automatic one failed?

@vcunat
Copy link
Member

vcunat commented Apr 22, 2024

Could this merge be causing some issues? (noticed by accident)
https://hydra.nixos.org/build/257144328/nixlog/7/tail

@Ericson2314
Copy link
Member

Ericson2314 commented Apr 22, 2024

@thufschmitt Yes I can do that. (I will do once the regression is dealt with.) oh maybe that is what you all did?

@Ericson2314
Copy link
Member

@vcunat You are right: #10585

@Ma27
Copy link
Member Author

Ma27 commented Apr 22, 2024

@Ericson2314 apologies for not responding here!
Will do the backport as soon as #10585 is sorted out.

@Ericson2314
Copy link
Member

OK Thanks @Ma27!

@Ericson2314
Copy link
Member

FWIW, my view is that it is good to also backport the changes needed for the linux dir, etc., as that will make future backports to 2.21 go smoothly.

@Ma27
Copy link
Member Author

Ma27 commented Apr 22, 2024

fyi Will provide a backport of both this and #10591 as soon as this one was reviewed :)

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-04-22-nix-team-meeting-minutes-140/44016/1

lf- pushed a commit to lix-project/lix that referenced this pull request May 4, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.

Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.

To work around that

* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
  scheduled[3].

* the package now uses libseccomp 2.5.5 on its own already. This is to
  provide a quick fix since the correct fix for 23.11 is still a staging cycle
  away.

We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷

Relevant PRs in CppNix:

* NixOS/nix#10591
* NixOS/nix#10501

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
[3] NixOS/nixpkgs#306070

(cherry picked from commit ba68045)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
lf- pushed a commit to lix-project/lix that referenced this pull request May 4, 2024
With Linux kernel >=6.6 & glibc 2.39 a `fchmodat2(2)` is available that
isn't filtered away by the libseccomp sandbox.

Being able to use this to bypass that restriction has surprising results
for some builds such as lxc[1]:

> With kernel ≥6.6 and glibc 2.39, lxc's install phase uses fchmodat2,
> which slips through https://github.com/NixOS/nix/blob/9b88e5284608116b7db0dbd3d5dd7a33b90d52d7/src/libstore/build/local-derivation-goal.cc#L1650-L1663.
> The fixupPhase then uses fchmodat, which fails.
> With older kernel or glibc, setting the suid bit fails in the
> install phase, which is not treated as fatal, and then the
> fixup phase does not try to set it again.

Please note that there are still ways to bypass this sandbox[2] and this is
mostly a fix for the breaking builds.

This change works by creating a syscall filter for the `fchmodat2`
syscall (number 452 on most systems). The problem is that glibc 2.39
is needed to have the correct syscall number available via
`__NR_fchmodat2` / `__SNR_fchmodat2`, but this flake is still on
nixpkgs 23.11. To have this change everywhere and not dependent on the
glibc this package is built against, I added a header
"fchmodat2-compat.hh" that sets the syscall number based on the
architecture. On most platforms its 452 according to glibc with a few
exceptions:

    $ rg --pcre2 'define __NR_fchmodat2 (?!452)'
    sysdeps/unix/sysv/linux/x86_64/x32/arch-syscall.h
    58:#define __NR_fchmodat2 1073742276

    sysdeps/unix/sysv/linux/mips/mips64/n32/arch-syscall.h
    67:#define __NR_fchmodat2 6452

    sysdeps/unix/sysv/linux/mips/mips64/n64/arch-syscall.h
    62:#define __NR_fchmodat2 5452

    sysdeps/unix/sysv/linux/mips/mips32/arch-syscall.h
    70:#define __NR_fchmodat2 4452

    sysdeps/unix/sysv/linux/alpha/arch-syscall.h
    59:#define __NR_fchmodat2 562

I added a small regression-test to the setuid integration-test that
attempts to set the suid bit on a file using the fchmodat2 syscall.
I confirmed that the test fails without the change in
local-derivation-goal.

Additionally, we require libseccomp 2.5.5 or greater now: as it turns
out, libseccomp maintains an internal syscall table and
validates each rule against it. This means that when using libseccomp
2.5.4 or older, one may pass `452` as syscall number against it, but
since it doesn't exist in the internal structure, `libseccomp` will refuse
to create a filter for that. This happens with nixpkgs-23.11, i.e. on
stable NixOS and when building Lix against the project's flake.

To work around that

* a backport of libseccomp 2.5.5 on upstream nixpkgs has been
  scheduled[3].

* the package now uses libseccomp 2.5.5 on its own already. This is to
  provide a quick fix since the correct fix for 23.11 is still a staging cycle
  away.

We still need the compat header though since `SCMP_SYS(fchmodat2)`
internally transforms this into `__SNR_fchmodat2` which points to
`__NR_fchmodat2` from glibc 2.39, so it wouldn't build on glibc 2.38.
The updated syscall table from libseccomp 2.5.5 is NOT used for that
step, but used later, so we need both, our compat header and their
syscall table 🤷

Relevant PRs in CppNix:

* NixOS/nix#10591
* NixOS/nix#10501

[1] NixOS/nixpkgs#300635 (comment)
[2] NixOS/nixpkgs#300635 (comment)
[3] NixOS/nixpkgs#306070

(cherry picked from commit ba68045)
Change-Id: I6921ab5a363188c6bff617750d00bb517276b7fe
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/security-fix-nix-derivation-sandbox-escape/47778/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.21-maintenance Automatically creates a PR against the branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Syscall fchmodat2 allows to create suid binaries
8 participants