Q: SCMP_FLTATR_API_TSKIP does not seem to be used by tracer programs #368

Open
ManaSugi opened this issue Jan 30, 2022 · 6 comments

@ManaSugi
Contributor

Hello, I have a question about the SCMP_FLTATR_API_TSKIP attribute.
SCMP_FLTATR_API_TSKIP has been supported since dc87999, which addressed #80, and the man page describes it as follows:

A flag to specify if libseccomp should allow filter rules
to be created for the -1 syscall. The -1 syscall value
can be used by tracer programs to skip specific syscall
invocations, see seccomp(2) for more information.
Defaults to off (value == 0).

However, I don't think tracer programs use SCMP_FLTATR_API_TSKIP to skip a syscall. As explained in seccomp(2), the tracer skips a syscall by directly changing the register that holds the syscall number, not by using a seccomp filter.

Excerpt from SECCOMP_RET_TRACE section in seccomp(2):

The tracer can skip the system call by changing the system
call number to -1. Alternatively, the tracer can change
the system call requested by changing the system call to a
valid system call number. If the tracer asks to skip the
system call, then the system call will appear to return
the value that the tracer puts in the return value register.
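
To make the excerpt concrete, here is a minimal sketch of the register change a tracer would make on x86-64. The helper name `request_skip` is hypothetical and not part of any library; the sketch assumes the tracer has already fetched the registers with PTRACE_GETREGS at the seccomp stop and will write them back with PTRACE_SETREGS.

```c
#include <sys/user.h>

/*
 * x86-64 only. Mark the pending syscall as skipped and pick the value the
 * tracee will see as its result. request_skip() is an illustrative helper,
 * not a libseccomp or kernel API.
 */
void request_skip(struct user_regs_struct *regs, long fake_result)
{
    /* Syscall number -1 asks the kernel to skip the syscall. */
    regs->orig_rax = (unsigned long long)-1;
    /* rax is the return value register; the tracee sees this value. */
    regs->rax = (unsigned long long)fake_result;
}
```

For example, calling `request_skip(&regs, 1000)` before PTRACE_SETREGS should make a skipped getuid appear to return 1000. Leaving rax untouched keeps whatever value the kernel placed there at the stop.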

In fact, the kernel skips a syscall when a ptracer has set the syscall number to -1, at the following point:
https://elixir.bootlin.com/linux/v5.16/source/kernel/seccomp.c#L1229
The ptracer can set the syscall number to -1 without SCMP_FLTATR_API_TSKIP because it only changes a register.
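
The check at that point can be pictured with a small user-space model. This is only a sketch of my reading of the linked v5.16 code, which appears to read the syscall number back after the tracer stop and test for a negative value before the filter is evaluated again. The enum and function names are hypothetical and are not kernel identifiers.

```c
/* Hypothetical outcome labels for this sketch; not kernel names. */
enum trace_outcome {
    OUTCOME_SKIP,    /* the syscall body is not executed */
    OUTCOME_RECHECK  /* the filter is evaluated again on the new number */
};

/*
 * Simplified model of the decision made after a SECCOMP_RET_TRACE stop.
 * nr_after_tracer is the syscall number read back from the registers once
 * the tracer has resumed the tracee.
 */
enum trace_outcome after_trace_stop(long nr_after_tracer)
{
    /* A negative number means the tracer asked for a skip. The filter is
     * not consulted again on this path, so no "-1" rule is involved. */
    if (nr_after_tracer < 0)
        return OUTCOME_SKIP;

    /* Otherwise the (possibly rewritten) syscall is rechecked. */
    return OUTCOME_RECHECK;
}
```

In this model `after_trace_stop(-1)` yields `OUTCOME_SKIP` regardless of which rules the filter contains, which is the behavior the question is about.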

Hence, creating a filter rule for a syscall value of -1 does not seem to make sense. I'm sorry if I'm wrong, but I'm not sure why SCMP_FLTATR_API_TSKIP was added.
Would you mind if I asked the use case of SCMP_FLTATR_API_TSKIP?

@ManaSugi
Contributor Author

ManaSugi commented Feb 7, 2022

@pcmoore @drakenclimber I'd appreciate it if you could answer at your convenience.

@pcmoore
Member

pcmoore commented Feb 14, 2022

Would you mind if I asked the use case of SCMP_FLTATR_API_TSKIP?

Well, the use case is exactly as you described in your posting above; it is intended to support process tracers :)

It has been several years since we made this change, so this reasoning may be wrong, but my recollection is that without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't to force the syscall to be skipped, it is to allow the kernel processing to get to the point where the syscall can be skipped.

Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)

@ManaSugi
Contributor Author

@pcmoore Thank you for your comment.

without a "syscall == -1" allow filter rule, the seccomp filter would reject the syscall skip before the kernel got to the skip line you mentioned. The "syscall == -1" rule in the BPF filter isn't to force the syscall to be skipped, it is to allow the kernel processing to get to the point where the syscall can be skipped.
Of course if you have a reproducer which shows that this doesn't work this way anymore I think we would like to see it :)

I have attached a reproducer showing that a tracer program can skip a system call without a "syscall == -1" rule.
ptrace_test.c is a simple reproducer that traps a getuid syscall via SECCOMP_RET_TRACE and skips it by changing the syscall-number register.

ptrace_test.c
// Copyright 2022 Sony Group Corporation
//

#include <seccomp.h>
#include <signal.h>     // kill(), SIGSTOP, SIGTRAP
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <sys/prctl.h>
#include <syscall.h>

// Print the error and exit; never returns.
void die(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

int child() {
    int rc = -1;
    scmp_filter_ctx ctx;

    prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

    ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (ctx == NULL)
        goto out;

    rc = seccomp_rule_add_exact(ctx, SCMP_ACT_TRACE(getpid()), SCMP_SYS(getuid), 0);
    if (rc < 0)
        goto out;

    rc = seccomp_load(ctx);
    if (rc < 0)
        goto out;

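    // The tracer skips this getuid() without setting a return value, so on
    // x86 it appears to return -ENOSYS (-38), the value rax already holds
    // at the syscall-enter-stop.
    printf("uid: %d\n", getuid());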

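out:
    seccomp_release(ctx);
    // libseccomp returns a negative errno on failure; pass it through so
    // the caller's "rc < 0" check can detect it.
    return rc;
}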

int main() {
    int pid;
    int rc;
    int status;
    struct user_regs_struct regs;

    pid = fork();
    switch(pid) {
        case -1:
            die("failed to fork");
        case 0:
            ptrace(PTRACE_TRACEME, 0, NULL, NULL);
            kill(getpid(), SIGSTOP);

            rc = child();
            if (rc < 0) {
                die("failed to execute child");
            }

            return 0;
    }

    waitpid(pid, &status, __WALL);

    ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACESECCOMP);
    ptrace(PTRACE_CONT, pid, NULL, NULL);

    while(1) {
        waitpid(pid, &status, __WALL);

        if (status >> 8 == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8))) {
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);
            if (regs.orig_rax == SYS_getuid) {
                printf("caught getuid syscall\n");

                // Change the syscall number to -1 in order to skip the syscall
                regs.orig_rax = -1;
                ptrace(PTRACE_SETREGS, pid, NULL, &regs);
            }

        }

        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            break;
        }

        ptrace(PTRACE_CONT, pid, NULL, NULL);
    }

    return 0;
}
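
The status comparison in the wait loop can be pulled out into a small predicate. This is a sketch rather than part of the attached reproducer: `is_seccomp_stop` is a hypothetical helper name, and the added WIFSTOPPED check is my own precaution.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/*
 * Return true if a waitpid() status describes a PTRACE_EVENT_SECCOMP stop,
 * i.e. the tracee hit a SECCOMP_RET_TRACE rule. Illustrative helper only.
 */
bool is_seccomp_stop(int status)
{
    /* Same comparison as in the loop above, guarded by WIFSTOPPED. */
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}
```

With this helper the loop condition would read `if (is_seccomp_stop(status))`, and ordinary signal stops such as the initial SIGSTOP are not mistaken for seccomp events.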

I can observe that the kernel reaches the skip line I mentioned earlier by setting a probe point here: https://elixir.bootlin.com/linux/v5.10/source/kernel/seccomp.c#L989

$ uname -a
Linux xxxx 5.10.0-1057-oem #61-Ubuntu SMP Thu Jan 13 15:06:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ sudo perf probe --source=/usr/src/linux-oem-5.10-5.10.0 --add "__seccomp_filter:68 this_syscall"
Added new event:
  probe:__seccomp_filter_L68 (on __seccomp_filter:68 with this_syscall)

You can now use it in all perf tools, such as:

        perf record -e probe:__seccomp_filter_L68 -aR sleep 1

$ sudo perf record -e probe:__seccomp_filter_L68 -aR ./ptrace_test
caught getuid syscall
uid: -38
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.898 MB perf.data (1 samples) ]

$ sudo perf script
     ptrace_test 12337 [020]  2739.243594: probe:__seccomp_filter_L68: (ffffffffb639720e) this_syscall=-1

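The test program prints -38 as the return value of getuid. This is -ENOSYS: on x86 the return value register already holds -ENOSYS at the syscall-enter-stop, and the tracer does not overwrite it. The probe output also shows that this_syscall is set to -1.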

If you don't mind, could you look into the reproducer? Thank you.

@pcmoore
Member

pcmoore commented Feb 16, 2022

Thanks for sending the reproducer and the additional information, we'll add this to the list of things to investigate further but it might take me some time to get back to this.

As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.

@drakenclimber
Member

Interesting. I'm swamped at the moment as well, but I am definitely intrigued.

@ManaSugi
Contributor Author

Thank you for considering reviewing it. That would be helpful.

As a reminder, SCMP_FLTATR_API_TSKIP is disabled by default.

Yes, which is why I left the SCMP_FLTATR_API_TSKIP attribute disabled in the reproducer: it confirms that the kernel can skip the system call without the attribute.

@pcmoore pcmoore added this to the v2.6.0 milestone Mar 31, 2023
@pcmoore pcmoore modified the milestones: v2.6.0, v2.7.0 May 30, 2024