seccomp: specify what must happen if a syscall can't be resolved #972

flx42 · 2018-06-02T01:32:07Z

The specification doesn't specify anything if a syscall can't be resolved (or if it's a "pseudo" syscall).

runc silently drop those entries:
https://github.com/opencontainers/runc/blob/ecd55a4135e0a26de884ce436442914f945b1e76/libcontainer/seccomp/seccomp_linux.go#L168-L173

This seems like a fairly strong assumption to make, since for runc it ultimately depends on the version of libseccomp you have. On ubuntu 18.04, I have libseccomp2=2.3.1-2.1ubuntu4, which doesn't seem to include the patch from @justincormack:
seccomp/libseccomp@d9102f1

For instance, the seccomp profile used by Docker is supposed to whitelist preadv2:
https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L226
But since my libseccomp is missing the patch, it won't work:

$ docker run -i ubuntu:18.04 sh -c '{ apt-get -qq update && apt-get install -y gcc >/dev/null 2>&1; } && gcc -D_GNU_SOURCE -xc - && /a.out' <<EOF
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <stdio.h> 
#include <string.h>
int main() {
        errno = 0;
        long ret = syscall(__NR_preadv2, -1, NULL, 0, 0, 0);
        fprintf(stderr, "%ld\n%s\n", ret, strerror(errno));
}
EOF

-1
Operation not permitted

If I remove seccomp with --security-opt seccomp=unconfined, it "works" as expected:

-1
Bad file descriptor

preadv2 is obviously a toy example, but this would be a surprising behavior if defaultAction == SCMP_ACT_ALLOW and you want to blacklist a syscall that libseccomp doesn't know about, the syscall would be silently allowed as far as I can see.

The text was updated successfully, but these errors were encountered:

wking · 2018-06-02T03:46:49Z

My preference would be to specify, either directly or by reference, the syscalls runtimes MUST support. That would give us a clear lower bar, but this level of precision has eluded us before (for a similar previous example, see #755). As far as I can tell, the current maintainer reccomendation is to file a complaint with your runtime.

flx42 · 2018-06-02T04:16:34Z

#673 (comment)

then the runtime errors out. end-users investigate. complain to the runtime developers or write a fix. That kind-of linting and validation needs to be secondary to this spec.

Mmm, but here the runtime doesn't error out, it just silently skip those syscalls. Users can't complain since they won't know that it's happening.

And here you can't complain to your runtime developer, you have to complain to your distro to ask them to update the version of libseccomp to match the kernel they ship. Or, in some cases you will probably need to first PR libseccomp, wait for a new release, then ask your distro to update (or to backport your patch).

wking · 2018-06-02T04:43:00Z

Mmm, but here the runtime doesn't error out, it just silently skip those syscalls.

You can complain to runc and ask them to error instead of silently ignoring unrecognized syscalls.

And here you can't complain to your runtime developer, you have to complain to your distro to ask them to update the version of libseccomp to match the kernel they ship. Or, in some cases you will probably need to first PR libseccomp, wait for a new release, then ask your distro to update (or to backport your patch).

These are all good for the ecosystem ;). And if you can't wait, a default block with a whitelist or a local runtime(-dependancy) patch are probably going to be your only choices regardless of what the spec says.

I'm still in favor of specifying a minimum set of runtime-supported values, because folks aiming to generate portable configs should be able to tell when they're heading out into extension territory. But that minimum bar is not going to magically protect you from kernel evolution.

justincormack · 2018-06-02T10:36:06Z

With a whitelist it fails safe at least, so behaviour is at least conservative. If you define your rules as a blacklist it would fail open in that case.

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently you might consider first writing a new seccomp library for runc, a set of tests, and a new specification format as the libseccomp format is a subset of what the underlying code can do...

wking · 2018-06-02T15:10:57Z

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently...

I have no problem with leaning on libseccomp. I'd just rather replace informative wording like:

A valid list of constants as of libseccomp v2.3.2 is shown below.

with normative wording like:

Runtimes MUST support syscalls defined in Linux 4.4 and MAY support additional syscalls.

and similarly for other properties. For things that are clearly documented by libseccomp (e.g. actions), we can either inline the required values (like we do now) or link to a specific version of the seccomp docs (e.g. here, not sure if they serve a HTML rendering of that anywhere...). For syscalls, the 4.4 floor is based on:

libseccomp $ git grep 'based on Linux' v2.3.2 -- src/*syscall*.c
v2.3.2:src/arch-aarch64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-arm-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64n32-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-ppc-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-ppc64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-s390-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-s390x-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x32-syscalls.c:/* NOTE: based on Linux 4.5-rc4 */
v2.3.2:src/arch-x86-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x86_64-syscalls.c:/* NOTE: based on Linux 4.9 */

x32 looks like the only thing blocking a bump to 4.9.

flx42 · 2018-06-02T20:53:32Z

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently you might consider first writing a new seccomp library for runc, a set of tests, and a new specification format as the libseccomp format is a subset of what the underlying code can do...

Agreed, and that's why complaining to your runtime developer will be pointless (with good reason). But at least it should be mentioned in the spec that syscalls might be silently dropped. In my initial example, the containerized application is losing features compared to an bare-metal application. It could also affect performance.

Initially, I thought that failing to whitelist a syscall should not be fatal, but failing to block a syscall should be fatal. But maybe the syscall actually doesn't exist on the host kernel, and there is no easy way to differentiate those cases.

Runtimes MUST support syscalls defined in Linux 4.4 and MAY support additional syscalls.

This seems fine if we also add some wording about dropping unsupported syscalls.
This is probably going to be controversial but: could compliance be defined in terms of the capabilities of the system's libseccomp? It should support at least the same syscalls.
It doesn't solve this particular problem, but at least it would prevent runtimes from shipping/statically linking their own outdated libseccomp. I realize this would be difficult to enforce/check.

wking · 2018-06-02T21:22:05Z

Initially, I thought that failing to whitelist a syscall should not be fatal, but failing to block a syscall should be fatal.

I think runtimes should always fail create in those cases, since they can't apply the configured settings (arguably already covered by this spec language). It would be up to the caller to figure out if they were ok skipping that syscall (in which case they'd remove the entry from their config and try again) or not (in which case they'd try to get a runtime built against a newer version of libseccomp or otherwise).

cyphar · 2018-06-04T08:58:43Z

I think runtimes should always fail create in those cases, since they can't apply the configured settings (arguably already covered by this spec language).

I'm most in favour of this -- I don't like having a minimum kernel version or libseccomp requirement. The main problem with pinning to libseccomp is that runtimes don't need to use libseccomp and so compliance shouldn't really be based on a library. As for minimum kernel version, I think that this would require us to update the spec regularly with each LTS (or something like that) to keep it current which sort of breaks the slow release cycle idea.

So as above, I think failing create with an informative error message would be the best way of handling this.

flx42 mentioned this issue Jun 5, 2018

seccomp support is now too strict lxc/lxc#2376

Closed

giuseppe mentioned this issue Apr 11, 2019

OCI runtime create failed: seccomp_rule_add: unknown. containers/crun#31

Closed

thaJeztah mentioned this issue Dec 1, 2020

Creating seccomp filter for unnamed syscall entries moby/moby#41671

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

seccomp: specify what must happen if a syscall can't be resolved #972

seccomp: specify what must happen if a syscall can't be resolved #972

flx42 commented Jun 2, 2018

wking commented Jun 2, 2018

flx42 commented Jun 2, 2018 •

edited

Loading

wking commented Jun 2, 2018

justincormack commented Jun 2, 2018

wking commented Jun 2, 2018 •

edited

Loading

flx42 commented Jun 2, 2018

wking commented Jun 2, 2018

cyphar commented Jun 4, 2018 •

edited

Loading

seccomp: specify what must happen if a syscall can't be resolved #972

seccomp: specify what must happen if a syscall can't be resolved #972

Comments

flx42 commented Jun 2, 2018

wking commented Jun 2, 2018

flx42 commented Jun 2, 2018 • edited Loading

wking commented Jun 2, 2018

justincormack commented Jun 2, 2018

wking commented Jun 2, 2018 • edited Loading

flx42 commented Jun 2, 2018

wking commented Jun 2, 2018

cyphar commented Jun 4, 2018 • edited Loading

flx42 commented Jun 2, 2018 •

edited

Loading

wking commented Jun 2, 2018 •

edited

Loading

cyphar commented Jun 4, 2018 •

edited

Loading