Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seccomp: specify what must happen if a syscall can't be resolved #972

Open
flx42 opened this issue Jun 2, 2018 · 8 comments
Open

seccomp: specify what must happen if a syscall can't be resolved #972

flx42 opened this issue Jun 2, 2018 · 8 comments

Comments

@flx42
Copy link

flx42 commented Jun 2, 2018

The specification doesn't specify anything if a syscall can't be resolved (or if it's a "pseudo" syscall).

runc silently drop those entries:
https://github.com/opencontainers/runc/blob/ecd55a4135e0a26de884ce436442914f945b1e76/libcontainer/seccomp/seccomp_linux.go#L168-L173

This seems like a fairly strong assumption to make, since for runc it ultimately depends on the version of libseccomp you have. On ubuntu 18.04, I have libseccomp2=2.3.1-2.1ubuntu4, which doesn't seem to include the patch from @justincormack:
seccomp/libseccomp@d9102f1

For instance, the seccomp profile used by Docker is supposed to whitelist preadv2:
https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L226
But since my libseccomp is missing the patch, it won't work:

$ docker run -i ubuntu:18.04 sh -c '{ apt-get -qq update && apt-get install -y gcc >/dev/null 2>&1; } && gcc -D_GNU_SOURCE -xc - && /a.out' <<EOF
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>
#include <stdio.h> 
#include <string.h>
int main() {
        errno = 0;
        long ret = syscall(__NR_preadv2, -1, NULL, 0, 0, 0);
        fprintf(stderr, "%ld\n%s\n", ret, strerror(errno));
}
EOF

-1
Operation not permitted

If I remove seccomp with --security-opt seccomp=unconfined, it "works" as expected:

-1
Bad file descriptor

preadv2 is obviously a toy example, but this would be a surprising behavior if defaultAction == SCMP_ACT_ALLOW and you want to blacklist a syscall that libseccomp doesn't know about, the syscall would be silently allowed as far as I can see.

@wking
Copy link
Contributor

wking commented Jun 2, 2018

My preference would be to specify, either directly or by reference, the syscalls runtimes MUST support. That would give us a clear lower bar, but this level of precision has eluded us before (for a similar previous example, see #755). As far as I can tell, the current maintainer reccomendation is to file a complaint with your runtime.

@flx42
Copy link
Author

flx42 commented Jun 2, 2018

#673 (comment)

then the runtime errors out. end-users investigate. complain to the runtime developers or write a fix. That kind-of linting and validation needs to be secondary to this spec.

Mmm, but here the runtime doesn't error out, it just silently skip those syscalls. Users can't complain since they won't know that it's happening.

And here you can't complain to your runtime developer, you have to complain to your distro to ask them to update the version of libseccomp to match the kernel they ship. Or, in some cases you will probably need to first PR libseccomp, wait for a new release, then ask your distro to update (or to backport your patch).

@wking
Copy link
Contributor

wking commented Jun 2, 2018

Mmm, but here the runtime doesn't error out, it just silently skip those syscalls.

You can complain to runc and ask them to error instead of silently ignoring unrecognized syscalls.

And here you can't complain to your runtime developer, you have to complain to your distro to ask them to update the version of libseccomp to match the kernel they ship. Or, in some cases you will probably need to first PR libseccomp, wait for a new release, then ask your distro to update (or to backport your patch).

These are all good for the ecosystem ;). And if you can't wait, a default block with a whitelist or a local runtime(-dependancy) patch are probably going to be your only choices regardless of what the spec says.

I'm still in favor of specifying a minimum set of runtime-supported values, because folks aiming to generate portable configs should be able to tell when they're heading out into extension territory. But that minimum bar is not going to magically protect you from kernel evolution.

@justincormack
Copy link
Contributor

With a whitelist it fails safe at least, so behaviour is at least conservative. If you define your rules as a blacklist it would fail open in that case.

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently you might consider first writing a new seccomp library for runc, a set of tests, and a new specification format as the libseccomp format is a subset of what the underlying code can do...

@wking
Copy link
Contributor

wking commented Jun 2, 2018

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently...

I have no problem with leaning on libseccomp. I'd just rather replace informative wording like:

A valid list of constants as of libseccomp v2.3.2 is shown below.

with normative wording like:

Runtimes MUST support syscalls defined in Linux 4.4 and MAY support additional syscalls.

and similarly for other properties. For things that are clearly documented by libseccomp (e.g. actions), we can either inline the required values (like we do now) or link to a specific version of the seccomp docs (e.g. here, not sure if they serve a HTML rendering of that anywhere...). For syscalls, the 4.4 floor is based on:

libseccomp $ git grep 'based on Linux' v2.3.2 -- src/*syscall*.c
v2.3.2:src/arch-aarch64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-arm-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64n32-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-ppc-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-ppc64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-s390-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-s390x-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x32-syscalls.c:/* NOTE: based on Linux 4.5-rc4 */
v2.3.2:src/arch-x86-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x86_64-syscalls.c:/* NOTE: based on Linux 4.9 */

x32 looks like the only thing blocking a bump to 4.9.

@flx42
Copy link
Author

flx42 commented Jun 2, 2018

The specification is basically "the input format for libseccomp" so if you want to specify the behaviour differently you might consider first writing a new seccomp library for runc, a set of tests, and a new specification format as the libseccomp format is a subset of what the underlying code can do...

Agreed, and that's why complaining to your runtime developer will be pointless (with good reason). But at least it should be mentioned in the spec that syscalls might be silently dropped. In my initial example, the containerized application is losing features compared to an bare-metal application. It could also affect performance.

Initially, I thought that failing to whitelist a syscall should not be fatal, but failing to block a syscall should be fatal. But maybe the syscall actually doesn't exist on the host kernel, and there is no easy way to differentiate those cases.

Runtimes MUST support syscalls defined in Linux 4.4 and MAY support additional syscalls.

This seems fine if we also add some wording about dropping unsupported syscalls.
This is probably going to be controversial but: could compliance be defined in terms of the capabilities of the system's libseccomp? It should support at least the same syscalls.
It doesn't solve this particular problem, but at least it would prevent runtimes from shipping/statically linking their own outdated libseccomp. I realize this would be difficult to enforce/check.

@wking
Copy link
Contributor

wking commented Jun 2, 2018

Initially, I thought that failing to whitelist a syscall should not be fatal, but failing to block a syscall should be fatal.

I think runtimes should always fail create in those cases, since they can't apply the configured settings (arguably already covered by this spec language). It would be up to the caller to figure out if they were ok skipping that syscall (in which case they'd remove the entry from their config and try again) or not (in which case they'd try to get a runtime built against a newer version of libseccomp or otherwise).

@cyphar
Copy link
Member

cyphar commented Jun 4, 2018

I think runtimes should always fail create in those cases, since they can't apply the configured settings (arguably already covered by this spec language).

I'm most in favour of this -- I don't like having a minimum kernel version or libseccomp requirement. The main problem with pinning to libseccomp is that runtimes don't need to use libseccomp and so compliance shouldn't really be based on a library. As for minimum kernel version, I think that this would require us to update the spec regularly with each LTS (or something like that) to keep it current which sort of breaks the slow release cycle idea.

So as above, I think failing create with an informative error message would be the best way of handling this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants