seccomp: specify what must happen if a syscall can't be resolved #972
Comments
My preference would be to specify, either directly or by reference, the syscalls runtimes MUST support. That would give us a clear lower bar, but this level of precision has eluded us before (for a similar previous example, see #755). As far as I can tell, the current maintainer recommendation is to file a complaint with your runtime.
Mmm, but here the runtime doesn't error out, it just silently skips those syscalls. Users can't complain since they won't know that it's happening. And here you can't complain to your runtime developer, you have to complain to your distro to ask them to update the version of
You can complain to runc and ask them to error instead of silently ignoring unrecognized syscalls.
These are all good for the ecosystem ;). And if you can't wait, a default block with a whitelist or a local runtime(-dependency) patch are probably going to be your only choices regardless of what the spec says. I'm still in favor of specifying a minimum set of runtime-supported values, because folks aiming to generate portable configs should be able to tell when they're heading out into extension territory. But that minimum bar is not going to magically protect you from kernel evolution.
With a whitelist it at least fails safe, so the behaviour is conservative. If you define your rules as a blacklist it would fail open in that case. The specification is basically "the input format for libseccomp", so if you want to specify the behaviour differently you might consider first writing a new seccomp library for runc, a set of tests, and a new specification format, as the libseccomp format is a subset of what the underlying code can do...
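The fail-safe versus fail-open difference can be sketched with a small model. This is not libseccomp or runc code: `KNOWN`, `build_filter`, and `decide` are hypothetical names, and `KNOWN` merely stands in for the syscalls an older libseccomp can resolve. The only behaviour taken from the thread is that unresolved names are dropped silently.

```python
# Hypothetical stand-in for the syscall names an older libseccomp can resolve.
KNOWN = {"read", "write", "openat"}


def build_filter(default_action, rules, known=KNOWN):
    """Build a filter, silently dropping rules whose syscall cannot be resolved."""
    kept = {name: action for name, action in rules.items() if name in known}
    return default_action, kept


def decide(filt, syscall):
    """Return the action applied to a syscall: its rule, else the default action."""
    default_action, rules = filt
    return rules.get(syscall, default_action)


# Whitelist: deny by default, allow selected syscalls.
whitelist = build_filter("SCMP_ACT_ERRNO",
                         {"read": "SCMP_ACT_ALLOW", "preadv2": "SCMP_ACT_ALLOW"})
# Blacklist: allow by default, deny selected syscalls.
blacklist = build_filter("SCMP_ACT_ALLOW", {"preadv2": "SCMP_ACT_ERRNO"})

print(decide(whitelist, "preadv2"))  # dropped rule falls back to the deny default
print(decide(blacklist, "preadv2"))  # dropped rule falls back to the allow default
```

With the whitelist the dropped `preadv2` rule costs a feature; with the blacklist it removes the protection the rule was supposed to provide.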
I have no problem with leaning on libseccomp. I'd just rather replace informative wording like:
with normative wording like:
and similarly for other properties. For things that are clearly documented by libseccomp (e.g. actions), we can either inline the required values (like we do now) or link to a specific version of the seccomp docs (e.g. here, not sure if they serve an HTML rendering of that anywhere...). For syscalls, the 4.4 floor is based on:
libseccomp $ git grep 'based on Linux' v2.3.2 -- src/*syscall*.c
v2.3.2:src/arch-aarch64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-arm-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64n32-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-ppc-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-ppc64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-s390-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-s390x-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x32-syscalls.c:/* NOTE: based on Linux 4.5-rc4 */
v2.3.2:src/arch-x86-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x86_64-syscalls.c:/* NOTE: based on Linux 4.9 */
x32 looks like the only thing blocking a bump to 4.9.
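The observation about x32 can be checked mechanically from the grep output. The sketch below copies that output verbatim and compares only the major.minor part of each version, ignoring the `-rc` suffixes, which is a simplification. The helper names `arch` and `kernel_version` are illustrative.

```python
import re

# Output of the git grep command, copied from the comment above.
GREP_OUTPUT = """
v2.3.2:src/arch-aarch64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-arm-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-mips64n32-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-ppc-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-ppc64-syscalls.c:/* NOTE: based on Linux 4.10-rc6+ */
v2.3.2:src/arch-s390-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-s390x-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x32-syscalls.c:/* NOTE: based on Linux 4.5-rc4 */
v2.3.2:src/arch-x86-syscalls.c:/* NOTE: based on Linux 4.9 */
v2.3.2:src/arch-x86_64-syscalls.c:/* NOTE: based on Linux 4.9 */
"""


def arch(line):
    """Extract the architecture name from the file path."""
    return re.search(r"arch-(.+)-syscalls\.c", line).group(1)


def kernel_version(line):
    """Extract (major, minor) from the 'based on Linux' note; -rc suffixes are ignored."""
    m = re.search(r"based on Linux (\d+)\.(\d+)", line)
    return int(m.group(1)), int(m.group(2))


versions = {arch(l): kernel_version(l) for l in GREP_OUTPUT.strip().splitlines()}
oldest = min(versions.values())
blockers = sorted(a for a, v in versions.items() if v < (4, 9))

print(oldest)    # oldest table among all architectures
print(blockers)  # architectures whose table predates 4.9
```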
Agreed, and that's why complaining to your runtime developer will be pointless (with good reason). But at least it should be mentioned in the spec that syscalls might be silently dropped. In my initial example, the containerized application is losing features compared to a bare-metal application. It could also affect performance. Initially, I thought that failing to whitelist a syscall should not be fatal, but failing to block a syscall should be fatal. But maybe the syscall actually doesn't exist on the host kernel, and there is no easy way to differentiate those cases.
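The "fatal only when failing to block" idea can be written as a small predicate. This is a sketch of the policy considered above, not something any runtime is known to implement; `unresolved_is_fatal` is a hypothetical name. As the comment notes, it cannot tell an outdated libseccomp apart from a kernel that genuinely lacks the syscall.

```python
def unresolved_is_fatal(default_action, rule_action):
    """Decide whether dropping an unresolved rule should abort container creation.

    A dropped rule makes the syscall fall through to the default action.
    That only weakens the profile when the default allows the syscall
    and the dropped rule would have restricted it.
    """
    return default_action == "SCMP_ACT_ALLOW" and rule_action != "SCMP_ACT_ALLOW"


# Whitelist profile: losing an allow rule is a lost feature, not a security hole.
print(unresolved_is_fatal("SCMP_ACT_ERRNO", "SCMP_ACT_ALLOW"))
# Blacklist profile: losing a deny rule silently re-allows the syscall.
print(unresolved_is_fatal("SCMP_ACT_ALLOW", "SCMP_ACT_ERRNO"))
```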
This seems fine if we also add some wording about dropping unsupported syscalls.
I think runtimes should always fail create in those cases, since they can't apply the configured settings (arguably already covered by this spec language). It would be up to the caller to figure out if they were ok skipping that syscall (in which case they'd remove the entry from their config and try again) or not (in which case they'd try to get a runtime built against a newer version of libseccomp or otherwise).
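A minimal sketch of that flow, assuming a runtime that rejects unresolved names and a caller that retries without the entries it is willing to lose. The `create` and `create_with_fallback` functions, the `known` set, and the `droppable` parameter are all hypothetical; only the `defaultAction`, `syscalls`, `names`, and `action` keys follow the seccomp section of the runtime config.

```python
import copy


class UnresolvedSyscallError(Exception):
    """Raised by the hypothetical runtime when seccomp names cannot be resolved."""

    def __init__(self, names):
        super().__init__("cannot resolve syscalls: " + ", ".join(names))
        self.names = names


def create(config, known):
    """Hypothetical runtime 'create': fail loudly instead of dropping rules."""
    unresolved = [name
                  for rule in config["syscalls"]
                  for name in rule["names"]
                  if name not in known]
    if unresolved:
        raise UnresolvedSyscallError(unresolved)
    return "created"


def create_with_fallback(config, known, droppable):
    """Caller-side retry: remove unresolved names only if the caller accepts losing them."""
    try:
        return create(config, known)
    except UnresolvedSyscallError as err:
        if not set(err.names) <= set(droppable):
            raise  # the caller needs these rules; a newer libseccomp is required
        trimmed = copy.deepcopy(config)
        for rule in trimmed["syscalls"]:
            rule["names"] = [n for n in rule["names"] if n not in err.names]
        return create(trimmed, known)


config = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "preadv2"], "action": "SCMP_ACT_ALLOW"}],
}
known = {"read", "write"}  # stand-in for what the linked libseccomp resolves

result = create_with_fallback(config, known, droppable={"preadv2"})
print(result)
```

The decision about which entries are safe to drop stays with the caller, who knows whether the profile is a whitelist or a blacklist.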
I'm most in favour of this -- I don't like having a minimum kernel version or So as above, I think failing create with an informative error message would be the best way of handling this. |
The specification doesn't say what must happen if a syscall can't be resolved (or if it's a "pseudo" syscall).
runc silently drops those entries:
https://github.com/opencontainers/runc/blob/ecd55a4135e0a26de884ce436442914f945b1e76/libcontainer/seccomp/seccomp_linux.go#L168-L173
This seems like a fairly strong assumption to make, since for runc it ultimately depends on the version of libseccomp you have. On Ubuntu 18.04, I have libseccomp2=2.3.1-2.1ubuntu4, which doesn't seem to include the patch from @justincormack: seccomp/libseccomp@d9102f1
For instance, the seccomp profile used by Docker is supposed to whitelist preadv2: https://github.com/moby/moby/blob/master/profiles/seccomp/default.json#L226
But since my libseccomp is missing the patch, it won't work:
If I remove seccomp with --security-opt seccomp=unconfined, it "works" as expected:
preadv2 is obviously a toy example, but the behavior would be surprising if defaultAction == SCMP_ACT_ALLOW and you want to blacklist a syscall that libseccomp doesn't know about: as far as I can see, the syscall would be silently allowed.