Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for seccomp #20870

Closed
vishh opened this issue Feb 9, 2016 · 51 comments
Closed

Add support for seccomp #20870

vishh opened this issue Feb 9, 2016 · 51 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@vishh
Copy link
Contributor

vishh commented Feb 9, 2016

Starting from v1.10 docker daemon will run with a restrictive seccomp profile turned on by default.

Users will have to include a custom seccomp profile in addition to capabilities.

Until we figure out a sane way to add support for seccomp, I propose disabling (unconfined) seccomp profiles by default.
This will allow users to upgrade to docker v1.10 seamlessly.

cc @kubernetes/goog-node @bgrant0607

@vishh vishh added sig/node Categorizes an issue or PR as relevant to SIG Node. team/api priority/backlog Higher priority than priority/awaiting-more-evidence. labels Feb 9, 2016
@vishh
Copy link
Contributor Author

vishh commented Feb 9, 2016

cc @erictune @kubernetes/rh-platform-management

@bgrant0607
Copy link
Member

cc @pmorie @pweil-

@bgrant0607 bgrant0607 added this to the next-candidate milestone Feb 9, 2016
@ncdc
Copy link
Member

ncdc commented Feb 9, 2016

cc @kubernetes/rh-cluster-infra

@rhatdan
Copy link

rhatdan commented Feb 24, 2016

Is the problem here that you are worried about seccomp breaking certain actions? I hate to turn off a key security feature of docker for the masses of users of k8s.

@vishh
Copy link
Contributor Author

vishh commented Feb 24, 2016

@rhatdan: moby/moby#20082

@erictune
Copy link
Member

Do these make sense as requirements for kubernetes support for seccomp filters:

  1. we, the project, pre-defines names for certain "canned" seccomp filters with sensible defaults
    • probably the same as Docker's 1.10 default, and unrestricted.
  2. cluster admins can define their own custom filters too.
  3. it should be possible to define a seccomp filter and a list of allowed capabilities in the same policy object, so that both are set on a pod/container at once, to avoid user confusion about the need to set both.
  4. users can select the policy that a pod/container uses by giving the short name of a policy object, rather than inlining the entire seccomp filter text into the pod spec.
  5. admins can change the default policy for pods that don't say what to do as regards seccomp.
  6. the default default, on a new/upgraded cluster does not change, without some affirmative user action.
  7. if a user writes a pod spec that asks for a level of security that is not supported by the node (e.g. they ask for a restrictive seccomp profile, and the node is running docker 1.9) then pod creation fails, rather then the user silently getting no security.

@erictune
Copy link
Member

@kubernetes/kube-iam

@erictune
Copy link
Member

@pmorie

@eparis
Copy link
Contributor

eparis commented Mar 29, 2016

@erictune my thoughts are here (although disorganized as usual) https://trello.com/c/1pn9IN3m/320-better-seccomp-support But I think you nailed it.

@erictune
Copy link
Member

I didn't realize that only increasingly more restrictive seccomp profiles can be loaded, if I am reading the comments on https://trello.com/c/1pn9IN3m/320-better-seccomp-support correctly. That makes it much less flexible that I had imagined.

@eparis
Copy link
Contributor

eparis commented Mar 29, 2016

docker only lets us load a single profile, so that part isn't relevant today (I think RH is doing OCI work to make it a list of profiles but who knows if that ever gets plumbed up into docker)

If you load 2 profiles they will be executed in order and all of the profiles must 'approve' the syscall. If profile 1 says
deny mount; allow all
and profile 2 says
deny module_init; allow mount; allow all

Then BOTH mount and module_init will get denied.

@eparis
Copy link
Contributor

eparis commented Mar 29, 2016

So the profiles themselves do not have to be of increasing restrictivity (restrictivity is a word, right?) but the combination will always be at least as restrictive as either parent.

@jessfraz
Copy link
Contributor

jessfraz commented Apr 5, 2016

What if instead of turning it off completely kubernetes loaded a custom default, w mount and whatever else you all need

@jessfraz
Copy link
Contributor

jessfraz commented Apr 5, 2016

and if the user loads one it would override your default, just like how it works in docker.

@jessfraz
Copy link
Contributor

jessfraz commented Apr 5, 2016

I'd be happy to contribute the patch depending on what syscalls you all need added

@vishh
Copy link
Contributor Author

vishh commented Apr 5, 2016

If users have a mechanism to override the default profile, I guess docker's default profile should itself work.

@jessfraz
Copy link
Contributor

jessfraz commented Apr 5, 2016

So is the problem exposing the custom profile in the kubernetes API in a
time constraint? Just want to understand so I can help

On Monday, April 4, 2016, Vish Kannan [email protected] wrote:

If users have a mechanism to override the default profile, I guess
docker's default profile should itself work.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http:https://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@justincormack
Copy link
Contributor

@erictune are you explicitly referring to Linux capabilities there? One option would be to tie the default Docker seccomp filter to the allowed capabilities so that adding say CAP_SYS_ADMIN allows some more syscalls, while --cap-drop-all removes a few more potentially. In particular adding CAP_SYS_ADMIN would allow unshare and mount, allowing systemd to run.

Getting admins to write seccomp filters is somewhat unpleasant work, the aim with the Docker seccomp support has been to avoid it as much as possible.

@thockin
Copy link
Member

thockin commented Apr 5, 2016

Is this manifesting as a real problem right now? I am wild-guessing that it shows up as containers that fail (syscall blocked) with no remediation (since it didn't make it into our API) ?

It seems like an overly large thing to cherry pick, and yet not having it seems bad. Can we maybe have an annotation that we pass through to docker, and add proper profiles and fields in 1.3?

@rhatdan
Copy link

rhatdan commented Apr 5, 2016

I tried to add this pull request to docker, but it was closed because docker wants the tighter security

projectatomic/docker#80

I believe that we should attempt to separate different security mechanisms from each other. For example. SELinux can block Capabilities, but I allow all Caps from an SELinux point of view and rely on the Capabilities Subsystem to manage caps, Seccomp did not follow this standard, at least to a few syscalls that are controlled by SYS_ADMIN. We saw these when trying to run systemd in a container.
We are working on patches to allow the addition of syscalls to the white list as well as changing the default way that seccomp is enforced, so that users would be allowed to specify KILL versus ERRNO. KILL Is the only seccomp action that outputs blocked syscalls to /var/log/audit/audit.log, which could give the user a clue on why he is getting permission denied.

Allowing tools to specify Whitelist or blacklist syscalls would help. It would also help if the syscall table that docker is using was provided externally to the tool, not compiled in. Then the admin or tools could just modify the default json and pass it to docker daemon on on the docker run command to specify the syscalls to be used by the container(s).

@jessfraz
Copy link
Contributor

jessfraz commented Apr 5, 2016

If the default profile is the problem it could easily be fixed by my
suggestion of kubes passing its own w whatever other syscalls they need
added to the whitelist, fwict the problem is more time adding it to the API
unless I am mistaken

On Tuesday, April 5, 2016, Daniel J Walsh [email protected] wrote:

I tried to add this pull request to docker, but it was closed because
docker wants the tighter security

projectatomic/docker#80 projectatomic/docker#80

I believe that we should attempt to separate different security mechanisms
from each other. For example. SELinux can block Capabilities, but I allow
all Caps from an SELinux point of view and rely on the Capabilities
Subsystem to manage caps, Seccomp did not follow this standard, at least to
a few syscalls that are controlled by SYS_ADMIN. We saw these when trying
to run systemd in a container.

We are working on patches to allow the addition of syscalls to the white
list as well as changing the default way that seccomp is enforced, so that
users would be allowed to specify KILL versus ERRNO. KILL Is the only
seccomp action that outputs blocked syscalls to /var/log/audit/audit.log,
which could give the user a clue on why he is getting permission denied.

Allowing tools to specify Whitelist or blacklist syscalls would help. It
would also help if the syscall table that docker is using was provided
externally to the tool, not compiled in. Then the admin or tools could just
modify the default json and pass it to docker daemon on on the docker run
command to specify the syscalls to be used by the container(s).


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http:https://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@eparis
Copy link
Contributor

eparis commented Apr 5, 2016

In 1.2 we explicitly disable the seccomp profiles (aka we disable the docker default). So the 'manifestation' today is that kubernetes users cannot make use of this security feature provided by the underlying container backend.

The default docker seccomp profile would be great 99% of the time (the exact same % of the time that it would be great for docker). But until we plumb through some way to control the seccomp profile there wasn't a better alternative. This issue requires someone to plumb through the kube API a way to represent seccomp profiles and how to disable them.

I think there are a lot of ideas above how to plumb seccomp through the API, some form of which needs to be done before 1.3...

@bgrant0607
Copy link
Member

We can't break existing applications running in existing clusters.

Users also expect to be able to delete and re-create their clusters across releases with identical behavior, and we have the official policy that we don't change default behavior of APIs without introducing a new API version.

@erictune
Copy link
Member

erictune commented Apr 6, 2016

@jfrazelle we don't know what calls our users might be doing, so we don't want to change the default and break them.

I see that before 1.10 you tested all the official docker images and didn't see any issues. Since then, have there been any user reports of images not working after upgrading to 1.10 due to seccomp?

@thockin
Copy link
Member

thockin commented Apr 20, 2016

You could argue it was an oversight, but it's a stretch. I was hoping
someone would convince me, but ...

On Wed, Apr 20, 2016 at 8:15 AM, Jess Frazelle [email protected]
wrote:

I think it would be a good idea to not include it in say a .1 or patch
release because it is kinda changing behavior and is better for a .0, if
feel bad breaking semver

On Wednesday, April 20, 2016, Andy Goldstein [email protected]
wrote:

I think we're probably going to have to defer a real solution until 1.3.
I
know @pmorie https://github.com/pmorie is planning on writing up a
proposal.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
<
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu <
http:https://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3>


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

@jessfraz
Copy link
Contributor

don't worry I'll be the pusher for 1.3 😇

@thockin
Copy link
Member

thockin commented Apr 21, 2016

I'mma hold you to that.

On Thu, Apr 21, 2016 at 10:13 AM, Jess Frazelle [email protected]
wrote:

don't worry I'll be the pusher for 1.3 [image: 😇]


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

@pmorie pmorie mentioned this issue Apr 21, 2016
2 tasks
@pmorie
Copy link
Member

pmorie commented Apr 26, 2016

@jfrazelle

don't worry I'll be the pusher for 1.3

👍

k8s-github-robot pushed a commit that referenced this issue May 23, 2016
Automatic merge from submit-queue

Seccomp Proposal

WIP proposal to address #20870 

@kubernetes/kube-api 
@kubernetes/sig-node

<!-- Reviewable:start -->
---
This change is [<img src="http:https://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http:https://reviewable.k8s.io/reviews/kubernetes/kubernetes/24602)
<!-- Reviewable:end -->
@dims
Copy link
Member

dims commented Jun 8, 2016

@thockin @vishh We can close this now (Since PR has merged aa8c72a)

@ncdc
Copy link
Member

ncdc commented Jun 8, 2016

Not yet - we need #26710 to merge first

@dchen1107 dchen1107 modified the milestones: v1.3, next-candidate Jun 8, 2016
@dchen1107
Copy link
Member

Moved this to v1.3 for the proper feature tracking.

@goltermann goltermann modified the milestones: v1.3, next-candidate Jun 13, 2016
xingzhou pushed a commit to xingzhou/kubernetes that referenced this issue Dec 15, 2016
Automatic merge from submit-queue

Seccomp Proposal

WIP proposal to address kubernetes#20870 

@kubernetes/kube-api 
@kubernetes/sig-node

<!-- Reviewable:start -->
---
This change is [<img src="http:https://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http:https://reviewable.k8s.io/reviews/kubernetes/kubernetes/24602)
<!-- Reviewable:end -->
@bgrant0607
Copy link
Member

Feature issue: kubernetes/enhancements#135

@tallclair
Copy link
Member

Closing in favor of kubernetes/enhancements#135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node.
Projects
None yet
Development

No branches or pull requests