Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP for promoting seccomp to GA #1148

Merged
merged 10 commits into from
May 6, 2020
Next Next commit
KEP for promoting seccomp to GA
  • Loading branch information
tallclair committed Jul 17, 2019
commit 0e21d2520f72f120a94f9cd07523371010cd6c00
350 changes: 350 additions & 0 deletions keps/sig-node/20190717-seccomp-ga.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,350 @@
---
title: Seccomp to GA
authors:
- "@tallclair"
owning-sig: sig-node
participating-sigs:
- sig-apimachinery
- sig-auth
reviewers:
- "@liggitt"
- TBD
approvers:
- TBD
editor: TBD
creation-date: 2019-07-17
status: provisional
---

# Seccomp to GA

## Table of Contents

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [API](#api)
- [Pod API](#pod-api)
- [PodSecurityPolicy API](#podsecuritypolicy-api)
- [Design Details](#design-details)
- [Version Skew Strategy](#version-skew-strategy)
- [Pod Creation](#pod-creation)
- [Pod Update](#pod-update)
- [PodSecurityPolicy Creation](#podsecuritypolicy-creation)
- [PodSecurityPolicy Update](#podsecuritypolicy-update)
- [PodTemplates](#podtemplates)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [References](#references)
<!-- /toc -->

## Release Signoff Checklist

**ACTION REQUIRED:** In order to merge code into a release, there must be an issue in [kubernetes/enhancements] referencing this KEP and targeting a release milestone **before [Enhancement Freeze](https://github.com/kubernetes/sig-release/tree/master/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core Kubernetes i.e., [kubernetes/kubernetes], we require the following Release Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.

- [ ] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- [ ] KEP approvers have set the KEP status to `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

**Note:** Any PRs to move a KEP to `implementable` or significant changes once it is marked `implementable` should be approved by each of the KEP approvers. If any of those approvers is no longer appropriate than changes to that list should be approved by the remaining approvers and/or the owning SIG (or SIG-arch for cross cutting KEPs).

**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://github.com/kubernetes/enhancements/issues
[kubernetes/kubernetes]: https://github.com/kubernetes/kubernetes
[kubernetes/website]: https://github.com/kubernetes/website

## Summary

This is a proposal to upgrade the seccomp annotation on pods & pod security policies to a field, and
mark the feature as GA. This proposal aims to do the _bare minimum_ to clean up the feature, without
blocking future enhancements.

## Motivation

Docker started enforcing a default seccomp profile in v1.10. At the time, Kubernetes (in v1.2)
didn't have a way to control the seccomp profile, so the profile was disabled (set to `unconfined`)
to prevent a regression (see https://github.com/kubernetes/kubernetes/pull/21790). In Kubernetes
v1.3, annotations were added to give users some control over the profile:

```
seccomp.security.alpha.kubernetes.io/pod: {unconfined,docker/default,localhost/<path>}
container.seccomp.security.alpha.kubernetes.io/<container_name>: ...
```

The feature has been more or less unchanged ever since. Also note that the addition predates feature
gates or our modern concept of feature lifecycle. So, even though the annotations include `alpha` in
the key, this is entirely useable on any production GA cluster.

There have been multiple attempts to [change the default
profile](https://github.com/kubernetes/kubernetes/issues/39845) or [formally spec the Kubernetes
seccomp profile](https://github.com/kubernetes/kubernetes/issues/39128), but both efforts were
abandoned due to friction and lack of investment.

Despite the `alpha` label, I think this feature needs to be treated as GA, and we're doing our users
a disservice by leaving it in this weird limbo state. As much as I would like to see seccomp support
fully fleshed out, if we block GA on those enhancements we will remain stuck in the current state
indefinitely. Therefore, I'm proposing we do the absolute minimum to clean up the current
implementation all accurately declare the feature "GA". Future enhancements can follow the standard
alpha -> beta -> GA feature process.

_NOTE: AppArmor is in a very similar state, but with some subtle differences. Promoting AppArmor to
GA will be covered by a separate KEP._

### Goals

- Declare seccomp GA
- Fully document and formally spec the feature support
- Migrate the annotations to standard API fields
Copy link
Member

@liggitt liggitt Jul 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest using language like "add equivalent API fields" rather than "migrate" (which sounds like we will move data, which we know doesn't work well)

edit: if the migration only occurs in pod objects where the fields are immutable after creation, this can work... the description below should be explicit that no data movement is done in pod template objects embedded in other types

edit 2: you were already explicit about pod templates... nevermind

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the pod fields are immutable, is it possible to still set them on update as part of the storage flow? The criteria would be:

  1. The seccomp annotations are set, and do not change as part of the update.
  2. The seccomp fields are not set.

I guess the first criteria is problematic, since the seccomp fields could be changed in the update, and then a subsequent update could leave them be, so maybe that doesn't matter.

- Deprecate the seccomp annotations

### Non-Goals

This KEP proposes the absolute minimum to get seccomp to GA, therefore all functional enhancements
are out of scope, including:

- Changing the default seccomp profile from `unconfined`
- Defining any standard "Kubernetes branded" seccomp profiles
- Formally speccing the seccomp profile format in Kubernetes
- Providing mechanisms for loading profiles from outside the static seccomp node directory
- Changing the semantics around seccomp support

tallclair marked this conversation as resolved.
Show resolved Hide resolved
## Proposal

### API

The seccomp API will be functionally equivalent to the current alpha API. This includes the Pod API,
which specifies what profile the pod & containers run with, and the PodSecurityPolicy API which
specifies allowed profiles & a default profile.

#### Pod API

The Pod Seccomp API is immutable.
tallclair marked this conversation as resolved.
Show resolved Hide resolved

```go
type PodSecurityContext struct {
...
// The seccomp options to use by the containers in this pod.
// +optional
Seccomp *SeccompOptions
...
}

type SecurityContext struct {
...
// The seccomp options to use by this container. If seccomp options are
// provided at both the pod & container level, the container options
// override the pod options.
// +optional
Seccomp *SeccompOptions
...
}

type SeccompOptions struct {
// The seccomp profile to run with.
SeccompProfile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this intended to be embedded rather than something like Profile SeccompProfile? what other options do you anticipate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I modeled it off the Volume type.

The only option that comes to mind is the fallback strategy. Currently, if the seccomp options can't be enforced the node silently ignores them. In contrast, when AppArmor can't be enforced the pod is held in a Blocked state.

From a security perspective, silently failing could leave a potential security hole. From a portability perspective, it's nice to be able to choose the best practices options with a best-effort approach. Hence it might make sense to give the user the choice.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens for nodes/runtimes that don't enable seccomp is worth documenting, at least

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just securityContext.seccompProfile instead of embedding/nesting (can prefix other seccomp fields with seccomp if they are needed in the future

}

// Only one profile source may be set.
// +union
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @apelisse ... don't we want to add a discriminator here? what's the current approach to doing that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are discriminators required if the structure is immutable? I guess we'd still need it for pod templates...

Copy link
Member Author

@tallclair tallclair Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the discriminator field need to be a 1:1 mapping with union fields? If not, the API could look like this:

type SeccompProfile struct {
  // +unionDiscriminator
  // +optional
  Type *SeccompProfileType
  // +optional
  LocalhostProfile *string
}

type SeccompProfileType string

const (
  SeccompProfileUnconfined SeccompProfileType = "Unconfined"
  SeccompProfileRuntimeDefault SeccompProfileType = "RuntimeDefault"
  SeccompProfileLocalhost SeccompProfileType = "Localhost"
)

In other words, it seems redundant to have a the type indicate a boolean value that must be set to true. In those cases, the discriminator should be sufficient to indicate the option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the discriminator field need to be a 1:1 mapping with union fields?

No, a union type without an associated member field is permitted

type SeccompProfile struct {
// No seccomp profile should be set.
// +optional
Unconfined *bool
tallclair marked this conversation as resolved.
Show resolved Hide resolved
// Use a predefined profile defined by the runtime.
// Most runtimes only support "default"
// +optional
RuntimeProfile *string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify format limits (length, valid characters)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I go through this, I'm second guessing this field. AFAIK there aren't any runtimes that support multiple built in profiles, and this feature has never been requested.

If we dropped this field for now (just assume it's runtime/default), how bad would it be to add it back at some point in the future if it was needed? As long as we defaulted the new field to "default" and it's immutable, it doesn't seem like it would be that problematic? Or am I forgetting something?

// Load a profile defined in static file on the node.
// The profile must be preconfigured on the node to work.
// +optional
LocalhostProfile *string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do the current annotations contain sufficient detail to know whether they should convert to a runtime or localhost profile?

Copy link
Member Author

@tallclair tallclair Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently the legal values for the annotations are:

unconfined        # the real default
docker/default    # equivalent to runtime/default
runtime/default
localhost/<path>  # where path is relative to the node's configured seccomp root

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec out how annotation -> field conversion would work in the docker/default case (and how we'd validate annotation and field matched in that case... probably a special case)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we use API defaulting to set RuntimeProfile to "default" if type is set to "Runtime"? if so, would we clear RuntimeProfile on update if Type was changed to something other than "Runtime"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the design review, we discussed not wanting to add the LocalhostProfile field if the intention is to deprecate and eventually remove that feature. Here's an alternative proposal for how to handle localhost:

Drop the LocalhostProfile *string field. Keep the SeccompProfileLocalhost SeccompProfileType, but optionally change its value to LocalhostDeprecated.

When creating a pod, the profile type can only be set to "Localhost" if the annotation is also set to localhost. When the kubelet goes to enforce the localhost profile, it fetches the path from the annotation.

The one gotcha is how to handle annotation update in this case. I'm tempted to say allow the update, and if the kubelet goes to enforce it and the annotation isn't set to localhost anymore, just treat it as an invalid localhost path (fail the pod).

}
```

This API makes the options more explicit than the stringly-typed annotation values, and leaves room
for new profile sources to be added in the future (e.g. Kubernetes predefined profiles or ConfigMap
profiles). The seccomp options struct leaves room for future extensions, such as defining the
behavior when a profile cannot be set.

#### PodSecurityPolicy API

```go
type PodSecurityPolicySpec struct {
...
// seccomp is the strategy that will dictate allowable and default seccomp
// profiles for the pod.
// +optional
Seccomp *SeccompStrategyOptions
...
}

type SeccompStrategyOptions struct {
// The default profile to set on the pod, if non is specified.
tallclair marked this conversation as resolved.
Show resolved Hide resolved
// The default MUST be allowed by the allowedProfiles.
// +optional
DefaultProfile *v1.SeccompProfile

// The set of profiles that may be set on the pod or containers.
// If unspecified, seccomp profiles are unrestricted by this policy.
liggitt marked this conversation as resolved.
Show resolved Hide resolved
// +optional
AllowedProfiles *SeccompProfileSet
}

// A set of seccomp profiles. This struct should be a plural of v1.SeccompProfile.
// All values are optional, and an unspecified field excludes all profiles of
// that type from the set.
type SeccompProfileSet struct {
// Whether the unconfined profile is included in this set.
// +optional
Unconfined *bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is false different from null?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. Maybe this should just be required? Is it permissible to have an optional non-pointer bool if the false & nil are equivalent? I suppose a reason to have nil is to allow for defaulting in mutating admission

// The allowed runtimeProfiles. A value of '*' allows all runtimeProfiles.
// +optional
RuntimeProfiles []string
// The allowed localhostProfiles. Values may end in '*' to include all
// localhostProfiles with a prefix.
// +optional
LocalhostProfiles []string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use Config object for storing profiles across the nodes? Can be another KEP though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree this would be useful, but implementing it is a big project that I don't want to block GA on. See non-goals:

  • Providing mechanisms for loading profiles from outside the static seccomp node directory

}
```

## Design Details

### Version Skew Strategy

Because the API is currently represented as (mutable) annotations, care must be taken for migrating
to the API fields. The cases to consider are: pod create, pod update, PSP create, PSP update.

All API skew is resolved in the API server. New Kubelets will only use the seccomp values specified
in the fields, and ignore the annotations.

The PodSecurityPolicy admission controller **must continue to check for annotations** if the fields
are unset, in order to handle upgrade & version skew scenarios.

#### Pod Creation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expected behavior if the CRI runtime doesn't support seccomp? The pod should be created without seccomp (as in the current alpha annotation) or should result in an error?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to maintain the backwards compatible behavior of ignoring it (I'll add a line to be explicit about this). Going forward, I left space in the API for a FailureStrategy which could dictate how to handle that situation (e.g. some applications may depend on seccomp application to run securely, while others might accept a best-effort best-practices approach).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No application should depend on a policy; a policy can be applied as a non privileged operation by any code, so if it requires it then it can just apply it.


If no seccomp annotations or fields are specified, no action is necessary.

If _only_ seccomp fields are specified, add the corresponding annotations. This ensures that the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the proposed field is more expressive than the annotation (allows runtime profiles other than "default")... need to resolve if/how those would be converted to annotations that would not cause 1.17-level API servers to choke in validation

fields are enforced even if the node version trails the API version. Since [we
support](https://kubernetes.io/docs/setup/release/version-skew-policy/) up to 2 minor releases of
version skew between the master and node, this behavior will be removed in version N+2 (where N is
the release with the upgraded seccomp support).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update this to say "when the oldest supported kubelet is 1.18", since we're thinking through extending support to 4 releases

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we also considering extending version-skew support to 4 releases? Or just that we would carry patches for 4 releases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're considering the latter. the current kubelet/apiserver skew was set so the oldest kubelet would work with the newest apiserver, to avoid needing to do multiple nodepool recreations in a single cluster upgrade cycle

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we should just keep this for at least 4 versions. We don't need to rush to remove the reconciliation.


If _only_ seccomp annotations are specified, copy the values into the corresponding fields. This
ensures that existing applications continue to enforce seccomp, and prevents the kubelet from
needing to resolve annotations & fields.

If both seccomp annotations _and_ fields are specified, the values MUST match. This will be enforced
in API validation.

#### Pod Update
tallclair marked this conversation as resolved.
Show resolved Hide resolved

The seccomp fields on a pod are immutable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not specific to this field, but we need to consider the case of an update from an old client that drops the new spec field. I think for this (and for other immutable fields), we should probably preserve the existing field value if an update attempts to clear it (since that could be coming from a client unaware it needs to round-trip this new field)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed kubernetes/kubernetes#87301 to track this. It's not clear to me what the right way of handling this is. One argument for special-casing it for seccomp is that the field may be set by something the client does know about, the annotation. So there could be a situation where:

  1. Client creates pod with seccomp annotation
  2. Seccomp attempts to update the pod
  3. Update is rejected for dropping the seccomp field


The behavior on annotation update is currently ill-defined: the annotation update is allowed, but
the new value will not be used until the container is restarted. There is no way to tell (from the
API) what value a container is using.

Therefore, seccomp annotation updates will be ignored. This maintains backwards API compatibility
(no tightening validation), and makes a small stabilizing change to behavior (new Kubelets will
ignore the update).

#### PodSecurityPolicy Creation

_Similar policy to [Pod Creation](#pod-creation)_
tallclair marked this conversation as resolved.
Show resolved Hide resolved

If no seccomp annotations or fields are specified, no action is necessary.

If _only_ seccomp fields are specified, add the corresponding annotations. This ensures that the
fields are enforced even in skewed high-availability environments. Since [we
support](https://kubernetes.io/docs/setup/release/version-skew-policy/) up to 1 minor release of
version skew between the HA masters, this behavior will be removed in version N+1 (where N is the
release with the upgraded seccomp support).

If _only_ seccomp annotations are specified, copy the values into the corresponding fields. This
smooths the migration for a future release that removes annotation support, and frees tools from
needing to check annotations.

If both seccomp annotations _and_ fields are specified, the values MUST match. This will be enforced
in API validation.

#### PodSecurityPolicy Update

If the old pod did not specify any seccomp options (fields or annotations), follow the same rules as
for creation.

If only the fields or only the annotations are changed, take the new value and copy it to both the
corresponding fields & annotations (even if one was previously unset).

If both the fields _and_ annotations are changed, the new values MUST match. Enforced in API
validation.

#### PodTemplates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider how cluster operators will know they have things that need updating. a metric that fires on any podtemplate object with seccomp annotations, and (once we drop pod field -> annotation defaulting) any pod with a seccomp annotation might be helpful for tracking this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. It could be any combination of:

  • Metric counter
  • Log line
  • Event (on the controller)
  • Warning annotation (on the controller or created pod)

A metric seems too hidden, if you don't know to look for it. I like the event, except it will disappear if you don't notice it right away. I'm leaning towards an annotation on both the controller object and the pod:

warning.kubernetes.io/seccomp: "Seccomp set through annotations. Support for setting seccomp through annotations will be dropped in v1.X"

We may also want to consider forbidding setting the seccomp annotation after we drop support, so that pods aren't being run unconfined unexpectedly. That would be more likely to break users, but follows the secure failure principle.


PodTemplates (e.g. ReplaceSets, Deployments, StatefulSets, etc.) will be ignored. The
field/annotation resolution will happen on template instantiation.

### Test Plan

Seccomp already has [E2E tests][], but the tests are guarded by the `[Feature:Seccomp]` tag and not
run in the standard test suites.

Prior to being marked GA, the feature tag will be removed from the seccomp tests, and the tests will
be migrated to the new fields API.
tallclair marked this conversation as resolved.
Show resolved Hide resolved

New tests will be added covering the annotation/field conflict cases described under
[Version Skew Strategy](#version-skew-strategy).

Test coverage for localhost profiles will be added, unless we decide to [keep localhost support in
alpha](#alternatives).

[E2E tests]: https://github.com/kubernetes/kubernetes/blob/5db091dde4d7de74283ca94870958acf63010c0a/test/e2e/node/security_context.go#L147

### Graduation Criteria

_This section is excluded, as it is the subject of the entire proposal._

### Upgrade / Downgrade Strategy

See [Version Skew Strategy](#version-skew-strategy).

## Implementation History

- 2019-07-17: Initial KEP

## Drawbacks

Promoting seccomp as-is to GA may be seen as "blessing" the current functionality, and make it
harder to make some of the enhancements listed under [Non-Goals](#non-goals). Since the current
behavior is unguarded, I think we already need to treat the behavior as GA (which is why it's been
so hard to change the default profile), so I do not think these changes will actually increase the
friction.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my biggest concern, but I think the arguments you've laid out are correct.
This is a necessary step to get to the good stuff :)

Thanks so much Tim!


## Alternatives

The localhost feature currently depends on an alpha Kubelet flag. We could therefore label the
localhostProfile source as an alpha field, and keep it's functionality in an alpha state.

## References

- [Original seccomp proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/seccomp.md)