Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: base GID must be present in the supplementary GIDs array #1168

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

neersighted
Copy link

@neersighted neersighted commented Nov 2, 2022

Currently, the spec is unclear whether the list of supplementary GIDs used to create a container process should include the 'base' GID implicitly, or whether the config needs to specify this explicitly if desired.

While per POSIX it is permissible for a system to include or exclude the base GID from the list of supplementary GIDs, in all Runtime Spec platforms the base GID is always added, and omitting it can have real security consequences as fully dropping a group is not typically allowed in Unix.

This recently led to a number of CVEs in OCI Runtime Spec implementations, as it was concluded that it is necessary for a Unix container to always include the base GID in the list of supplementary GIDs, as originally established by 4.4BSD.

Some of the CVEs include:

Some examples of how existing implementations handle this:

  • util-linux calls initgroups(3) with the user's primary GID.
  • shadowutils (Linux) calls initgroups(3) with the user's primary GID.
  • FreeBSD calls initgroups(3) with the user's GID from the password file (aka the primary GID).
  • Solaris calls initgroups(3) with the user's primary GID.
  • Z/OS's session creation code is not available; however initgroups(3) specifies a convention of including including the real group ID from the user database (aka the primary GID).
  • OpenSSH calls initgroups(3) with the user's primary GID; on all of the above platforms this will have the same result as a login(1), including the primary GID in the list of supplementary GIDs.

While login(1) has generally been used as the example above, the same holds true for su(1) and other methods of starting a new session (including OpenSSH, as explained above).

Given this seems clearly desirable and the OCI runtime is effectively equivalent of login(1)/su(1)/any other program that sets up a new session, the OCI runtime is the best place to ensure that the list of supplementary group IDs contains the base GID.

@neersighted
Copy link
Author

cc @sjmurdoch @haircommander @p-rog

config.md Outdated Show resolved Hide resolved
config.md Outdated Show resolved Hide resolved
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but please squash commits (EDIT: Let me withdraw my LGTM, for compatibility sake)

@neersighted
Copy link
Author

The intention is to make review easier by not destroying review history (as Github does not have a concept of patch sets/series like Gerrit or git format-patch); approving reviews will not be dismissed when the commit series is squashed down (and the intended rebase can be seen by the fixup messages).

I'll likely wait until more maintainers have had time to review to squash, unless you think this is ready for merge now, in which case I can perform the rebase.

@p-rog
Copy link

p-rog commented Nov 3, 2022

thank you @neersighted for opening this PR

@giuseppe
Copy link
Member

giuseppe commented Nov 3, 2022

I am not fully convinced about this change. It is a breaking change and there won't be a way to keep the existing behavior even if desired.

Personally, I still believe it is the caller's responsibility to ensure the gid is part of the additional gids.

@neersighted
Copy link
Author

For what its worth, I don't think that this precludes something like #1020. There is also the potential of adding a new field so we can have a backwards compatible behavior; however I do strongly feel that patching the existing spec is best as there does not seem to be any case where it is desirable to have the primary GID set but not in the supplementary GIDs.

@neersighted
Copy link
Author

I've pushed up a squashed version as it doesn't seem too disruptive; would still like more eyes and opinions on this. I still strongly feel that this is the best way to address the mismatch between 'real' Linux systems and OCI containers, as otherwise some implementations will always be deficient. Note that entities that use an OCI runtime SHOULD still specify the primary group ID explicitly, for older implementations.

@giuseppe
Copy link
Member

giuseppe commented Nov 9, 2022

@cyphar what do you think about this change?

config.md Outdated Show resolved Hide resolved
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neersighted can you squash the commits? Otherwise LGTM

Currently, the spec is unclear whether the list of [supplementary GIDs][POSIX-sgids-def]
used to create a container process should include the 'base' GID
implicitly, or whether the config needs to specify this explicitly if
desired.

While [per POSIX][POSIX-sgids-rat] it is permissible for a system to
include or exclude the base GID from the list of supplementary GIDs, in
all Runtime Spec platforms the base GID is always added, and omitting it
can have [real security consequences][benthams-gaze] as fully dropping a
group is not typically allowed in Unix.

This recently led to a number of CVEs in OCI Runtime Spec
implementations, as it was concluded that it is necessary for a Unix
container to always include the base GID in the list of supplementary
GIDs, as originally established by 4.4BSD.

Some of the CVEs include:
* [Podman (CVE-2022-2989)][CVE-2022-2989]
* [Moby (CVE-2022-36109)][CVE-2022-36109]
* [Buildah (CVE-2022-2990)][CVE-2022-2990]
* [CRI-O (CVE-2022-2995)][CVE-2022-2995]

Some examples of how existing implementations handle this:
* util-linux [calls][util-linux] [initgroups(3)][initgroups.3-linux]
  with the user's primary GID.
* shadowutils (Linux) [calls][shadowutils]
  [initgroups(3)][initgroups.3-linux] with the user's primary GID.
* FreeBSD [calls][freebsd-setusercontext]
  [initgroups(3)][initgroups.3-freebsd] with the user's GID from the
  password file (aka the primary GID).
* Solaris [calls][solaris-setup_credentials]
  [initgroups(3)][initgroups.3-solaris] with the user's primary GID.
* Z/OS's session creation code is not available; however
  [initgroups(3)][initgroups.3-zos] specifies a convention of including
  including the real group ID from the user database (aka the primary
  GID).
* OpenSSH [calls][openssh] initgroups(3) with the user's primary GID; on
  all of the above platforms this will have the same result as a
  login(1), including the primary GID in the list of supplementary GIDs.

While login(1) has generally been used as the example above, the same
holds true for su(1) and other methods of starting a new session
(including OpenSSH, as explained above).

Given this seems clearly desirable and the OCI runtime is effectively
the equivalent of login(1)/su(1)/any other program that sets up a new
session, the OCI runtime is the best place to ensure that the list of
supplementary group IDs contains the base GID.

[POSIX-sgids-def]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_378
[POSIX-sgids-rat]: https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap03.html#tag_21_03_00_73

[CVE-2022-2989]: https://access.redhat.com/security/cve/cve-2022-2989
[CVE-2022-36109]: GHSA-rc4r-wh2q-q6c4
[CVE-2022-2990]: https://access.redhat.com/security/cve/cve-2022-2990
[CVE-2022-2995]: https://access.redhat.com/security/cve/cve-2022-2995
[benthams-gaze]: https://www.benthamsgaze.org/2022/08/22/vulnerability-in-linux-containers-investigation-and-mitigation/

[util-linux]: https://github.com/util-linux/util-linux/blob/96ccdc00e1fcf1684f9734a189baf90e00ff0c9a/login-utils/login.c#L1443
[shadowutils]: https://github.com/shadow-maint/shadow/blob/eaebea55a495a56317ed85e959b3599f73c6bdf2/libmisc/setugid.c#L55
[freebsd-setusercontext]: https://github.com/freebsd/freebsd-src/blob/eeaf9d562fe137e0c52b8c346742dccfc8bde015/lib/libutil/login_class.c#L486
[solaris-setup_credentials]: https://github.com/illumos/illumos-gate/blob/d9c3e05c2d8261e3f133b5e96a300b4fa6c0f1b7/usr/src/cmd/login/login.c#L1926
[openssh]: https://github.com/openssh/openssh-portable/blob/25bd659cc72268f2858c5415740c442ee950049f/session.c#L1379
[initgroups.3-linux]: https://man7.org/linux/man-pages/man3/initgroups.3.html
[initgroups.3-freebsd]: https://www.freebsd.org/cgi/man.cgi?initgroups(3)
[initgroups.3-solaris]: https://illumos.org/man/3C/initgroups
[initgroups.3-zos]: https://www.ibm.com/docs/en/zos/2.2.0?topic=functions-initgroups-initialize-supplementary-group-id-list-process

Signed-off-by: Bjorn Neergaard <[email protected]>
@neersighted
Copy link
Author

Pushed a squashed version with no other changes.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I am still against this change as it is since there is no way to keep the current behavior. Could we at least add a new knob to maintain the existing behavior?

@AkihiroSuda
Copy link
Member

AkihiroSuda commented Dec 24, 2022

Let me withdraw my LGTM on Nov 3, for compatibility sake

@neersighted
Copy link
Author

Seems like there's a consensus that we should introduce a new field to preserve existing behavior. I'll try and work up a draft next week.

@sjmurdoch
Copy link

There was a proposal by Cory Snider in the private discussion of GHSA-rc4r-wh2q-q6c4 which I think intended to preserve backwards compatibility. I don't have any particular views on it, but it might be a useful starting point for discussion. Since it was private, I don't think I should copy and paste it here, but those who can see the issue can read it and Cory could be asked for permission to publish it.

@neersighted
Copy link
Author

neersighted commented Dec 25, 2022

Cory and I have discussed it outside of that security bulletin and iterated on the idea a bit more (we both work on the same team at Mirantis); I will be work-shopping something similar to his design with him before presenting it here 😄

@corhere
Copy link
Contributor

corhere commented Mar 28, 2023

Here is what I had proposed in the GHSA discussion linked above:

The trouble I see with pushing the responsibility to duplicate gid into the additionalGids array solely onto the container engines and other software which generates an OCI config is that there are a lot more of them than of OCI runtimes. Each and every one of them would need to get it right, on a very obscure piece of UNIX trivia. Excluding the user's gid from the supplemental groups is almost always a mistake, and with the subtle security consequences of such a mistake I think that would qualify it as a footgun. That's why I think that both the OCI runtimes and container engines should augment the supplementary groups list.

I believe that due to the vague wording of the runtime spec, OCI runtimes augmenting the supplemental groups list would be compliant with runtime-spec v1.0.2. The spec has a configuration field to specify "additional" group IDs to be added to the process on POSIX platforms, but neither the OCI runtime spec nor POSIX define what "additional group IDs" are. POSIX does, on the other hand, define "supplementary group IDs." As the OCI runtime spec does not define the mapping from a user structure to the process's supplementary group ID list, an implementation which sets it to the union of user.gid and user.additionalGids would be in compliance. I propose that the OCI runtime spec be updated roughly as follows:

  • Add a new field to the POSIX user structure: sgids (array of int, OPTIONAL). When set, the OCI runtime MUST create the process with supplementary group IDs set to exactly the set of groups listed in the user.sgids array. This field takes precedence over the additionalGids field; the OCI runtime MUST ignore the additionalGids field if sgids is provided.
  • When no user.sgids array is provided in the container config, the OCI runtime MUST create the process with supplementary group IDs set to all the GIDs listed in the user.additionalGroups config fields. The runtime MUST also add the group ID specified in user.gid to the process supplementary groups if the platform conventionally includes the user's primary group in the supplementary groups list for login users and the ID is not already listed in user.additionalGroups.
    • Including the user's primary group in the supplementary group list is conventional on Linux, *BSD, Solaris and Darwin.
    • Generators of OCI runtime configuration metadata SHOULD append the primary group ID (user.gid) to the user.additionalGroups array on host platforms where the convention is to include the user's primary group in the supplementary group list to maximize compatibility and consistency with OCI runtimes targeting older versions of the OCI runtime spec.

A spec change like this would allow for defence in depth while still permitting full control over the supplementary group list to be opted into for advanced use cases. Container engines could still add the primary group to the additionalGids array to fix the vulnerability even when used with older runtimes, and runtimes could be updated to fix the vulnerability even when used with older container engines. And if someone writes a new container engine and forgets to append the primary GID to additionalGids, they'd still be safe so long as they're using it with an up-to-date runtime.

Copy link
Author

@neersighted neersighted left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a squash, and your Co-authored-by: trailer.

@corhere
Copy link
Contributor

corhere commented Mar 29, 2023

@neersighted not squashing was intentional. (Omitting the Co-authored-by trailer was not...) Since it's such a big change, I want to make it easy for reviewers to compare and contrast our drafts side by side.

Currently, the spec is unclear whether the list of supplementary GIDs[1]
used to create a container process should include the 'base' GID
implicitly, or whether the config needs to specify this explicitly if
desired.

While per POSIX[2] it is permissible for a system to include or exclude
the base GID from the list of supplementary GIDs, in all Runtime Spec
platforms the base GID is always added, and omitting it can have real
security consequences[3] as fully dropping a group is not typically
allowed in Unix.

This recently led to a number of CVEs in OCI Runtime Spec
implementations, as it was concluded that it is necessary for a Unix
container to always include the base GID in the list of supplementary
GIDs, as originally established by 4.4BSD.

Some of the CVEs include:
* Podman (CVE-2022-2989)
* Moby (CVE-2022-36109)
* Buildah (CVE-2022-2990)
* CRI-O (CVE-2022-2995)

Some examples of how existing implementations handle this:
* util-linux calls initgroups(3) with the user's primary GID. [4,5]
* shadowutils (Linux) calls initgroups(3) with the user's primary GID.
  [5,6]
* FreeBSD calls initgroups(3) with the user's GID from the password file
  (aka the primary GID). [7,8]
* Solaris calls initgroups(3) with the user's primary GID. [9,10]
* Z/OS's session creation code is not available; however initgroups(3)
  specifies a convention of including the real group ID from the user
  database (aka the primary GID). [11]
* OpenSSH[12] calls initgroups(3) with the user's primary GID; on all of
  the above platforms this will have the same result as a login(1),
  including the primary GID in the list of supplementary GIDs.

While login(1) has generally been used as the example above, the same
holds true for su(1) and other methods of starting a new session
(including OpenSSH, as explained above).

Given this seems clearly desirable and the OCI runtime is effectively
the equivalent of login(1)/su(1)/any other program that sets up a new
session, the OCI runtime is the best place to ensure that the list of
supplementary group IDs contains the base GID.

[1]: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_378
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xbd_chap03.html#tag_21_03_00_73
[3]: https://www.benthamsgaze.org/2022/08/22/vulnerability-in-linux-containers-investigation-and-mitigation/
[4]: https://github.com/util-linux/util-linux/blob/96ccdc00e1fcf1684f9734a189baf90e00ff0c9a/login-utils/login.c#L1443
[5]: https://man7.org/linux/man-pages/man3/initgroups.3.html
[6]: https://github.com/shadow-maint/shadow/blob/eaebea55a495a56317ed85e959b3599f73c6bdf2/libmisc/setugid.c#L55
[7]: https://github.com/freebsd/freebsd-src/blob/eeaf9d562fe137e0c52b8c346742dccfc8bde015/lib/libutil/login_class.c#L486
[8]: https://www.freebsd.org/cgi/man.cgi?initgroups(3)
[9]: https://github.com/illumos/illumos-gate/blob/d9c3e05c2d8261e3f133b5e96a300b4fa6c0f1b7/usr/src/cmd/login/login.c#L1926
[10]: https://illumos.org/man/3C/initgroups
[11]: https://www.ibm.com/docs/en/zos/2.2.0?topic=functions-initgroups-initialize-supplementary-group-id-list-process
[12]: https://github.com/openssh/openssh-portable/blob/25bd659cc72268f2858c5415740c442ee950049f/session.c#L1379

[CVE-2022-2989]: https://access.redhat.com/security/cve/cve-2022-2989
[CVE-2022-36109]: GHSA-rc4r-wh2q-q6c4
[CVE-2022-2990]: https://access.redhat.com/security/cve/cve-2022-2990
[CVE-2022-2995]: https://access.redhat.com/security/cve/cve-2022-2995

Signed-off-by: Bjorn Neergaard <[email protected]>
Co-authored-by: Cory Snider <[email protected]>
Signed-off-by: Cory Snider <[email protected]>
@corhere corhere force-pushed the supplemental_gids_include_egid branch from efd45cc to 2d3f86c Compare March 29, 2023 20:22
@AkihiroSuda
Copy link
Member

I feel this is too complicated.
Can we just add "the gid value SHOULD be prepended to the additionalGids array"?

Runtime implementations may raise a warning when additionalGids does not contain the base gid.
Image linters, such as https://github.com/goodwithtech/dockle , may error out in that situation, depending on their policy.

@corhere
Copy link
Contributor

corhere commented Mar 30, 2023

I feel this is too complicated. Can we just add "the gid value SHOULD be prepended to the additionalGids array"?

That would still leave the generator of the container config as the single point of failure. The purpose of this proposal is to provide defense-in-depth: the bundle generator (container engine) and the runtime would both have to be out-of-date or otherwise get things wrong for a container to run with broken negative group permissions.

My proposal, described simply, is:

  • runtimes MUST prepend gid to the container process' supplemental group IDs if not already present in additionalGids
  • generators of a container config are strongly encouraged to prepend gid to additionalGids
  • config generators which require absolute control over supplemental groups for some reason have an escape hatch

The complexity of how that is specified comes from having to work within the confines of what is, and is not, defined by POSIX. Privileged operations including setgroups and initgroups are out of scope of POSIX, so I specified the runtime's behaviour in terms of getgroups in order to have the expected behaviour rigorously defined for all POSIX compliant platforms.

Runtime implementations may raise a warning when additionalGids does not contain the base gid.

Logging a warning wouldn't be of much help as the container would still be started with a broader scope of privilege than intended. And runtime-spec doesn't mandate that warnings logged by the runtime reach the user.

Image linters, such as https://github.com/goodwithtech/dockle , may error out in that situation, depending on their policy.

Image? This is runtime-spec. Images are out of scope for this discussion as there's no guarantee that a container bundle was even derived from an image.

@giuseppe
Copy link
Member

I feel this is too complicated. Can we just add "the gid value SHOULD be prepended to the additionalGids array"?

Runtime implementations may raise a warning when additionalGids does not contain the base gid.

I agree, the current design is too complicated and IMO not worth it.

I still feel this is not the OCI runtime responsibility and the security issue is anyway difficult to find in reality as it affects only images that use negative DACs and setuid binaries.

@neersighted
Copy link
Author

This is a case where the OCI runtime is creating a userspace environment that is not aligned with what any Unix has done, historically. Other programs that create a login session get this right (e.g. OpenSSH, systemd, LXC) -- there is no reason for us to do the incorrect thing and create a not-quite-Unix environment.

The question is how we get there -- if simplicity is desired, I think my original version might still be more acceptable.

@giuseppe
Copy link
Member

This is a case where the OCI runtime is creating a userspace environment that is not aligned with what any Unix has done, historically. Other programs that create a login session get this right (e.g. OpenSSH, systemd, LXC) -- there is no reason for us to do the incorrect thing and create a not-quite-Unix environment.

All these programs need to implement the same behavior, they don't expect setgroups to do it automatically, so we can expect the higher-level components to do the same, without having to introduce a breaking change that IMO is not justified by the issue here.

@corhere
Copy link
Contributor

corhere commented Mar 30, 2023

@giuseppe all those programs call initgroups when setting up a login session, not setgroups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants