[RFC] allow to skip `setgroups(2)` #1020

giuseppe · 2019-10-29T13:26:30Z

There are cases where it would be necessary to skip the setgroups(2) syscall so that the original additional groups can be maintained.

It can be used, for example, by rootless containers to keep access to a storage directory that is accessible only by a secondary group.

runc already skips the setgroups in some cases: either if the user had euid != 0 or if /proc/self/setgroups is set to deny. I'd like to add a third condition where the setgroups is skipped also if explicitly requested.

Do we need a new field under process/user, e.g. keepOriginalGroups? Or would it be enough to reuse additionalGids with some special value (e.g. -1 to keep the current groups)?
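The skip conditions described in this comment can be sketched as a small decision function. This is only an illustration and not runc's actual code: the names `should_skip_setgroups`, `read_setgroups_policy`, and the `keep_original_groups` parameter are hypothetical, and treating a missing `/proc/self/setgroups` as `allow` is an assumption.

```python
def read_setgroups_policy(path: str = "/proc/self/setgroups") -> str:
    """Return the setgroups policy for this process ("allow" or "deny")."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Assumption: if the file cannot be read, behave as if it said "allow".
        return "allow"


def should_skip_setgroups(euid: int, setgroups_policy: str,
                          keep_original_groups: bool = False) -> bool:
    """Decide whether the runtime should skip the setgroups(2) call."""
    # Existing condition 1: an unprivileged caller (euid != 0).
    if euid != 0:
        return True
    # Existing condition 2: /proc/self/setgroups is set to "deny".
    if setgroups_policy.strip() == "deny":
        return True
    # Proposed condition 3: the caller explicitly asked to keep its groups.
    return keep_original_groups
```

With this shape, the proposed option only changes the outcome in the one case where setgroups would otherwise have been called: `should_skip_setgroups(0, "allow", keep_original_groups=True)`.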

cyphar · 2019-10-29T13:32:07Z

If we do add an option, it needs to have a really scary name (disableSetgroupSecurity or something). Not dropping supplementary groups weakens the userns security boundary, and really is something that very few people should actually want to do (not least of all because it will confuse all sorts of programs to be touching unmapped files).

In my view, the best solution to the problem of such volumes is to do exactly what LXD does -- "punch out" the GID that the storage volume is owned by (by adding a single 1:1 mapping for that GID). The most ideal solution would be the next-gen "shiftfs" work that was discussed recently, but obviously we'll have to wait for that to actually land.
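To make the "punch out" idea concrete, here is a rough sketch of how a contiguous GID mapping could be split so that one GID maps 1:1. It is not LXD's implementation; the function name `punch_out_gid` and the example numbers (a range starting at host GID 100000 and a storage GID of 1234) are hypothetical. Each triple has the same meaning as a line of `/proc/<pid>/gid_map`: ID inside the namespace, ID outside, length.

```python
from typing import List, Tuple

GidMapEntry = Tuple[int, int, int]  # (container_id, host_id, length)


def punch_out_gid(base_host: int, size: int, gid: int) -> List[GidMapEntry]:
    """Split the mapping 0..size-1 -> base_host.. so that `gid` maps to itself."""
    if not 0 <= gid < size:
        raise ValueError("gid must fall inside the container ID range")
    if base_host <= gid < base_host + size:
        raise ValueError("gid collides with the host range already mapped")
    entries: List[GidMapEntry] = []
    if gid > 0:
        # IDs below the punched GID keep their shifted mapping.
        entries.append((0, base_host, gid))
    # The single 1:1 entry for the GID that owns the storage volume.
    entries.append((gid, gid, 1))
    if gid + 1 < size:
        # IDs above the punched GID continue with the same shift.
        entries.append((gid + 1, base_host + gid + 1, size - gid - 1))
    return entries


for entry in punch_out_gid(base_host=100000, size=65536, gid=1234):
    print(*entry)
```

Note that writing such a map is itself a privileged operation: an unprivileged process cannot map arbitrary host GIDs on its own, so this sketch only shows the shape of the map, not how a rootless runtime would obtain permission to install it.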

rhatdan · 2019-10-29T13:41:23Z

I am skeptical, and think it could be a long wait, especially to get it upstream.

cyphar · 2019-10-29T13:45:42Z

(Also I would seriously suggest that this is functionality that should be exposed through a runtime-specific annotation and not a first-class field in config.json -- the runtime-spec already has lots of really odd features we probably shouldn't have added, and this one just rubs me the wrong way.)

giuseppe · 2019-10-29T13:49:31Z

> In my view, the best solution to the problem of such volumes is to do exactly what LXD does -- "punch out" the GID that the storage volume is owned by (by adding a single 1:1 mapping for that GID). The most ideal solution would be the next-gen "shiftfs" work that was discussed recently, but obviously we'll have to wait for that to actually land.

How would it work with rootless containers, or in general with IDs that are not mapped in the current namespace? I guess rootless containers still won't be able to map arbitrary IDs from the host.

tentator · 2019-12-18T12:55:04Z

Hello @cyphar ,
I also agree with Giuseppe, since I have a customer with exactly the same problem: he wants to use rootless podman but is currently limited by the missing supplementary groups when accessing mounted directories.
Also, I am not sure why you think it would weaken the security boundary: all groups the user has configured are expected to be inherited with such an option; why limit it to the main group only?
As for what kind of option runc should use, I'm of course open to any naming, since that is less relevant.
What is relevant for my customer is that his decision to go with rootless podman depends on whether this feature works; otherwise he sees no option but to run it as root, which from a security point of view is clearly worse.

Thanks for letting me know,
Cisco.

rhatdan · 2021-05-03T15:00:28Z

This feature has become very popular in rootless Podman. We are seeing lots of users who need access to files and devices via supplemental groups. We have recently made this a first-class feature of Podman.

podman run --group-add keep-groups ...

Currently this is only supported in crun, and we would love to get it to work in runc. I would hope that in the future we have better support, where we could keep access to the groups as well as add groups within the user namespace, but for now this fixes a key issue rootless users are hitting. I think we see this a lot more with enterprise customers than we see in the wild.

rhatdan · 2021-05-03T15:02:00Z

@kolyshkin @mrunalp @AkihiroSuda @vrothberg FYI.

rptaylor · 2021-08-31T22:42:42Z

As an unprivileged user on a host, I have read/write access to various files, some via ownership and some via group membership.
I can mount any files I want into my container as volumes, but I can only read/write to the ones I own. The ones I access via my group memberships can only be read/written via podman with crun thanks to the --group-add option.
I don't really understand why this is only possible via the special crun flag; if I can bring files into my rootless container by mounting them as volumes shouldn't I be able to access them in the same way as outside the container?

I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.

paulraines68 · 2022-05-18T16:39:45Z

@rptaylor

> I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.

I want to add that, as the sysadmin of an HPC batch cluster at a major biomed academic center, this secondary group issue is the primary reason we are using Singularity rather than rootless podman. We use secondary groups extensively so that various users and groups can work together on sensitive data sets. Containers are used to run analysis programs like TensorFlow from NVIDIA NGC or distributed docker images of apps built on (for example) Ubuntu 20 that otherwise cannot run on the RHEL7 nodes.

giuseppe · 2022-05-18T19:00:36Z

> @rptaylor
>
> > I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.
>
> I want to add that, as the sysadmin of an HPC batch cluster at a major biomed academic center, this secondary group issue is the primary reason we are using Singularity rather than rootless podman. We use secondary groups extensively so that various users and groups can work together on sensitive data sets. Containers are used to run analysis programs like TensorFlow from NVIDIA NGC or distributed docker images of apps built on (for example) Ubuntu 20 that otherwise cannot run on the RHEL7 nodes.

In case it is useful for you: Podman, when used together with crun, supports the --group-add keep-groups extension to skip setgroups in the container.

paulraines68 · 2022-05-18T20:04:24Z

> In case it is useful for you: Podman, when used together with crun, supports the --group-add keep-groups extension to skip setgroups in the container.

As far as I can find, crun is not available on RHEL7.

There is an oddity on a CentOS 8 Stream box:

$ rpm -q podman
podman-4.0.2-1.module_el8.7.0+1106+45480ee0.x86_64
$ rpm -q runc
runc-1.0.3-3.module_el8.7.0+1106+45480ee0.x86_64
$ ls -ald /tmp/gptest
drwxrws---. 2 root sysadm 4096 May 18 15:44 /tmp/gptest
$ groups
raines httpd fsdev sysadm coregp webdev hcpdata
$ podman run -it --runtime=/usr/bin/crun --userns=keep-id --group-add=keep-groups -v /tmp/gptest:/gptest b1b6387124d9 /bin/bash
raines@806c89baacd3:/$ groups
raines nogroup
raines@806c89baacd3:/$ id
uid=5829(raines) gid=5829(raines) groups=5829(raines),65534(nogroup)
raines@806c89baacd3:/$ cd /tmp/gptest
bash: cd: /tmp/gptest: No such file or directory
raines@806c89baacd3:/$ cd /gptest
raines@806c89baacd3:/gptest$ uname -a > foobar.txt
bash: foobar.txt: Permission denied
raines@806c89baacd3:/gptest$ ls -ald .
drwxrws---. 2 nobody nogroup 4096 May 18 19:44 .

The 'nogroup' thing is weird (singularity reports the groups normally), and I don't understand why I can cd to /gptest (read access) but not write.

giuseppe · 2022-05-18T20:20:45Z

It is not available on RHEL7. I think you need to specify the --runtime option to podman before the run subcommand, like podman --runtime=... run ...

paulraines68 · 2022-05-18T20:50:14Z

Unfortunately still not quite right:

$ echo here > /tmp/gptest/iamhere.txt
$ mkdir /tmp/gptest/subdir
$ ls -ald /tmp/gptest/subdir
drwxrwsr-x. 2 raines sysadm 4096 May 18 16:46 /tmp/gptest/subdir
$ podman --runtime=/usr/bin/crun run -it --rm --annotation=run.oci.keep_original_groups=1 --userns=keep-id --group-add=keep-groups -v /tmp/gptest:/gptest b1b6387124d9 /bin/bash
raines@6bcd8fc2304e:/$ groups
raines nogroup
raines@6bcd8fc2304e:/$ ls -ld /gptest
drwxrws---. 2 nobody nogroup 4096 May 18 19:44 /gptest
raines@6bcd8fc2304e:/$ cd /gptest
raines@6bcd8fc2304e:/gptest$ ls
ls: cannot open directory '.': Permission denied
raines@6bcd8fc2304e:/gptest$ echo foobar > foobar.txt
bash: foobar.txt: Permission denied
raines@6bcd8fc2304e:/gptest$ cat iamhere.txt
here
raines@6bcd8fc2304e:/gptest$ echo too >> iamhere.txt
raines@6bcd8fc2304e:/gptest$ cat iamhere.txt
here
too
raines@6bcd8fc2304e:/gptest$ cd subdir
raines@6bcd8fc2304e:/gptest/subdir$ ls
ls: cannot open directory '.': Permission denied
raines@6bcd8fc2304e:/gptest/subdir$ ls -ald /gptest/subdir
drwxrwsr-x. 2 raines nogroup 4096 May 18 20:46 /gptest/subdir

So actually it is the 'x' bit that works for the cd, but 'r' and 'w' do not. Yet one can read and write to existing files in the dir. Really weird.
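One possible reading of the `nogroup` entries in the transcripts above: inside a user namespace, any GID without an entry in the namespace's GID map is reported as the overflow GID (65534 by default, which Debian-based images name `nogroup`). The sketch below illustrates only that translation and does not attempt to explain the permission results. The single-entry map and the host GIDs other than 5829 are hypothetical stand-ins, since the thread does not show the real map or the numeric IDs of the host groups.

```python
from typing import Iterable, List, Tuple

OVERFLOW_GID = 65534  # kernel default for /proc/sys/kernel/overflowgid

GidMap = List[Tuple[int, int, int]]  # (container_id, host_id, length)


def to_container_gid(host_gid: int, gid_map: GidMap) -> int:
    """Translate a host GID into the ID seen inside the namespace."""
    for inside, outside, length in gid_map:
        if outside <= host_gid < outside + length:
            return inside + (host_gid - outside)
    # Unmapped IDs are all reported as the overflow GID.
    return OVERFLOW_GID


def visible_groups(host_gids: Iterable[int], gid_map: GidMap) -> List[int]:
    """Return the distinct GIDs a tool such as id(1) would list inside."""
    return sorted({to_container_gid(g, gid_map) for g in host_gids})


# Hypothetical: only the primary GID 5829 is mapped; the other GIDs are made up.
example_map: GidMap = [(5829, 5829, 1)]
print(visible_groups([5829, 48, 2001, 2002], example_map))
```

Under these assumptions every retained but unmapped supplementary group collapses into the single value 65534, which is consistent with `id` showing `groups=5829(raines),65534(nogroup)` even though the host user belongs to several groups.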

giuseppe changed the title ~~[RFC]: allow to skip setgroups(2)~~ [RFC] allow to skip setgroups(2) Oct 29, 2019

giuseppe mentioned this issue Oct 29, 2019

linux: keep original groups without additionalGids set containers/crun#148

Closed

giuseppe mentioned this issue Nov 16, 2021

Make GID optional to allow retaining overflowgid (useful for exposing crw-rw---- devices to Rootless Containers) #1129

Open

neersighted mentioned this issue Nov 3, 2022

config: base GID must be present in the supplementary GIDs array #1168

Open

mailinglists35 mentioned this issue Jun 18, 2023

SELinux prevents rootless container from using passed device containers/podman#15930

Closed
