[RFC] allow to skip setgroups(2) #1020

Open
giuseppe opened this issue Oct 29, 2019 · 13 comments

@giuseppe
Member

There are cases where it would be necessary to skip the setgroups(2) syscall so that the original additional groups can be maintained.

It can be used, for example, by rootless containers to keep access to a storage directory that is accessible only by a secondary group.

runc already skips setgroups in some cases: if the user has euid != 0, or if /proc/self/setgroups is set to deny. I'd like to add a third condition where setgroups is also skipped if explicitly requested.
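
For illustration, the three conditions described above can be sketched as a small decision function. This is not runc's code: the function names, the keep_original_groups parameter, and the fallback to "allow" when /proc/self/setgroups cannot be read are all assumptions made for the sketch.

```python
from pathlib import Path


def read_setgroups_policy(path="/proc/self/setgroups"):
    """Return the policy string from the given file, normally 'allow' or 'deny'.

    Assumption: if the file cannot be read (for example on a kernel that
    does not provide it), treat the policy as 'allow'.
    """
    try:
        return Path(path).read_text().strip()
    except OSError:
        return "allow"


def should_skip_setgroups(euid, setgroups_policy, keep_original_groups=False):
    """Decide whether the setgroups(2) call should be skipped.

    The first two conditions are the existing ones described in the issue;
    the third is the proposed explicit opt-in (hypothetical parameter name).
    """
    if euid != 0:
        return True  # not effectively root: setgroups is skipped
    if setgroups_policy == "deny":
        return True  # setgroups is disabled for this user namespace
    return keep_original_groups  # proposed: skip only when explicitly requested
```

With the default keep_original_groups=False the function behaves like the two existing conditions, so the opt-in changes nothing for current users.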

Do we need a new field under process/user, e.g. keepOriginalGroups? Or would it be enough to reuse additionalGids with some special value (e.g. -1 to keep the current groups)?

@giuseppe giuseppe changed the title [RFC]: allow to skip setgroups(2) [RFC] allow to skip setgroups(2) Oct 29, 2019
@cyphar
Member

cyphar commented Oct 29, 2019

If we do add an option, it needs to have a really scary name (disableSetgroupSecurity or something). Not dropping supplementary groups weakens the userns security boundary, and really is something that very few people should actually want to do (not least of all because it will confuse all sorts of programs to be touching unmapped files).

In my view, the best solution to the problem of such volumes is to do exactly what LXD does -- "punch out" the GID that the storage volume is owned by (by adding a single 1:1 mapping for that GID). The most ideal solution would be the next-gen "shiftfs" work that was discussed recently, but obviously we'll have to wait for that to actually land.
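
As a rough sketch of what "punching out" a GID could look like at the gid_map level, the function below splits one contiguous mapping so that a single GID maps to itself. It is not LXD's implementation; the function name and the example numbers are hypothetical. Each entry uses the gid_map column order: ID inside the namespace, ID outside, length.

```python
def punch_out_gid(inside_start, outside_start, count, gid):
    """Split one gid_map range so that `gid` is mapped 1:1 (inside == outside).

    Returns a list of (inside, outside, length) tuples.
    """
    inside_end = inside_start + count    # exclusive upper bound
    outside_end = outside_start + count  # exclusive upper bound
    if outside_start <= gid < outside_end:
        # The host GID would appear twice on the outside of the mapping.
        raise ValueError("host GID is already covered by the outer range")
    if not (inside_start <= gid < inside_end):
        # The GID lies outside the existing range: just add the extra entry.
        return sorted([(inside_start, outside_start, count), (gid, gid, 1)])

    entries = []
    before = gid - inside_start
    if before > 0:
        entries.append((inside_start, outside_start, before))
    entries.append((gid, gid, 1))  # the 1:1 "punched out" mapping
    after = inside_end - gid - 1
    if after > 0:
        entries.append((gid + 1, outside_start + before + 1, after))
    return entries
```

For example, punch_out_gid(0, 100000, 65536, 1001) yields three entries: the IDs below 1001, the 1:1 entry for 1001, and the IDs above it. Whether an unprivileged user is allowed to write such a mapping is a separate question, since the extra GID has to be one that the user is permitted to map.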

@rhatdan
Contributor

rhatdan commented Oct 29, 2019

I am skeptical, and think it could be a long wait, especially to get it upstream.

@cyphar
Member

cyphar commented Oct 29, 2019

(Also I would seriously suggest that this is functionality that should be exposed through a runtime-specific annotation and not a first-class field in config.json -- the runtime-spec already has lots of really odd features we probably shouldn't have added, and this one just rubs me the wrong way.)
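
A runtime-specific annotation could be consumed roughly as follows. The annotation key run.oci.keep_original_groups is the one used with crun later in this thread; the helper name and the choice to accept only the string "1" are assumptions of this sketch, not documented behaviour of any runtime.

```python
KEEP_GROUPS_ANNOTATION = "run.oci.keep_original_groups"


def keep_groups_requested(config):
    """Return True if the parsed config.json opts in via the annotation.

    `config` is the OCI runtime configuration as a dict. Annotations are a
    string-to-string map, so the value is compared as a string.
    """
    annotations = config.get("annotations") or {}
    return annotations.get(KEEP_GROUPS_ANNOTATION) == "1"
```

Because the opt-in lives under annotations, a runtime that does not know the key simply ignores it, which is the main appeal of this approach over a first-class field.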

@giuseppe
Member Author

In my view, the best solution to the problem of such volumes is to do exactly what LXD does -- "punch out" the GID that the storage volume is owned by (by adding a single 1:1 mapping for that GID). The most ideal solution would be the next-gen "shiftfs" work that was discussed recently, but obviously we'll have to wait for that to actually land.

How would it work with rootless containers or, in general, with IDs that are not mapped in the current namespace? I guess rootless containers still won't be able to map arbitrary IDs from the host.

@tentator

Hello @cyphar,
I also agree with Giuseppe, since I have a customer with exactly the same problem: he wants to use rootless podman but is currently blocked because the supplementary groups needed to access mounted directories are missing.
I'm also not sure why you think it would weaken the security boundary: all groups the user has configured would be expected to be inherited with such an option, so why limit it to the primary group only?
As for what kind of option runc should use, I'm of course open to any naming, since that is less relevant.
What is relevant for my customer is that his decision to go with rootless podman depends on whether this feature works; otherwise the only option he sees is to run it as root, which I guess is clearly worse from a security point of view.

Thanks for letting me know,
Cisco.

@rhatdan
Contributor

rhatdan commented May 3, 2021

This feature has become very popular in rootless Podman. We are seeing lots of users who need access to files and devices via supplemental groups, and we have recently made this a first-class feature of Podman.

podman run --group-add keep-groups ...

Currently this is only supported in crun, and we would love to get it to work in runc. I would hope that in the future we have better support, where we could keep access to the groups as well as add groups within the user namespace, but for now this fixes a key issue rootless users are hitting. I think we see this a lot more with enterprise customers than we see in the wild.

@rptaylor

As an unprivileged user on a host, I have read/write access to various files, some via ownership and some via group membership.
I can mount any files I want into my container as volumes, but I can only read/write to the ones I own. The ones I access via my group memberships can only be read/written via podman with crun thanks to the --group-add option.
I don't really understand why this is only possible via the special crun flag; if I can bring files into my rootless container by mounting them as volumes shouldn't I be able to access them in the same way as outside the container?

I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.

@paulraines68

paulraines68 commented May 18, 2022

@rptaylor

I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.

I want to add that, as the sysadmin of an HPC batch cluster at a major biomed academic center, this secondary group issue is the primary reason we are using Singularity rather than rootless podman. We use secondary groups extensively so that various users and groups can work together on sensitive data sets. Containers are used to run analysis programs like TensorFlow from NVIDIA NGC, or distributed Docker images of apps built on (for example) Ubuntu 20 that otherwise cannot run on the RHEL7 nodes.

@giuseppe
Member Author

@rptaylor

I know there are technical reasons, but I think the security model should be considered differently in different contexts. Sometimes a container is used to isolate and contain an external application (i.e. something pulled from a repository) in a controlled environment and you don't want it to see or touch anything outside. But in other cases (Singularity, rootless podman), you as the user are already "outside" and you're choosing to contain yourself, so you should have full control of how that happens and how to invoke the containment tool; the same security considerations do not apply since you can already do whatever you want on the host.

I want to add that, as the sysadmin of an HPC batch cluster at a major biomed academic center, this secondary group issue is the primary reason we are using Singularity rather than rootless podman. We use secondary groups extensively so that various users and groups can work together on sensitive data sets. Containers are used to run analysis programs like TensorFlow from NVIDIA NGC, or distributed Docker images of apps built on (for example) Ubuntu 20 that otherwise cannot run on the RHEL7 nodes.

If it can be useful for you: Podman when used together with crun supports the --group-add keep-groups extension to skip setgroups in the container

@paulraines68

If it can be useful for you: Podman when used together with crun supports the --group-add keep-groups extension to skip setgroups in the container

As far as I can find, crun is not available on RHEL7.

There is also some odd behavior on a CentOS 8 Stream box:

$ rpm -q podman
podman-4.0.2-1.module_el8.7.0+1106+45480ee0.x86_64
$ rpm -q runc
runc-1.0.3-3.module_el8.7.0+1106+45480ee0.x86_64
$ ls -ald /tmp/gptest
drwxrws---. 2 root sysadm 4096 May 18 15:44 /tmp/gptest
$ groups
raines httpd fsdev sysadm coregp webdev hcpdata
$ podman run -it --runtime=/usr/bin/crun --userns=keep-id --group-add=keep-groups -v /tmp/gptest:/gptest b1b6387124d9 /bin/bash
raines@806c89baacd3:/$ groups
raines nogroup
raines@806c89baacd3:/$ id
uid=5829(raines) gid=5829(raines) groups=5829(raines),65534(nogroup)
raines@806c89baacd3:/$ cd /tmp/gptest
bash: cd: /tmp/gptest: No such file or directory
raines@806c89baacd3:/$ cd /gptest
raines@806c89baacd3:/gptest$ uname -a > foobar.txt
bash: foobar.txt: Permission denied
raines@806c89baacd3:/gptest$ ls -ald .
drwxrws---. 2 nobody nogroup 4096 May 18 19:44 .

The 'nogroup' thing is weird (Singularity reports the groups normally), and I don't understand why I can cd to /gptest (read access) but not write.
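
On the 'nogroup' display specifically: inside a user namespace, a GID with no entry in the gid_map is reported as the overflow GID, which defaults to 65534 and is usually named nogroup or nobody. The sketch below models only that lookup. The mapping values are hypothetical and do not reflect what Podman actually wrote for this container, and the sketch does not attempt to explain the permission errors.

```python
def view_gid(kernel_gid, gid_map, overflow_gid=65534):
    """Translate a host-side GID into what a process in the namespace sees.

    `gid_map` is a list of (inside, outside, length) tuples. Unmapped GIDs
    are reported as the overflow GID.
    """
    for inside, outside, length in gid_map:
        if outside <= kernel_gid < outside + length:
            return inside + (kernel_gid - outside)
    return overflow_gid


# Hypothetical mapping in the spirit of --userns=keep-id: the user's own GID
# (5829 in the transcript above) maps to itself, other IDs come from a
# delegated range, and the secondary groups are not mapped at all.
example_map = [(0, 100000, 5829), (5829, 5829, 1), (5830, 105829, 1000)]
```

Under this model a secondary group that is kept but not mapped would show up as 65534 in the output of id and ls, even though the process still holds it.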

@giuseppe
Member Author

It is not available on RHEL7. I think you need to pass the --runtime option to podman before the run subcommand, like podman --runtime=... run ...

@paulraines68

paulraines68 commented May 18, 2022

Unfortunately still not quite right:

$ echo here > /tmp/gptest/iamhere.txt
$ mkdir /tmp/gptest/subdir
$ ls -ald /tmp/gptest/subdir
drwxrwsr-x. 2 raines sysadm 4096 May 18 16:46 /tmp/gptest/subdir
$ podman --runtime=/usr/bin/crun run -it --rm --annotation=run.oci.keep_original_groups=1 --userns=keep-id --group-add=keep-groups -v /tmp/gptest:/gptest b1b6387124d9 /bin/bash
raines@6bcd8fc2304e:/$ groups
raines nogroup
raines@6bcd8fc2304e:/$ ls -ld /gptest
drwxrws---. 2 nobody nogroup 4096 May 18 19:44 /gptest
raines@6bcd8fc2304e:/$ cd /gptest
raines@6bcd8fc2304e:/gptest$ ls
ls: cannot open directory '.': Permission denied
raines@6bcd8fc2304e:/gptest$ echo foobar > foobar.txt
bash: foobar.txt: Permission denied
raines@6bcd8fc2304e:/gptest$ cat iamhere.txt
here
raines@6bcd8fc2304e:/gptest$ echo too >> iamhere.txt
raines@6bcd8fc2304e:/gptest$ cat iamhere.txt
here
too
raines@6bcd8fc2304e:/gptest$ cd subdir
raines@6bcd8fc2304e:/gptest/subdir$ ls
ls: cannot open directory '.': Permission denied
raines@6bcd8fc2304e:/gptest/subdir$ ls -ald /gptest/subdir
drwxrwsr-x. 2 raines nogroup 4096 May 18 20:46 /gptest/subdir

So it is actually the 'x' bit that works for the cd, but 'r' and 'w' do not. Yet one can read and write existing files in the directory. Really weird.
