Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make container's shared memory configurable via annotation #1052

Merged
merged 1 commit into from
Jul 28, 2021

Conversation

anmaxvl
Copy link
Contributor

@anmaxvl anmaxvl commented Jun 22, 2021

This PR adds annotation "io.microsoft.container.storage.shm.size-kb" to
set container's /dev/shm tmpfs size, which overrides any existing /dev/shm
mount in the spec

Signed-off-by: Maksim An [email protected]

@anmaxvl anmaxvl requested a review from a team as a code owner June 22, 2021 03:24
@ambarve
Copy link
Contributor

ambarve commented Jun 22, 2021

Just out of curiosity: Why do we do this in gcs than doing it during container document creation?

@dcantah
Copy link
Contributor

dcantah commented Jun 22, 2021

@ambarve I think I'd said that we handle a similar annotation that changes the oci spec (io.microsoft.virtualmachine.lcow.privileged) in the guest so it might make sense to do the same for this.

Maybe we should move both out to the initial document creation if it makes sense

@katiewasnothere
Copy link
Contributor

Why not filter out the initial mount in CRI the way /dev and /sys/fs/cgroup mounts are filtered out today?

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jun 23, 2021

Why not filter out the initial mount in CRI the way /dev and /sys/fs/cgroup mounts are filtered out today?

@ambarve I think I'd said that we handle a similar annotation that changes the oci spec (io.microsoft.virtualmachine.lcow.privileged) in the guest so it might make sense to do the same for this.

Maybe we should move both out to the initial document creation if it makes sense

all the way to CRI? or?

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jun 23, 2021

Why not filter out the initial mount in CRI the way /dev and /sys/fs/cgroup mounts are filtered out today?

I started off by doing something like this: anmaxvl/cri@cb87112, which needs to be changed since we'll eventually have 2 configs: 1 for sandbox /dev/shm (which needs to be implemented first) and the second one is for container (current PR).

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jun 23, 2021

Just out of curiosity: Why do we do this in gcs than doing it during container document creation?

same question as to @dcantah.

@dcantah
Copy link
Contributor

dcantah commented Jun 23, 2021

all the way to CRI? or?

No just in the LCOW container document setup in the shim I was thinking. I feel like checking/having logic for our annotations in our cri fork is gonna be a pain if we ever move off of it (but I'm guilty of doing the same).

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jun 24, 2021

all the way to CRI? or?

No just in the LCOW container document setup in the shim I was thinking. I feel like checking/having logic for our annotations in our cri fork is gonna be a pain if we ever move off of it (but I'm guilty of doing the same).

looking a bit more into it, we could update the /dev/shm mount as part of the document creation, however we cannot move some of the other customizations as easily (e.g. privileged container part), since they check the devices on the UVM in order to share them within the container. that being said, do we really want to move the /dev/shm logic out?

@dcantah
Copy link
Contributor

dcantah commented Jun 24, 2021

@anmaxvl I don't mind it here, don't know about others.

@anmaxvl
Copy link
Contributor Author

anmaxvl commented Jun 24, 2021

@katiewasnothere, @ambarve ?

@dcantah
Copy link
Contributor

dcantah commented Jun 29, 2021

I don't really have an issue with it being done in the gcs but maybe we should have a function that sets up all of these annotation specific overrides instead of them being inline, as there's now three.

e.g.

func <validateAnnotationsOrSomething>(spec *oci.Spec) error {
	if val, ok := spec.Annotations["io.microsoft.container.storage.shm.size-kb"]; ok {
		sz, err := strconv.ParseInt(val, 10, 64)
		if err != nil {
			return errors.Wrap(err, "/dev/shm size must be a valid integer")
		}
		if sz <= 0 {
			return errors.Errorf("/dev/shm size must be a positive integer, got: %d", sz)
		}

		size := fmt.Sprintf("size=%dk", sz)
		mt := oci.Mount{
			Destination: "/dev/shm",
			Type:        "tmpfs",
			Source:      "shm",
			Options:     []string{"nosuid", "noexec", "nodev", "mode=1777", size},
		}
		spec.Mounts = removeMount("/dev/shm", spec.Mounts)
		spec.Mounts = append(spec.Mounts, mt)
	}

	// Check if we need to do any capability/device mappings
	if spec.Annotations["io.microsoft.virtualmachine.lcow.privileged"] == "true" {
		log.G(ctx).Debug("'io.microsoft.virtualmachine.lcow.privileged' set for privileged container")
		// Add all host devices
		hostDevices, err := devices.HostDevices()
		if err != nil {
			return err
		}
		for _, hostDevice := range hostDevices {
			addLinuxDeviceToSpec(ctx, hostDevice, spec, false)
		}
		// Set the cgroup access
		spec.Linux.Resources.Devices = []oci.LinuxDeviceCgroup{
			{
				Allow:  true,
				Access: "rwm",
			},
		}
	} else {
		tempLinuxDevices := spec.Linux.Devices
		spec.Linux.Devices = []oci.LinuxDevice{}
		for _, ld := range tempLinuxDevices {
			hostDevice, err := devices.DeviceFromPath(ld.Path, "rwm")
			if err != nil {
				return err
			}
			addLinuxDeviceToSpec(ctx, hostDevice, spec, true)
		}
	}

	if userstr, ok := spec.Annotations["io.microsoft.lcow.userstr"]; ok {
		if err := setUserStr(spec, userstr); err != nil {
			return err
		}
	}
        return nil
}

Copy link
Contributor

@dcantah dcantah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Would like to see what others think about altering the spec in the guest from annotations though.

Copy link
Contributor

@ambarve ambarve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM!
I am okay with keeping the annotation processing part in opengcs since moving the other two annotations on hcsshim side won't work.

add annotation "io.microsoft.container.storage.shm.size-kb" to
set container's /dev/shm tmpfs size.

this overrides any existing /dev/shm mounts in the spec

additionally move the annotations parsing logic into a separate
function

Signed-off-by: Maksim An <[email protected]>
Copy link
Contributor

@katiewasnothere katiewasnothere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants