HostProcess containers for macOS #5525
-
The upcoming HostProcess container type for Windows stretches the traditional definition of what constitutes a "container" in many people's minds, conceptually boiling down to running non-sandboxed processes in a filesystem volume whose contents are sourced from an OCI container image, along with access to the host filesystem to facilitate administrative tasks. In addition to the administrative use cases discussed in KEP-1981, HostProcess containers will also provide an avenue for easily building Windows container images inside Kubernetes clusters once support for Windows containers lands in BuildKit, since access to the host network and filesystem facilitates direct access to the RPC interfaces for the Windows Host Compute Service (HCS) and Host Compute Networking (HCN) Service used by
With that context in mind, I posit that there would be value in supporting HostProcess-style containers on macOS which are built atop chroot jails. Although true container support for macOS is currently precluded by the lack of necessary isolation primitives in the XNU kernel, there is a clear demand for some level of container support amongst macOS and iOS developers, as evidenced by the existence of open source projects such as Docker-OSX and proprietary products such as MacStadium Orka.
These existing solutions typically rely on wrapping a macOS virtual machine in a Linux container in order to achieve full sandboxing, but I believe that a lightweight alternative based purely on filesystem isolation would be extremely attractive to developers who are simply looking to orchestrate CI/CD workloads on systems under their exclusive control and who are not concerned about strict security boundaries between containers and the host environment. This would also facilitate denser bin packing of workloads by sidestepping the need for VMs and the accompanying limitations around VM instances in the macOS EULA.
I can see at least two key components that would need to be implemented in order to support chroot jail containers under macOS:
I'm by no means an expert in the architecture of either containerd or macOS itself, and so I'm seeking feedback from the community to validate this idea. Does anyone agree that this idea merits investigation and implementation effort? Would anyone be interested in assisting with said effort? Are there any details that I've overlooked, in either the premise or suggested starting points for implementation?
Replies: 16 comments 60 replies
-
Definitely an interesting idea. Poking around, macOS used to support union mounts, but maybe does not now. It's possible that firmlinks can have the same effect (I'm not sure how write-through works there, but I suspect it's not quite the same as we'd want anyway), and it's also possible that these are only for OS usage. There's also the possibility of using FUSE-based overlayfs or unionfs implementations via macFUSE or similar. For contrast, the Windows snapshotter works with distinct "read-only" and "read-write" layer formats on disk, where the read-only layers are (ignoring registry hives etc.) just the on-disk filesystem, using hardlinks to reference the parent layer, and a read-write layer is a "differencing VHDX" using some read-only layer stack as its base, all managed via an installable filesystem minidriver (WCIFS). The maximum-effort version would be to actually implement overlayfs (or the wcifs model but using DMG files, I guess?) on OS X as a real KEXT. I imagine that will require real investment, plus signing and licenses from Apple, etc. Although from that link
so maybe an overlayfs KEXT is a non-starter. Without one of those working, the snapshotter ends up being rather expensive in disk and time, as it would need to extract a copy of the source layers into the chroot, and then read them again in order to diff against them when committing the snapshot.
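The cost of the copy-based fallback can be seen in a rough sketch. The functions `prepare` and `diff` below are hypothetical stand-ins, not containerd's snapshotter interface: preparing a snapshot copies the entire parent tree, and committing walks both trees and compares file contents. Directories, ownership, and permission changes are ignored for brevity.

```python
import filecmp
import os
import shutil


def prepare(parent: str, snapshot: str) -> None:
    """Materialise a writable snapshot by copying the whole parent tree."""
    shutil.copytree(parent, snapshot, symlinks=True)


def _files(root: str) -> set[str]:
    """Return the relative paths of all non-directory entries under root."""
    found = set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def diff(parent: str, snapshot: str) -> dict[str, set[str]]:
    """Compare the snapshot against its parent by reading both trees in full."""
    before, after = _files(parent), _files(snapshot)
    modified = {
        path
        for path in before & after
        # shallow=False forces a byte-by-byte comparison of the two files.
        if not filecmp.cmp(
            os.path.join(parent, path), os.path.join(snapshot, path), shallow=False
        )
    }
    return {"added": after - before, "deleted": before - after, "modified": modified}
```

Both steps touch every file in the parent, so disk usage and time grow with the size of the base layers rather than with the size of the change, which is the expense a union filesystem would avoid.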
-
Thanks for the great writeup @adamrehn.
As GameCI we can confirm this would be extremely useful to have, and I would say that this statement is spot on. We also found a question on Stack Overflow about a Docker image running macOS that seems to have stagnated for lack of solutions. I would agree that offering an initial solution, even without the mentioned sandboxing features, would be a very good first step. Ideally it would end up creating some momentum.
-
An issue that just occurred to me when thinking about base layers is licensing. How much of macOS actually needs to be duplicated inside the container for the functionality we need? I haven't played around with chroots on macOS, but if we need to copy any of the base OS into the container, then such a layer would be non-redistributable. I imagine locally there wouldn't be a license problem (or chroots would suffer the same problem), but you wouldn't be able to push such layers to Docker Hub, for example. MS has this problem, which is why they provide the base layers themselves, and we have the 'foreign' layer system in OCI to support this. But unless Apple were providing the base layers, each user's locally generated base layer is likely to have a different hash, and hence the foreign layer system doesn't help that much. Depending on what OS components are needed, perhaps they could be sourced from Darwin, but in my mind this would be like trying to create a Windows base container layer from scratch containing ReactOS, and is likely to land somewhere between difficult and impossible.
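For readers unfamiliar with the foreign layer mechanism, the sketch below builds a layer descriptor of the kind the OCI image specification defines for non-distributable layers: the manifest records the digest and size, and `urls` tells clients where to fetch the blob instead of the registry. The helper name and the example URL are assumptions for illustration only.

```python
import hashlib

# Media type defined by the OCI image spec for non-distributable (foreign) layers.
FOREIGN_LAYER_MEDIA_TYPE = "application/vnd.oci.image.layer.nondistributable.v1.tar+gzip"


def foreign_layer_descriptor(blob: bytes, urls: list[str]) -> dict:
    """Describe a layer whose bytes are hosted elsewhere (hypothetical helper)."""
    return {
        "mediaType": FOREIGN_LAYER_MEDIA_TYPE,
        # The digest pins the exact bytes of the blob; any difference changes it.
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
        "urls": list(urls),
    }


if __name__ == "__main__":
    # Placeholder bytes and URL, not a real base layer.
    print(foreign_layer_descriptor(b"layer bytes", ["https://example.invalid/base.tar.gz"]))
```

Because the digest covers the exact bytes, two users who each generate a base layer locally get matching descriptors only if their tarballs are byte-identical, which is the hash problem raised above.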
-
Regarding the macOS base images and the issue of redistribution: in the case of Windows containers, if you look carefully at the image manifest you'll notice it points to a Microsoft CDN URL even on Docker Hub. The base layer itself is not redistributable, but it is relatively easy to point to it, such that even if you pull from a foreign registry you still get it from Microsoft. At least that's how I understand it. Because containers work better with immutable layers, I suggest the following: find a way to create the base layers with community tooling such that the process is repeatable and produces identical output. The idea would be to provide base container images as pure manifests for which the blobs are never distributed publicly. Unlike with Windows containers, we wouldn't be able to upload the blobs somewhere public. That being said, it would be relatively easy to regenerate the corresponding base image with exactly the same blob hash and push it to a private container registry, or just keep it in the local cache. Docker-OSX and other projects have a relatively good way to bootstrap the macOS installation using the network recovery installer. I think it could be used as a good starting point, but I don't think the installation has ever been automated (Apple being Apple, there is apparently no such thing as an unattended installation). This would probably be the biggest blocker, but I guess it might be possible to write a program that OCRs the screen contents and simulates a manual installation from a script.
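As a rough illustration of what "repeatable with an identical output" requires at the tarball level, the sketch below packs a directory into an uncompressed tar with sorted entries and normalised timestamps and ownership, then hashes it. The function `build_layer` is hypothetical, not existing community tooling. It only addresses archive metadata: the file contents themselves would still need to be identical across machines, and any compression step would have to be deterministic too (gzip, for example, stores a timestamp in its header).

```python
import hashlib
import io
import os
import tarfile


def _normalise(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that differs between machines or runs."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def build_layer(rootfs: str) -> tuple[bytes, str]:
    """Pack rootfs into a tar with a stable byte layout and return (bytes, digest)."""
    paths = []
    for dirpath, dirs, files in os.walk(rootfs):
        for name in dirs + files:
            paths.append(os.path.join(dirpath, name))
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        # Sorting removes any dependence on directory enumeration order.
        for path in sorted(paths):
            arcname = os.path.relpath(path, rootfs)
            tar.add(path, arcname=arcname, recursive=False, filter=_normalise)
    data = buf.getvalue()
    return data, "sha256:" + hashlib.sha256(data).hexdigest()
```

Two directories with the same contents and permissions but different modification times should then yield the same digest, which is the property a manifest-only base image would rely on.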
-
Thanks to this thread on Ask Different, I've discovered the sailor project by Emile Heitor. Evidently Emile had a similar idea back in 2015, albeit in the form of a standalone tool with support for NetBSD, macOS and RHEL. This should certainly prove a useful source of information for investigation!
-
I realised just now that if we invert the question, kata-containers (containers in lightweight VMs, like Hyper-V container isolation on Windows) would probably work on macOS with some fiddling, but it seems it never got any traction: kata-containers/runtime#379, which called out layered storage and "netlink" (I assume for network setup) as cost/risk points. (That issue was closed because they are now in a new repo for "version 2", and there's no corresponding macOS support request there.) Google also gave me https://fosdem.org/2021/schedule/event/containers_darwin_containerd/ which was about running Linux containers on macOS using "Linux Kernel Library (LKL), a library version of Linux kernel". That in turn points to containerd PR #4526 and also https://github.com/ukontainer/runu, which is probably not actually relevant for this, but might give a sense of what else is going on in the container ecosystem for Darwin. All that said, a comment about runu notes: https://github.com/containerd/containerd/pull/4526/files#r604543921
so maybe this is something already on the right track.
-
I've created some social links that people can share if they would like to help boost the visibility of this initiative:
-
There is #4526, which adds some kind of Darwin support to containerd, though I am not sure from its description whether it is what the current discussion is about.
-
Has there been any progress on macOS containers since the last reply in this thread? I was reading about Darling the other day, and was surprised to learn that it uses an overlay filesystem very similar to Docker's. There's WSL support if one can recompile the kernel to enable a specific feature, so I suspect it probably cannot be run inside a Linux container. My thinking is that we could still probably borrow parts from Darling to try to build a Linux container that can mount parts of the macOS host filesystem as read-only. Darling can run the Xcode build tools from Linux, so if we could manage to call native macOS build tools from a Linux container, it would already provide a lot of the desired isolation for containers, even if it looks like some sort of weird hybrid.
-
Current state: there is a proof-of-concept snapshotter for Darwin, accompanied by a PR to containerd.
-
So this thought is probably more geared towards running Linux "containers" on macOS than running macOS containers. Perhaps combining runq with QEMU's Hypervisor.framework support could result in a way to run containers on macOS. I guess it wouldn't be much different from the way most current solutions do it, as VMs are still involved. Another alternative could be to write an OCI runtime for Virtualization.framework, and I guess that could support running macOS "containers". But I'm not sure that would play nicely with the macOS EULA VM restriction.
-
Is there any update on potentially seeing native Linux container support on macOS without a virtual machine handling the container runtime?
-
FYI, I just discovered Cirrus Labs Tart, which aims for similar goals: https://github.com/cirruslabs/tart
-
Based on this comment on #4526, the current blocker is PR #5935, which has not been merged yet. I hope it will be merged soon 😄. @AkihiroSuda has already asked @mxpv to review that PR, so hopefully @mxpv or another containerd contributor will review it.
-
What is the current state of this? Is there any information on what the next step is, or on what is blocking it?
-
Okay, let's go.
Today, on 2023-09-25, I proudly announce the initial 0.0.1 release of macOS containers.
Supported features:
docker build, with and without BuildKit
docker run
chroot
Bug reports are welcome and expected.