-
-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Double disk de-duplication on --system pull #2342
Conversation
Can one of the admins verify this patch?
|
@alexlarsson @matthiasclasen |
c239e5c
to
ad3390a
Compare
Ok. New approach. First of all, we don't need bindfs, we can use rofiles-fuse as well, which is shipped with ostree. Secondly, we create a user(+default group) for flatpak (normally in the flatpak package install). Then, we run rofiles-fuse with -o allow_other to allow other users to access the mountpoint (relies on this being allowed by the sysadmin in /etc/fuse.conf). Then we protect the various directories from access by the wrong users by using regular filesystem permissions. Here is an example (starting in a root shell):
Now, in a uid 1000 shell you can do:
Back again as in root shell:
Now we can merge the root-owned, repo from /var/lib/flatpak/repo/tmp/pull_dirs_hidden/pull1/repo into /var/lib/flatpak/repo. All access to it via the rofiles-fuse mount should be impossible, and only the flatpak user can access it via the other path, so that should be trustworthy. Does this seem safe ok? |
☔ The latest upstream changes (presumably e14af35) made this pull request unmergeable. Please resolve the merge conflicts. |
I have only sort-of been following this work, but just a quick comment:
One note; so far Longer term...it feels like what we really want is to potentially support privilege-separated fetches in libostree upstream (think having an API over a socketpair like "does the repo have object X" and "write object X" - I think that'd be close to all we'd need). |
@cgwalters Yeah, but we're running rofiles-fuse as a non-privileged user, so I don't think its a huge risk. In fact, the reason i prefer it over bindfs is that its smaller and simpler, so less risky. The reason we're using it is really not anything that rofiles-fuse implements, but the fact that it gates access to the repo files via a resource we control (the fuse process). Killing that process is a robust way to implement revoke() on an entire directory tree. |
I think this API would run aground against @alexlarsson's earlier point:
Does the current verify-and-copy scheme have the same issue, though? I guess not: although there's no requirement that the temporary repo path passed to Deploy() is on the same filesystem as the system installation, you do have to pass a polkit check before anything gets copied. The same is true for this branch: you have to pass a polkit check to call GetChildRepoForPull. |
@alexlarsson Sounds safe to me. Re. "Should fix permissions too", I guess you mean "remove any permission bits outside 0775" even if this invalidates the content hash, on the theory that if the files had such permissions, Flatpak sets |
What i mean was that we canonicalize file permissions in according to the current rules at commit time. (I.e. matching what can be represented by bare-user-only). This means for instance that a ostree object with world-writable permissions would be changed to not be world-writable, this will change the checksum and fail the checksum when commiting, so it will report an error. You mean we should fail this with a different error than EWRONGCHECKSUM, which i guess makes sense. However, we need to make sure we don't abort at the first such case, because that could leave other word-writable directories around, which we need to avoid. |
So, @uajain asked me on irc about the "track when its mounted" part, and i forgot about that. So here goes: First of all, we want to use core kernel features, not call out to some ipc service that polls some file that some other thing writes or whatever, that will never be robust in face of things like namespaces with different mount tables and whatnot. Secondly, the correct way to know something is mounted is when the mount() syscall returns. This happens in the fuse filesystem implementation, and this is signaled to the parent by the launched fuse-fs process exiting. So, all we have to do is launch it (without the foreground option) and then wait for it to exit. This is proper anyway, because that way we can track any errors happening up to (and including) the mount syscall and handle it correctly. Of course, the problem with this is that we were using the foreground flag for a reason. Without it its hard to ensure the fuse filesystem properly goes away. Unfortunately there is no way for rofiles-fuse currently to return the child pid for the fuse process. There are a few options here:
Also, one has to track the exiting of the fuse process. If we know the pid we can use PR_SET_CHILD_SUBREAPER to inherit the process when the intermediate child dies and then get sigchild. However, if we don't know the pid its hard to know what sigchild to wait for. In that case we'd have to use some other way. Maybe pass an pipe fd to the child and poll for the other side closing it (when it dies) for instance. |
I think the easiest way is to make rofiles-fuse report the pid somehow (e.g. to a pipe fd passed in as a commandline arg). We can use /proc/$pid/task/$pid/children to get the child pid, which is a bit hacky, but a lot easier than changing libfuse. |
Actually, i take that back. It should be fairly easy to make rofiles-fuse get the real pid. All you need to do is instead of using fuse_main() make your own version of fuse_main() that calls fuse_setup(), fuse_loop(), and fuse_teardown_common(). At this point you know the mount has succeeded by the time fuse_setup() returns. You can then hack this by passing -f as and extra argument, and then reimplementing fuse_daemonize() ourselves before calling fuse_setup(). |
Actually, I think there is an even simpler way. We launch rofiles-fuse with |
In this case we would create a pipe before spawning rofiles-fuse, and then not continuing until its written too. Actually, we should probably signal the parent by closing the fd instead of writing to it, because then that also happens if rofiles-fuse returns early due to some error. |
So, @wjt brought up reservations about relying on the user_allow_other global setting on irc. And i agree. The ideal approach would be to run the fuse filesystem as the target user. That doesn't work because the user can't write to the filesystem though (that is the point after all). So, as a different approach I propose a new fuse filesystem "revokefs", that basically takes the rofiles-fuse code, removes the existing can't-write-to-hardlink stuff, and then makes all the operations that write stuff and implement them out of process via messages over a unix socket. The idea here is that you can spawn the fuse filesystem as the target user, and the writing part as the flatpak user, and then you can pass the socket fds between these at setup. You then have the repo readable to the user, but only writable to the flatpak user, and once you shut down the write socket (by e.g. killing the write process, or just calling shutdown() on the socket fd that you kept around) you've essentially revoked all write access to the user. The messages between the processes can be very simple. I propose using a SOCK_SEQPACKET for easy segmentation and a simple header for the arguments + the raw data. |
The revoke via privilege separation fuse idea seemed so simple and useful that I had to try it. |
Interesting with revokefs-fuse is that we can just have the system helper spawn the backend (as some other user) and then pass the socket to the flatpak client which can then spawn the entire fuse filesystem wherever it wants as the user. |
Hmm, we need to add some permission filtering to revokefs too, otherwise the user could create writable files in the backing directory, or files setuid to the flatpak user. |
Hmm, is it really dangerous? Given the files will be owned by flatpak user and before deploying, flatpak_canonicalize_permissions will act upon all these files to clear the setuid bit. |
The entire point of revokefs is lost if the user can change the base directory to be writable by the user... |
(oh, and setuid flatpak means you can turn into the flatpak user and get write rights, which similarly is bad) |
hmm, I see. Maybe my definition of "Trust"(-ed user) as not as rigid as yours. Anyways today, I am learning more about FUSE in general and revokefs. I need to fillup my knowledge holes in order to make progress on this PR. :) hah! |
Well, the point is it you have direct write rights (be it by just changing a directory permissions and writing there, or making an binary that does writes a file and mark it setuid flatpak and run it), then you can have an outstanding write file-descriptor to some object file by the time you tell the system-helper to merge the repo. Since we're not longer making a copy of the file, this means the file-descriptor you have is still going to be valid and useful after the file has been chowned to root and put into place in the system repo, and you can now modify the system repo however you want. |
The revokefs name comes from the revoke syscall, btw: https://lwn.net/Articles/192632/ If we had this we wouldn't need all these workarounds. Unfortunately linux doesn't have revoke(). |
Also, if we didn't care about the revoke issue we would just let the user write to files as himself and then move+chown them before the import. |
I have attempted to summarize the problem, the design of the current proposed solution, and the past solutions we've discussed/explored and decided against, on https://github.com/flatpak/flatpak/wiki/Noncopying-system-app-installation . |
Closing in favor of #2657 (merged) |
This PR tries to address the issue : ostreedev/ostree#1723
(ideally we wanted to move this issue inside Flatpak's umbrella but it's seems that it's not possible). So please refer to the discussion there. Most of moderate to high-level details inside commit messages too.
Summary of changes:
$SYSTEM_REPO/tmp
. For convenience put this child repo in a cache dir (that has the conventionflatpak-cache-$ref-XXXXXX
)FlatpakSystemHelperOngoingPull
's struct memberroot_dir
points to thisflatpak-cache-$ref-XXXXXX
(because it's the root owned directory)FlatpakSystemHelperOngoingPull
's struct memberuser_dir
points to this another directory that gets createdflatpak-cache-$ref-XXXXXX-bindfs
. We willbindfs
mount this directory and give it back to the client to do the pull in the child repo.bindfs
lets client (non-sudo) write to this child repo. While the actual file creation is being done in theroot_dir
folder. As long as the bindfs mount is up, the client is able to write(or pull) to the child repo.bindfs
mount. In case of failure, we force unmount things or kill thebindfs
subprocess. The idea to use a FUSE filesystem is basically that it is in user space and can be spawned into a separate child process(which can be killed).CancelPull
method, to cleanup.bindfs
is not found on the system, we fallback on the current--system
pull behaviour (that would use double-disk space).I have been testing this since couple of weeks now. It has worked well in my testing locally. Though I am having troubles on how to write rigours tests for this feature (any help is welcome here).