Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/snap-confine: release: Basic Ubuntu Touch & Halium enablement #13661

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

fredldotme
Copy link
Contributor

@fredldotme fredldotme commented Mar 2, 2024

These changes allow for preliminary support of Halium/libhybris drivers and enabling some more sophisticated Snaps to work on Ubuntu Touch (like CUPS).

For more information about Halium: https://halium.org

Largely based on the NVIDIA support and provides a working environment for confined Qt5, Qt6, SDL, GTK and Flutter Snaps (as long as a GLES renderer is available) like QML Creator (sudo snap install qmlcreator) and running it with the wayland-egl platform.

Copy link

github-actions bot commented Mar 2, 2024

Everyone contributing to this PR have now signed the CLA. Thanks!

@Meulengracht Meulengracht added the Needs security review Can only be merged once security gave a :+1: label Mar 4, 2024
@bboozzoo bboozzoo requested a review from zyga March 4, 2024 09:17
// OnTouch states whether the process is running inside a Touch image
// with a classic (but mostly read-only) filesystem layout.
// It is a narrowed down variant of OnClassic
var OnTouch bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd split the part specific to Touch into a separate PR. There may be questions about how we want to approach this specific variant that are unrelated to supporting libhybris

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbf I've waited pretty long on the previous PR that has been sitting for a while, which used to be the libhybris-only PR. But if more people want to have it split up, then sure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split it. The s-c bits should be fairly easy to land, but folks may be asking questions about this one and there's no point in holding back the rest of the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@bboozzoo bboozzoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Please have a look at the comments.

static void sc_hybris_mount_android_rootfs(const char *rootfs_dir)
{
// Bind mount a tmpfs on $rootfs_dir/$tgt_dir (i.e. /var/lib/snapd/lib/gl)
char rfsbuf[512] = { 0 };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can reuse the buffer since snprintf always adds terminating \0 and simply interleave snprintf/fs operation sequence

Also, please use PATH_MAX where appropriate:

Suggested change
char rfsbuf[512] = { 0 };
char rfsbuf[PATH_MAX] = { 0 };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed.

//
// The glob list passed to us is produced with paths relative to source dir,
// to simplify the various tie-in points with this function.
static void sc_hybris_populate_libgl_with_hostfs_symlinks(const char *libgl_dir,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a copy of sc_populate_libgl_with_hostfs_symlinks, could we make the former non-static and share the code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care of.

}
}

static void sc_hybris_mkdir_and_mount_and_glob_files(const char *rootfs_dir,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, can we share sc_mkdir_and_mount_and_glob_files from nvidia support code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, noted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 479 to 480
setenv("HYBRIS_LINKER_DIR", "/var/lib/snapd/lib/gl/libhybris/linker", 1);
setenv("HYBRIS_EGLPLATFORM_DIR", "/var/lib/snapd/lib/gl/libhybris", 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels misplaced, neither snap-confine nor any part of the snap execution chain delivered by snapd sets any of the environment variables related to mesa/vulkan/egl. Most likely this should be part of to https://github.com/snapcore/snapcraft-desktop-integration

Copy link
Contributor Author

@fredldotme fredldotme Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO having snap-confine set up the environment by itself allows for:

  • Expected environment for any Snap
  • Less ways of the Snap screwing up something in the setup (because it misses -desktop-integration or does things itself)
  • Have existing Snaps working as-is.

I cannot expect upstream app developers to care or have the knowledge to setup libhybris appropriately, which is why I put it there.

EDIT: Also I would argue that the libhybris enablement is for more than just GL drivers, it actually allows any appropriately wrapped bionic-library to function inside of a Snap.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I think I'd rather place it somewhere around here:

snapd/cmd/snap-exec/main.go

Lines 211 to 213 in 68fe7eb

if err1 == nil && err2 == nil && stVar.Dev != stVarCups.Dev {
env["CUPS_SERVER"] = "/var/cups/cups.sock"
}
there's a precedent for it already for Cups

cc @zyga ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken care of.

@fredldotme
Copy link
Contributor Author

Since security is supposed to be involved in the discussion, here is a little hint about something that might come up:

For Halium/Android 9-only devices we had to ship a TLS-padding hack in the form of a shared library that needs to be LD_PRELOADed into the process, otherwise bionic libraries and glibc libraries would have a mismatch in TLS sizes. The do-nothing library resides here: https://gitlab.com/ubports/development/core/hybris-support/tls-padding/-/blob/main/tls-padding.cpp?ref_type=heads

Ubuntu Touch sets the LD_PRELOAD environment variable to include this hack into every user-app process, so ideally we can keep it to keep having 9 devices chugging along.

It explains the necessity to include the tls-padding .so into the AppArmor profile so that it can be mapped into the process image.

@bboozzoo bboozzoo changed the title Basic Ubuntu Touch & Halium enablement cmd/snap-confine: release: Basic Ubuntu Touch & Halium enablement Mar 6, 2024
@fredldotme
Copy link
Contributor Author

fredldotme commented Mar 6, 2024

Thanks for the additional comments, will take care of them.

I just noticed I wasn't specific enough on the reasoning regarding commit d779cae where I stated "on a separate filesystem". This means udev isn't able to manage it using devtmpfs and udevadm being unable to query properties of those binderfs endpoints.

EDIT: CLA signed with the email address in my commits.

@zyga
Copy link
Collaborator

zyga commented Mar 18, 2024

I will happily review this after 2.62 is released. Apologies for the delay, we're really busy trying to get the release in good shape.

/{,var/}run/shm/hybris_shm_data rw, # FIXME: LP: #1226569 (make app-specific)
# This is a LD_PRELOADed TLS padding library allowing use of both
# Android 9 and glibc TLS slots at the same time. Only used on Halium 9.
/usr/lib/@{multiarch}/libtls-padding.so mr,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is the library resolved at runtime, the directory will be from the perspective of a pivot_root base snap which (I suspect?) does not carry it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular library resides on the host, which then gets picked up by snap-confine AFAICT.

In general thread-locals in a shared library are taken care of after initialization by ld, before the actual executable is run. Think calling constructors of a static variable, it's similar in style to how ld treats thread-locals.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for the record, host files do not appear at that location. What you see here comes from the base snap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprised then that snap-confine correctly stops complaining about not being able to map that library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're setting LD_PRELOAD in your environment, than the denial from s-c should be raised regardless of this interface, as s-c is running under a different profile. Now, setting LD_PRELOAD in the environment will not have an effect of the snap as IIRC LD_PRELOAD isn't carried over to the snap's environment. Now, if OTOH is being set within the snap, then yes, if you allow it explicitly in the file like here, the denial will go away even if the file doesn't exist. Can you paste the denial you saw?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it some time, I've come to the conclusion that we might still want to pass through the preload hack to the confined environment in case it's running on Touch specifically. Would that be feasible from your perspective?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ld.so will strip out LD_PRELOAD before it switches execution to snap-confine, i.e. snap-confine will not even see that environment variable, so there's no chance of passing it to the snap. Perhaps you can load it from /etc/environment when executing in the snap, but it would not be automatic.

Copy link
Contributor Author

@fredldotme fredldotme Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking of measuring whether it's running on Touch via other means (checking for existance of a Halium-only file) and recreating the environment variable. It might sound hacky at first but there really seems to be no easy other way.

EDIT: I suppose that needs to happen in the desktop-helpers then? In that case I would reduce this MR by just allowing mapping the file from the hostfs and do the rest in the launchers. Or how about a SNAP_LD_PRELOAD which allows snapd to dictate the environment variable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume you've ran some apps with this change built locally. Were they able to start and function properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I couldn't as my assumption about Halium 9 vs 10 was wrong. I thought one of my test devices was susceptible to the issue, whereas it was just Halium 10 MediaTek devices specifically (which seem to have TLS slots hardcoded in their bionic-based GL libs).

/{,android/}apex/com.android.i18n/lib{,64}/**.so m,
/{,dev/}socket/property_service rw, # attach_disconnected path
/{,dev/}socket/logdw rw, # attach_disconnected path
/{,dev/}__properties__/** rw, # attach_disconnected path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, where is properties coming from? Is this some Android-specific interface? What's inside?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The __properties__ part contains SELinux info of process labels allowed to access Android properties through getprop and setprop. It's more of a void on UT though since functionality tied to SELinux is stubbed out to not conflict with AppArmor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need rw access or just read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially only needs read-access, since that is set up in the Halium container. But I'll have to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@fredldotme
Copy link
Contributor Author

I will happily review this after 2.62 is released. Apologies for the delay, we're really busy trying to get the release in good shape.

Ping! I've noticed 2.62 landing, and I don't want to be left behind. :)

@fredldotme
Copy link
Contributor Author

For the record and completeness sake, here is the list of devices tested with these changes:

  • Halium 7.1: Sony Xperia X (suzu)
  • Halium 9.0: Google Pixel 3a (sargo)
  • Halium 10: JingPad (jingpad_a1)
  • Halium 11: Fairphone 4 (fp4) & Volla Phone 22 (mimameid)

@fredldotme
Copy link
Contributor Author

Sooo... anything new?

@bboozzoo
Copy link
Collaborator

@alexmurray can you have a look?

Copy link
Collaborator

@alexmurray alexmurray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than the LD_PRELOAD hack which from the existing comments seems unnecessary / unworkable via snap-confine and should be removed.

sc_identity old = sc_set_effective_identity(sc_root_group_identity());
int res = mkdir(android_rootfs_dir, 0755);
if (res != 0 && errno != EEXIST) {
die("cannot create tmpfs target %s", android_rootfs_dir);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this does't appear to be a tmpfs so this error message is misleading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in d5a85d6

@@ -42,6 +42,8 @@
/{,usr/}lib{,32,64,x32}/{,@{multiarch}/}libudev.so* mr,
/{,usr/}lib{,32,64,x32}/{,@{multiarch}/}libseccomp.so* mr,
/{,usr/}lib{,32,64,x32}/{,@{multiarch}/}libcap.so* mr,
# TLS padding hack for Android 9
/{,usr/}lib{,32,64,x32}/{,@{multiarch}/}libtls-padding.so mr,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be removed given the prior discussions that it appears unneeded.

Copy link
Contributor Author

@fredldotme fredldotme Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not needed for most devices, but rather for a specific series of devices from a specific Android generation: Android 10 Mediatek, which I don't own, so I cannot figure out a solution by myself.

I am fine with the other changes landing but realistically we will need a solution for Mediatek devices with Android 10 too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Handled in d5a85d6

static void sc_udev_allow_hybris(sc_device_cgroup *cgroup)
{
struct stat sbuf;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: would this be easier to read if it was written as something like the following (note untested):

const char *paths[] = {"/dev/binderfs/binder", "/dev/binderfs/hwbinder", "/dev/binder", "/dev/hwbinder"};
for (int i = 0; i < sizeof(paths)/sizeof(paths[0]); i++) {
    struct stat sbuf;
    if (stat(paths[i], &sbuf) == 0) {
        sc_device_cgroup_allow(cgroup, S_IFCHR, major(sbuf.st_rdev),
				       minor(sbuf.st_rdev));
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, based it on your template. Handled in d5a85d6

/{,var/}run/shm/hybris_shm_data rw, # FIXME: LP: #1226569 (make app-specific)
# This is a LD_PRELOADed TLS padding library allowing use of both
# Android 9 and glibc TLS slots at the same time. Only used on Halium 9.
/var/lib/snapd/hostfs/usr/lib/@{multiarch}/libtls-padding.so mr,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this should be removed since it appears unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check and address my previous comment above, we still need a solution for Mediatek Android 10 devices and I doubt waiting for much longer is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, short-term I am fine without supporting MTK-10 devices for now, but I would really like to have this handled later.

Handled in d5a85d6

Copy link
Collaborator

@zyga zyga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a look from top to bottom. Some comments.

I would like to land the move to nvidia/mount logic separately to make the diff shorter. I can propose that separately.

}
(void)sc_set_effective_identity(old);

(void)mount(SC_HYBRIS_ROOTFS, android_rootfs_dir, NULL, MS_BIND | MS_REC | MS_RDONLY, NULL);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be checked and should s-c die if it does not work? If it is legitimately allowed to fail, as indicated by the (void) up front, can you add a comment just above
to explain why.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can take a deeper look later after $DAYJOB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in d5a85d6

sc_device_cgroup_allow(cgroup, S_IFCHR, major(sbuf.st_rdev),
minor(sbuf.st_rdev));
}
if (stat("/dev/binder", &sbuf) == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this runs everywhere, what happens if someone installs, say, anbox and gets /dev/binder. Now all snaps will allow access to the binder. I don't have an immediate idea on how to make this better without some host-level hint.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The host level hint can again be a check for existance of /system/build.prop, which under Halium distributions always has to exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handled in d5a85d6

@MiguelPires MiguelPires removed their request for review May 28, 2024 08:54
snap-confine: Additional AppArmor rules for proper libhybris functionality in snap-confine

interfaces: Add libhybris AppArmor paths & udev rules to opengl

Tested working on the Pixel 3a (Android 9, Qualcomm).

snap-confine: Set up hybris environment variables if appropriate

Hybris needs the linker and EGL platform paths properly set up
to allow running graphical applications using Android drivers.
Required to grant access to the respective devices in /dev.
…t for strict

libhybris can accept environment variables allowing relocating the library loader
and having full functionality by providing the userspace drivers in /var/lib/snapd/lib/gl.

libhybris and all Halium/Android driver bits need to be set up for strict only,
classic can continue consuming drivers from the standard distribution paths.
- Use PATH_MAX
- Commonalize shared functions between NVIDIA & libhybris
- Unify and use SC_HYBRIS_PROPERTY_FILE appropriately
…pu model

Newer Qualcomm devices (like the Fairphone 4) seem to have their userspace GL
libraries query the kernel-mode driver for the respective GPU model.
binder and hwbinder are devices residing in their own filesystem.
In order to allow access to them they need to be unconditionally
allowed in the cgroup for libhybris drivers to work accordingly.

Note that binderfs is checked before binder in order to distinguish
between an Android 11+ and lower kernels since newer ones place
a compatibility symlink at the old and familiar path.
Do the environment variable setup for libhybris inside of snap-exec.
As required by some devices, this allows loading i18n APEX into the process
in order to allow certain bionic libraries to work properly.

Additionally explain the use and necessity for libtls-padding inclusion.
…ead-only

The property area is set up by Halium, confined snaps might merely read it.
The confined snap can only load libs from directories prefixed with
/var/lib/snapd/hostfs, so ensure libtls-padding that resides there
is mappable into the process.
- Cleanup and remove TLS-padding permit from AppArmor rules
- Only allow binder access on Halium/libhybris systems
- Die in case the Halium environment cannot be mounted into the target
- Restructure binder device mount operation to iterate through a list of
  allowed device paths
Copy link

github-actions bot commented Jun 4, 2024

Everyone contributing to this PR have now signed the CLA. Thanks!

@fredldotme
Copy link
Contributor Author

fredldotme commented Jun 4, 2024

I've addressed all mentioned requested changes, but I came up with another one which I'll happily add in case I get a "makes sense" from you:

How about only adding those libhybris-related AppArmor changes to the template in case it detects /system/build.prop on the host? Should also reduce potential issues and makes clear the separation between mainline requirements + adding those for libhybris on top.

Alternatively, I could also reintegrate this one here in order to have a more narrowed idea of "Am I running on an Ubuntu Touch with libhybris device?" in addition to the property file existing: https://github.com/snapcore/snapd/pull/13674/files#diff-d04e67128a47f4820ffbef091b0bdc701ce59b4de9671b109405f19e91eb629dR193

Set the environment variables required to handle libhybris in a
confined environment, with libhybris bionic linker and EGL_PLATFORM
implementations available through environment variables.

Defaults to wayland for libhybris' libEGL. libgbm support should stay
untouched with these variables setting up the linker paths).

Tested on Halium-enabled Pixel 3a using Halium (Ubuntu Touch 20.04).

Tested Snaps:
- Tide IDE (classic Snap running with libhybris on the host)
- Gnome Mahjongg (with GDK_DEBUG=gl-disable)
- QML Creator (with Wayland QPA)
- Yamagi Quake 2 (with SDL & GLES Renderer)
Telegram Desktop works better with QT_QPA_PLATFORM=xcb, which allows
dropping baggage of HYBRIS_EGLPLATFORM from the environment.

Tested on Halium-enabled Pixel 3a using Halium (Ubuntu Touch 20.04).

Tested Snaps:
- Telegram Desktop (with QT_QPA_PLATFORM=xcb and EGL_PLATFORM=null or empty)
- Tide IDE (classic Snap running with libhybris on the host)
- Gnome Mahjongg (with GDK_DEBUG=gl-disable)
- QML Creator (with Wayland QPA)
- Yamagi Quake 2 (with SDL & GLES Renderer)
As detected by static-checks.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs security review Can only be merged once security gave a :+1:
Projects
None yet
5 participants