(complicated) GUI applications running through Rosetta segfault #209242

Open
flokli opened this issue Jan 5, 2023 · 16 comments
Labels
0.kind: bug 6.topic: darwin Running or building packages on Darwin

Comments

@flokli
Contributor

flokli commented Jan 5, 2023

I set up an aarch64-linux graphical NixOS system (nixpkgs master) inside UTM.

Rosetta is enabled, and I can successfully run an x86_64-linux xclock.

Most of the system is already aarch64-linux, but some applications are available for x86_64-linux only (Electron apps mostly).

I created a "forced x86_64-linux overlay" in my overlay.nix:

  pkgsx86_64 = import sources.nixpkgs {
    system = "x86_64-linux";
    config = {
      allowUnfree = true;
    };
    overlays = [];
  };

… and then referred to all x86_64-only applications via pkgsx86_64.$packageName.

Unfortunately, all these applications segfault :-/

❯ spotify
[1]    3205 segmentation fault (core dumped)  spotify

gdb isn't very helpful obviously:

❯ coredumpctl debug
           PID: 3205 (.spotify-wrappe)
           UID: 1000 (flokli)
           GID: 100 (users)
        Signal: 11 (SEGV)
     Timestamp: Thu 2023-01-05 22:45:05 UTC (27s ago)
  Command Line: /run/binfmt/rosetta /nix/store/zi2pql3pizz139b6pqag5glq8c2qd7hb-spotify-1.1.84.716.gc5f8b819/share/spotify/.spotify-wrapped
    Executable: /run/rosetta/rosetta
 Control Group: /user.slice/user-1000.slice/session-7.scope
          Unit: session-7.scope
         Slice: user-1000.slice
       Session: 7
     Owner UID: 1000 (flokli)
       Boot ID: 22594eeeb2624102b7bb2d3490081ccb
    Machine ID: 4bd940c09fc24a90b5be5ebcabd2634c
      Hostname: utm
       Storage: /var/lib/systemd/coredump/core.\x2espotify-wrappe.1000.22594eeeb2624102b7bb2d3490081ccb.3205.1672958705000000.zst (present)
  Size on Disk: 8.2K
       Message: Process 3205 (.spotify-wrappe) of user 1000 dumped core.
                
                Stack trace of thread 3205:
                #0  0x0000800000022800 n/a (n/a + 0x0)
                #1  0x000080000002c914 n/a (n/a + 0x0)
                #2  0x000080000002c914 n/a (n/a + 0x0)
                #3  0x000080000002a248 n/a (n/a + 0x0)
                #4  0x0000800000022070 n/a (n/a + 0x0)
                ELF object binary architecture: AARCH64

GNU gdb (GDB) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /run/rosetta/rosetta...
(No debugging symbols found in /run/rosetta/rosetta)

warning: core file may not match specified executable file.
[New LWP 3205]
Core was generated by `/run/binfmt/rosetta /nix/store/zi2pql3pizz139b6pqag5glq8c2qd7hb-spotify-1.1.84.'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000800000022800 in ?? ()
(gdb) bt
#0  0x0000800000022800 in ?? ()
#1  0x00008000000766bc in ?? ()
#2  0x000000000000020b in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

I'm somewhat suspecting some weird cross-arch graphics driver interactions, but am a bit lost. Anyone got some ideas?

cc @toonn @alyssais @sandydoo

@flokli
Contributor Author

flokli commented Jan 5, 2023

Instead of virtualisation.rosetta.enable = true;, I tried boot.binfmt.emulatedSystems = [ "x86_64-linux" ];.

I could get saleae-logic to run, but the others (mostly Electron apps) still segfault.

Chrome itself seems to also be very angry:

❯ google-chrome-stable --no-sandbox
[0105/230925.555828:WARNING:crashpad_client_linux.cc(362)] prctl: Invalid argument (22)
[13183:13183:0105/230926.830157:ERROR:nacl_fork_delegate_linux.cc(313)] Bad NaCl helper startup ack (0 bytes)
/nix/store/r17ihqafckhr6ykz4xjr1wz4nhi338ya-gvfs-1.50.2/lib/gio/modules/libgvfsdbus.so: cannot open shared object file: No such file or directory
Failed to load module: /nix/store/r17ihqafckhr6ykz4xjr1wz4nhi338ya-gvfs-1.50.2/lib/gio/modules/libgvfsdbus.so

(google-chrome:13148): Gtk-WARNING **: 23:09:29.610: Could not load a pixbuf from icon theme.
This may indicate that pixbuf loaders or the mime database could not be found.
[13148:13148:0105/230931.202880:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.431947:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.535923:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.592585:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.643566:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.666660:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.682175:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.902809:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.949431:ERROR:gpu_process_host.cc(984)] GPU process launch failed: error_code=1002
[13148:13148:0105/230931.949532:FATAL:gpu_data_manager_impl_private.cc(440)] GPU process isn't usable. Goodbye.
**
ERROR:../accel/tcg/cpu-exec.c:954:cpu_exec: assertion failed: (cpu == current_cpu)
Bail out! ERROR:../accel/tcg/cpu-exec.c:954:cpu_exec: assertion failed: (cpu == current_cpu)
[13239:13245:0105/230938.368949:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[1]    13148 trace trap (core dumped)  google-chrome-stable --no-sandbox
[13239:13245:0105/230938.374599:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376284:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376514:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.376841:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3
[13239:13245:0105/230938.377012:ERROR:ssl_client_socket_impl.cc(982)] handshake failed; returned -1, SSL error code 1, net_error -3

@flokli
Contributor Author

flokli commented Jan 5, 2023

Okay, that crash seems to be a qemu bug: https://gitlab.com/qemu-project/qemu/-/issues/1147

@bouk
Contributor

bouk commented Feb 6, 2023

@flokli I found this thread by googling '0x0000800000022800' 😄

I'm getting a very similar stack trace when doing this:

$ nix shell github:oxalica/rust-overlay#packages.x86_64-linux.rust
$ cargo --version
Segmentation fault (core dumped)

$ gdb cargo
(gdb) r
Starting program: /nix/store/qz8gvkxcyiidg4rrrlgif65ca9r8xka9-rust-default-1.67.0/bin/cargo
warning: Selected architecture i386:x86-64 is not compatible with reported target architecture aarch64
warning: Architecture rejected target-supplied description

Program received signal SIGSEGV, Segmentation fault.
0x0000800000022800 in ?? ()
(gdb) b
Breakpoint 1 at 0x800000022800
(gdb) bt
#0  0x0000800000022800 in ?? ()
#1  0x00008000000766bc in ?? ()
#2  0x0000ffffffffd440 in ?? ()
#3  0x3000702d2d720030 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Weirdly, this doesn't happen when I run nix run nixpkgs#legacyPackages.x86_64-linux.cargo -- --version.

Also, if I run the program under valgrind (nix shell nixpkgs#legacyPackages.x86_64-linux.valgrind, then valgrind -v cargo), it runs just fine...

I'm also using the rosetta nixos module.

My hypothesis is some sort of impurity that leads to an incorrect binary...

@bouk
Contributor

bouk commented Feb 7, 2023

Discovered something interesting:

$ nix build nixpkgs#legacyPackages.x86_64-linux.rust.packages.prebuilt.cargo
$ /run/rosetta/rosetta $(patchelf --print-interpreter result/bin/.cargo-wrapped) result/bin/.cargo-wrapped --version
cargo 1.65.0 (4bc8f24d3 2022-10-20)

$ /run/rosetta/rosetta result/bin/.cargo-wrapped --version
Segmentation fault (core dumped)

It seems rosetta can't handle the interpreter being patched for dynamic libraries. Perhaps it doesn't use the PT_INTERP at all?

We could work around this by changing the binfmt. @flokli can you try the above commands for your programs and see if that resolves things?
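For anyone who wants to check where PT_INTERP sits in their own binary without patchelf, here is a minimal Python sketch. It walks the program headers of a little-endian ELF64 file and returns the file offset and path of the interpreter entry. find_interp is a hypothetical helper name, not part of any tool mentioned here, and the sketch handles nothing beyond plain ELF64.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def find_interp(data: bytes):
    """Return (file_offset, path) of PT_INTERP in an ELF64 image, or None.

    Hypothetical helper for illustration; little-endian ELF64 only.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, ..., p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", data, base)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
        if p_type == PT_INTERP:
            path = data[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
            return p_offset, path
    return None
```

Calling find_interp(open("result/bin/.cargo-wrapped", "rb").read()) should give the same path as patchelf --print-interpreter, plus the offset at which that path is stored in the file.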

@flokli
Contributor Author

flokli commented Feb 7, 2023

@bouk what exactly should I try? I don't have a differently linked signal-desktop binary...

@bouk
Contributor

bouk commented Feb 7, 2023

Try running this:

nix shell nixpkgs#patchelf # Or try installing patchelf into your systemPackages
$(patchelf --print-interpreter $(which spotify)) spotify

@flokli
Contributor Author

flokli commented Feb 7, 2023

Ah, you mean manually invoking the interpreter from the interpreter field... Interesting, I'll try and report back.

@bouk
Contributor

bouk commented Feb 7, 2023

Doing some stracing reveals more information:

strace ./cargo2
execve("./cargo2", ["./cargo2"], 0xffffec8f0eb0 /* 45 vars */) = 0
openat(AT_FDCWD, "/proc/self/exe", O_RDONLY) = 4
ioctl(4, _IOC(_IOC_READ, 0x61, 0x22, 0x45), 0xffffe95ee350) = 1
close(4)                                = 0
gettid()                                = 7323
getpid()                                = 7323
openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 4
pread64(4, "800000000000-800000022000 r--p 0"..., 4170, 0) = 523
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff988b4000
pread64(4, "", 4170, 523)               = 0
close(4)                                = 0
openat(AT_FDCWD, "/proc/sys/vm/mmap_min_addr", O_RDONLY) = 4
read(4, "4096\n", 1023)                 = 5
close(4)                                = 0
readlinkat(AT_FDCWD, "/proc/self/fd/3", "/home/nix/cargo2", 4095) = 16
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0C\0\0\0\0\0"..., 64) = 64
mmap(NULL, 792, PROT_READ, MAP_PRIVATE, 3, 0) = 0xffff988b3000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffff9a13b000} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

Only the first 792 bytes of the binary are mmapped, while patchelf has moved the .interp section to the end of the file, as the output of patchelf --debug shows:

patching ELF file 'cargo2'
replacing section '.interp' with size 28
this is a dynamic library
last page is 0xf85000
first page is 0x0
needed space is 6472
shifting new PT_LOAD segment by 9449472 bytes to work around a Linux kernel bug
rewriting section '.interp' from offset 0x2e0 (size 28) to offset 0x1888000 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2fc (size 32) to offset 0x1888020 (size 32)
rewriting section '.dynsym' from offset 0x320 (size 6408) to offset 0x1888040 (size 6408)
rewriting symbol table section 36
rewriting symbol table section 41
writing cargo2

So it seems that rosetta tries to read .interp and fails because it hasn't memory mapped that section. Notice that 0xffff9a13b000 - 0xffff988b3000 = 0x1888000. This gives us something to work with! I can file a bug with Apple.
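To spell out the arithmetic, here is a small Python sketch that uses only the addresses from the strace output and the 792-byte mmap length. The 4 KiB page size is an assumption, consistent with the 0x...3000 addresses returned by mmap.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def fault_offset(map_base, fault_addr):
    """Distance of the faulting address from the start of the mapping."""
    return fault_addr - map_base


def mapped_span(length, page=PAGE_SIZE):
    """Bytes actually backed by an mmap of `length` bytes (whole pages)."""
    return -(-length // page) * page


offset = fault_offset(0xffff988b3000, 0xffff9a13b000)
print(hex(offset))                # 0x1888000, the new .interp offset
print(offset < mapped_span(792))  # False: far outside the mapped page
```

Incidentally, 792 = 64 + 13 × 56, which is the size of an ELF64 header plus 13 program headers. That would fit Rosetta mapping only the headers before reading the interpreter path, but that reading is a guess.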

@bouk
Contributor

bouk commented Feb 8, 2023

I've submitted the following bug report to Apple under FB11984253:

Hello, I'm trying out Rosetta for Linux in NixOS using UTM.app. I'm running into a segmentation fault inside Rosetta when trying to execute a binary that has an .interp section that's not close to the beginning of the binary. To reproduce the exact binary I'm using, please do the following (I've also attached a copy):

  1. Download and unpack https://static.rust-lang.org/dist/rust-1.66.0-x86_64-unknown-linux-gnu.tar.gz
  2. cp rust-1.66.0-x86_64-unknown-linux-gnu/cargo/bin/cargo cargo2
  3. Execute https://github.com/NixOS/patchelf (I'm using version 0.17.2) as follows: patchelf --debug --set-interpreter /lib64/ld-linux-x86-64.so.2 cargo2
  4. rosetta ./cargo2

Here's what I get when I run strace -i ./cargo2 (note the instruction address is in the rosetta program space):

strace -i ./cargo2
[0000ffff93ff504c] execve("./cargo2", ["./cargo2"], 0xffffc8c8c658 /* 45 vars */) = 0
[000080000002306c] openat(AT_FDCWD, "/proc/self/exe", O_RDONLY) = 4
[0000800000022e04] ioctl(4, _IOC(_IOC_READ, 0x61, 0x22, 0x45), 0xfffff6244340) = 1
[0000800000022a80] close(4)             = 0
[0000800000022d6c] gettid()             = 8473
[0000800000023580] getpid()             = 8473
[000080000002306c] openat(AT_FDCWD, "/proc/self/maps", O_RDONLY) = 4
[00008000000230f0] pread64(4, "800000000000-800000022000 r--p 0"..., 4170, 0) = 523
[0000800000022f64] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff82348000
[00008000000230f0] pread64(4, "", 4170, 523) = 0
[0000800000022a94] close(4)             = 0
[000080000002306c] openat(AT_FDCWD, "/proc/sys/vm/mmap_min_addr", O_RDONLY) = 4
[00008000000231cc] read(4, "4096\n", 1023) = 5
[0000800000022a94] close(4)             = 0
[00008000000231f8] readlinkat(AT_FDCWD, "/proc/self/fd/3", "/home/nix/cargo2", 4095) = 16
[00008000000231cc] read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0\0C\0\0\0\0\0"..., 64) = 64
[0000800000022f64] mmap(NULL, 792, PROT_READ, MAP_PRIVATE, 3, 0) = 0xffff82347000
[0000800000022878] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xffff83bcf000} ---
[????????????????] +++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

As you can see it segfaults because it tries to access a value 0x1888000 bytes into the binary while only 792 bytes have been mmapped. This makes sense when you look at the debug log of patchelf:

patching ELF file 'cargo2'
replacing section '.interp' with size 28
this is a dynamic library
last page is 0xf85000
first page is 0x0
needed space is 6472
shifting new PT_LOAD segment by 9449472 bytes to work around a Linux kernel bug
rewriting section '.interp' from offset 0x2e0 (size 28) to offset 0x1888000 (size 28)
rewriting section '.note.ABI-tag' from offset 0x2fc (size 32) to offset 0x1888020 (size 32)
rewriting section '.dynsym' from offset 0x320 (size 6408) to offset 0x1888040 (size 6408)
rewriting symbol table section 36
rewriting symbol table section 41
writing cargo2

Running readelf -e cargo2 also provides useful information about the structure of the binary. I've attached its output as cargo2.elf.txt.

This binary was produced using https://github.com/NixOS/patchelf which is a tool that NixOS uses to modify dynamically linked binaries. It moves the .interp section to the back of the binary to safely modify the sections.

Using UTM Version 4.1.5 (74)

Output of /run/rosetta/rosetta:

Usage: rosetta <x86_64 ELF to run>

Optional environment variables:
ROSETTA_DEBUGSERVER_PORT    wait for a debugger connection on given port

version: Rosetta-289.7
uname -a
Linux nixos-builder 5.15.89 #1-NixOS SMP Wed Jan 18 10:48:59 UTC 2023 aarch64 GNU/Linux

Some discussion is also at the following GitHub issue: #209242

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/running-nixos-on-macos-with-rosetta-segfaults/25351/1

vcunat added the 6.topic: darwin Running or building packages on Darwin label on Feb 13, 2023
@norbertwnuk

@bouk, any progress with FB11984253 on Apple's side?

@bouk
Contributor

bouk commented Jul 5, 2023

Nope, haven't heard anything from Apple.

@zhaofengli
Member

I gave it a try and made https://github.com/zhaofengli/rosetta-spice to patch Rosetta to fix the problem, and there is a NixOS module that will configure everything. It hooks sys_mmap to map enough of the binary until PT_INTERP. Hopefully this will all become obsolete soon - I want things to work now so I got my hands dirty 😛
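As an illustration only (this is not code from rosetta-spice, and the function names are hypothetical), the length calculation that a hook of this kind has to get right can be sketched in Python. Given the length Rosetta asked for and the PT_INTERP offset and size, it returns a page-rounded length that covers the interpreter path. The 4 KiB page size is an assumption.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_round_up(n, page=PAGE_SIZE):
    """Round n up to the next multiple of the page size."""
    return (n + page - 1) // page * page


def length_covering_interp(requested_len, interp_offset, interp_size,
                           page=PAGE_SIZE):
    """Smallest page-rounded length that covers both the original request
    and the end of the PT_INTERP data."""
    needed = interp_offset + interp_size
    return page_round_up(max(requested_len, needed), page)


# Numbers from this thread: a 792-byte mmap, .interp at 0x1888000, size 28.
print(hex(length_covering_interp(792, 0x1888000, 28)))  # 0x1889000
```

If .interp already sits inside the requested range, as in an unpatched binary, the result is just the original request rounded up to a page.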

As a bonus, it also allows you to use AOT without needing the host to configure it. This requires either macOS Sonoma or setting virtualisation.rosetta-spice.rosettaPkg to packages.aarch64-linux.rosetta from the flake. However, AOT appears to be buggy at the moment and complex programs either segfault when running or OOM during translation.

With AOT enabled:

  • p7zip: Runs
  • geekbench_5: Runs
  • spotify: AOT header specified too many segments
  • saleae-logic: Segfaults
  • geekbench_6: Segfaults
  • chromium: rosettad OOMs during translation

@zhaofengli
Member

Looks like the segfault no longer occurs on Sonoma Beta 5 (23A5312d)! If you don't want to upgrade to the beta or want to try AOT, you can use rosetta-spice to get the version (the segfault fix no longer has an effect).

@cor
Contributor

cor commented Oct 11, 2023

Can we confirm that this issue is indeed fixed in the released version of Sonoma, and close this issue?

@astr0n8t

I just set up a VM on UTM with Rosetta, and after installing ida-free it just works via X11 forwarding. I'm not sure whether X11 forwarding affects it, but it seems to work just fine.

8 participants