-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Put the rr page into a proper ELF library #2753
Conversation
This fixes rr-debugger#2721, though the strategy I proposed there doesn't quite work as stated, so this is slightly different. To recap: We are seeing some test failures under rr, because our runtime system assumes that it can asynchronously unwind from anywhere (basically having things built with -fasynchronous-unwind-tables and proper CFI for assembly code). However, this ran into problems when inside a syscall in the rr page, because the rr page is not part of any image and does not have CFI info, so the unwinder would complain. I originally thought I could just create an ET_EXEC file and ask the dynamic linker to always map the rr page at the correct address (preceeded by one page for the ELF header and eh_frame table), but that turns out to be forbidden. Instead what this does is create an ET_DYN file and then intercept the initial mmap of said file, where the dynamic linker expects that kernel to choose a (potentially randomized) memory location for it. Since we're interposing the kernel, we can just decide to always map this particular mapping at the address we want it to be at. Of course, this is somewhat sensitive to the exact order of events, so while it works well on my current glibc version, I wouldn't be surprised if other libcs (or future versions of glibc) may do things differently enough to not trigger these heuristics. Fortunately, we can just simply keep doing nothing in that case and while we will see the unwind issue again, the recording should otherwise be unaffected. Since we now have a probler .so for the rr_page, the generation process has changed. Rather than generating the raw bytes using python, the rr page is now a simple assembly file that contains the appropriate instructions as well as a link script that ensures the layout is consistent and known. I also combined the record and replay pages into one file, with rr deciding to either map the second or third page of `librrpage.so` as its rr_page. That's not strictly necessary, but it seemed simplest to me. It also occurred to me that we could also combine the exec_stub into this file as well to cut down the number of executable files we need to ship (and have the kernel already map the rr page for us), but that's just a random side note. To test this, I'm asking the C++ unwinder to unwind us out of a system call. Of course, in the real application, the unwinder in question may not necessarily be the C++ system unwinder (since the unwinder is explicitly swappable), but it's an easy way to test the issue. It does assume that whatever libc is in regular use does come with unwind info, which is the case on most distros I know, but we can revisit the issue if we get reports that some distros do not turn on unwind info for their libcs.
Looks great to me! Thanks! |
@Keno This breaks |
Sorry, the |
I assume the missed installation rule is the one ~L660 right below where the installation rule for |
Yep, I think those should go. Sorry - I thought I had deleted them. |
This doesn't work on Ubuntu 18.04. |
Thanks for catching that. I rarely use chaos mode, so I never saw this ... Sorry! |
No worries. |
This fixes #2721, though the strategy I proposed there doesn't
quite work as stated, so this is slightly different.
To recap:
We are seeing some test failures under rr, because our runtime
system assumes that it can asynchronously unwind from anywhere
(basically having things built with -fasynchronous-unwind-tables
and proper CFI for assembly code). However, this ran into problems
when inside a syscall in the rr page, because the rr page is not
part of any image and does not have CFI info, so the unwinder would
complain.
I originally thought I could just create an ET_EXEC file and ask the
dynamic linker to always map the rr page at the correct address
(preceeded by one page for the ELF header and eh_frame table),
but that turns out to be forbidden. Instead what this does is create
an ET_DYN file and then intercept the initial mmap of said file,
where the dynamic linker expects that kernel to choose a (potentially
randomized) memory location for it. Since we're interposing the
kernel, we can just decide to always map this particular mapping
at the address we want it to be at. Of course, this is somewhat
sensitive to the exact order of events, so while it works well
on my current glibc version, I wouldn't be surprised if other libcs
(or future versions of glibc) may do things differently enough to
not trigger these heuristics. Fortunately, we can just simply keep
doing nothing in that case and while we will see the unwind issue again,
the recording should otherwise be unaffected.
Since we now have a probler .so for the rr_page, the generation process
has changed. Rather than generating the raw bytes using python,
the rr page is now a simple assembly file that contains the appropriate
instructions as well as a link script that ensures the layout is
consistent and known. I also combined the record and replay pages
into one file, with rr deciding to either map the second or third page
of
librrpage.so
as its rr_page. That's not strictly necessary, butit seemed simplest to me. It also occurred to me that we could also
combine the exec_stub into this file as well to cut down the number
of executable files we need to ship (and have the kernel already
map the rr page for us), but that's just a random side note.
To test this, I'm asking the C++ unwinder to unwind us out of a system
call. Of course, in the real application, the unwinder in question
may not necessarily be the C++ system unwinder (since the unwinder
is explicitly swappable), but it's an easy way to test the issue.
It does assume that whatever libc is in regular use does come
with unwind info, which is the case on most distros I know,
but we can revisit the issue if we get reports that some
distros do not turn on unwind info for their libcs.
cc @staticfloat