-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API for normalizing addresses #114
Add API for normalizing addresses #114
Conversation
The entire proc maps logic is Linux specific and yet nothing except for LinuxMapsEntry contains the Linux prefix. Remove it from that type as well -- it is unnecessary to encode this "information" in the name. Signed-off-by: Daniel Müller <[email protected]>
This change refactors the parsing logic for proc maps slightly, allowing for passing in of a filter function that allows for early discarding of entries that we have no interest in dealing with. Because we move the generalized existing filter functionality into the maps module, we make it easier reusable for future changes (which will require it). On top of that such early filtering shrinks the size of allocations used. Signed-off-by: Daniel Müller <[email protected]>
So far our proc maps parsing logic read in the entire file before allowing evaluation of individual entries (barring preliminary filtering introduced earlier). With this change we convert the functionality over to using a proper lazy iterator. Signed-off-by: Daniel Müller <[email protected]>
The proc maps special file contains a virtual address range that is exclusive of the end address. Currently it is anybody's guess what _end_address represents. Switch to using ops::Range for representing the virtual address range of entries. Signed-off-by: Daniel Müller <[email protected]>
This change makes sure not to leak implementation details through the public API accidentally by restricting the visibility of more types to pub(crate). Signed-off-by: Daniel Müller <[email protected]>
This change introduces some builder pattern inspired logic to the Mmap type. This will later allow us to configure memory mappings a bit more. Right now it's just syntactic sugar. Signed-off-by: Daniel Müller <[email protected]>
With upcoming changes we are going to need to instantiate ElfParser and Archive objects from existing memory mappings, which are meant to be shared with other program entities. In preparation for that, make both types with with Rc<Mmap> instead of a Mmap. Signed-off-by: Daniel Müller <[email protected]>
We are going to add another C source file for the creation of a shared object that we want to work with. Rename the existing test.c to test-exe.c to make it clear that this one is for an executable. Signed-off-by: Daniel Müller <[email protected]>
This change introduces a shared object that we can use for testing purposes. Currently it only contains a single exported dummy function. The binary is added to the existing zip archive, because we will later use it for testing. Signed-off-by: Daniel Müller <[email protected]>
The SymbolType type has a logical default state: that for "unknown" symbols. Implement Default accordingly. Signed-off-by: Daniel Müller <[email protected]>
We don't need shared mappings, because we are not doing any shared memory work. Hence, create memory mappings privately for the time being. Signed-off-by: Daniel Müller <[email protected]>
In Rust "getters" are typically not prefixed with "get". Rename some internal function accordingly. Signed-off-by: Daniel Müller <[email protected]>
We want to support symbolization in scenarios where addresses are captured on one system and then symbolized elsewhere (#61). In a nutshell, we are going to perform address "normalization" on the "local" system (the one that captured the addresses), send them to the remote, and then symbolize these normalized addresses on the remote. This change introduces the logic for normalizing addresses. Basically, we use the corresponding proc maps entry, parse the contents, identify the ELF files to which those addresses map, and then perform normalization based on the ELF information. I introduced a new public module, normalize. We have to think some more about how best to expose this functionality, but this is a reasonable starting point. The normalization does not yet use advanced caching of sorts: each ELF entry handled is parsed as we found it. That is not particularly effective but not wrong. We can optimize further subsequently without adjusting the API. Signed-off-by: Daniel Müller <[email protected]>
As part of the address normalization, users may needs the build ID of the binary in which an address lies along with the binary's path. That's useful in cases where a remote server is queried which indexes binaries by build ID, such as with debuginfod. With this change we make sure to attempt to read the build ID of the ELF binaries that we found and populate the corresponding information inside UserAddrMeta. Signed-off-by: Daniel Müller <[email protected]>
This change adds two tests for the user space address normalization functionality. Signed-off-by: Daniel Müller <[email protected]>
190672a
to
bc9b47d
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #114 +/- ##
==========================================
- Coverage 73.22% 70.35% -2.87%
==========================================
Files 19 23 +4
Lines 3862 4372 +510
==========================================
+ Hits 2828 3076 +248
- Misses 1034 1296 +262
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
bc9b47d
to
9734970
Compare
This change restructures the existing c_api module by introducing a new sub-module, symbolize, and moving the existing functionality into it. Signed-off-by: Daniel Müller <[email protected]>
12c9a92
to
81f3ec2
Compare
81f3ec2
to
2f406ab
Compare
93d629f
to
49d57c7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first batch about normalization to give you a chance to look through that
src/normalize/normalize.rs
Outdated
})??; | ||
} | ||
|
||
let meta_idx = if let Some(meta_idx) = meta_lookup.get(&entry.path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just an idea for optimization: most of the time addr will end up in the same Elf file and most probably even the same segment, so it seems beneficial to remember last path+meta+idx and do a quick check. It's probably very beneficial in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Yes, we can optimize more. Will probably do that as a follow up, though.
src/normalize/normalize.rs
Outdated
meta_idx | ||
}; | ||
|
||
let normalized_addr = normalize_elf_addr(addr, &entry)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a private meta cache with also virtaddr -> fileoffset shift, to avoid linear search of matching PT_LOAD segments in ELF. Then at the end we can distill that to user-visible metadata
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will work on optimizing this functionality as follow up.
This change adds C API bindings for the address normalization functionality. Signed-off-by: Daniel Müller <[email protected]>
Currently the C API definitions are intermingled with everything Rust in the documentation. That's causing more confusion than necessary. Let's export the module instead. Doing so has no bearing on C code itself. Signed-off-by: Daniel Müller <[email protected]>
Remove a bunch of unnecessary allocations when looking up addresses for symbols. Signed-off-by: Daniel Müller <[email protected]>
This change overhauls the Pid type slightly, giving it a nicer conversion from u32 and adjusts users to rely on it. Signed-off-by: Daniel Müller <[email protected]>
ce0908e
to
36b6ac4
Compare
Addressed review comments. |
36b6ac4
to
5764032
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
besides the build_id in Unknown (and few nits), lgtm
5764032
to
eaa8cb1
Compare
The end-to-end benchmarks broke because of some API change a while back. Fix them. Unfortunately, it's not easy to just build them in CI, because at least currently, that is bound to the extraction of our large vmlinux-5.17.12-100.fc34.x86_64 image, which we do not want to pull in CI. Signed-off-by: Daniel Müller <[email protected]>
Let's move the [features] section closer to the top of Cargo.toml, where it is easier to spot. After all, it is more relevant to users than, say, build dependencies. Signed-off-by: Daniel Müller <[email protected]>
This change moves a bunch of the symbolization related functionality into the newly introduced symbolize module. This separation will make it easier to conditionally compile parts of the crate in the future. Signed-off-by: Daniel Müller <[email protected]>
With this change we introduce the 'symbolize' feature, which guards all the symbolization related functionality. It is enabled by default. When disabled, mostly the normalization part remains, which can be useful in remote symbolization scenarios, for capturing and normalizing addresses on embedded devices, for example. When this feature is disabled, the regex and lru dependencies are no longer necessary, meaning that all we need is libc. Signed-off-by: Daniel Müller <[email protected]>
Some of our users are very sensitive to the number of dependencies. For that matter we have tried to keep them small. With this change we remove the anyhow build dependency, replacing its usage by directly working with io::Error objects instead. Signed-off-by: Daniel Müller <[email protected]>
We want to support symbolization in scenarios where addresses are captured on one system and then symbolized elsewhere (#61). In a nutshell, we are going to perform address "normalization" on the "local" system (the one that captured the addresses), send them to the remote, and then symbolize these normalized addresses on the remote.
This pull request introduces, among other things, the logic for normalizing addresses. Basically, we use the corresponding proc maps entry, parse the contents, identify the ELF files to which those addresses map, and then perform normalization based on the ELF information.
Please see individual commits for descriptions. Note that included here are all but the last commit from #110, so there is some overlap. I will rebase #110 on top of what is done here (and address the remaining comments).