Refactor offcputime stack id error handling #1692
Merged
`offcputime.py` profiles the system for context switches over a period of time, aggregates the results in a hash map, and prints them once profiling ends. This means the stack trace addresses are only read, by stack id, at the end of the run. However, the tool currently uses `BPF_F_REUSE_STACKID`, which means that if two different stack traces have the same hash signature, they get the same stack id (i.e., bucket number in the stack map), and the content stored in the stack map is that of the later stack trace. This leads to incorrect or misleading results.

This PR changes the tool to not use `BPF_F_REUSE_STACKID`. Unfortunately, this means many stacks will be missing due to hash collisions (`-EEXIST`); the current hash function performs poorly, and @4ast, @yonghong-song and I are already working on it on the kernel side. Hence, this PR also adds placeholders for these errors, so users can see that those samples / stacks exist and that the tool simply missed them.
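
To make the trade-off concrete, here is a small pure-Python toy model of the behaviour described above. It is only a sketch under stated assumptions: `ToyStackMap`, its `sum(...) % n_buckets` hash, the `render` helper, and the `"[Missed Stack]"` placeholder text are all hypothetical names invented for illustration. It does not reproduce the kernel's stack map implementation or the exact code in this PR.

```python
import errno


class ToyStackMap:
    """Toy stand-in for a BPF stack map (hypothetical, for illustration only)."""

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.buckets = {}  # bucket number (stack id) -> stored stack trace

    def get_stackid(self, stack, reuse):
        # Assumption: a deliberately weak toy hash so collisions are easy to show.
        bucket = sum(stack) % self.n_buckets
        stored = self.buckets.get(bucket)
        if stored is None or stored == stack or reuse:
            # Empty bucket, same stack, or "reuse" mode: (over)write the bucket.
            self.buckets[bucket] = stack
            return bucket
        # Bucket holds a different stack and reuse is off: report the collision.
        return -errno.EEXIST


def render(stack_map, stack_id, placeholder="[Missed Stack]"):
    """Resolve a stack id at the end of the run, or emit a placeholder."""
    if stack_id < 0:
        # The sample still exists; only its stack trace was missed.
        return placeholder
    return ";".join(hex(addr) for addr in stack_map.buckets[stack_id])


if __name__ == "__main__":
    a = (0x10, 0x20)  # both toy stacks land in bucket 0 when n_buckets == 4
    b = (0x30, 0x40)

    reuse_map = ToyStackMap(4)
    id_a = reuse_map.get_stackid(a, reuse=True)
    reuse_map.get_stackid(b, reuse=True)
    # Stack a's id now silently resolves to stack b's addresses.
    print("reuse:   ", render(reuse_map, id_a))

    strict_map = ToyStackMap(4)
    strict_map.get_stackid(a, reuse=False)
    id_b = strict_map.get_stackid(b, reuse=False)
    # Stack b is reported as missed instead of being misattributed.
    print("no reuse:", render(strict_map, id_b))
```

In this model, reusing ids makes the first stack's id resolve to the later stack's contents, while disabling reuse turns the collision into an explicit error that the output stage can show as a placeholder.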