Refactor offcputime stack id error handling #1692

Merged
merged 1 commit into from
Apr 20, 2018


palmtenor
Member

@palmtenor palmtenor commented Apr 19, 2018

offcputime.py profiles context switches on the system for a period of time, aggregates the results in a hash map, and prints them after profiling ends. This means we only read the stack trace addresses by stack id at the end of the run. However, the tool currently uses BPF_F_REUSE_STACKID, which means that if two different stack traces have the same hash signature, they get the same stack id (i.e., bucket number in the stack map), and the content stored in the stack map is that of the later stack trace. This leads to incorrect or misleading results.
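To make the collision behavior concrete, here is a minimal toy model in plain Python. It is a sketch under stated assumptions, not the kernel implementation: `ToyStackMap`, its sum-based hash, and the `demo` helper are all hypothetical names invented for illustration. The only things taken from the description above are that the bucket number doubles as the stack id, that reusing the id overwrites the stored trace, and that refusing to reuse yields -EEXIST.

```python
import errno


class ToyStackMap:
    """Toy model of a fixed-size stack map (hypothetical, for illustration only)."""

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.buckets = [None] * n_buckets

    def get_stackid(self, stack, reuse_stackid):
        stack = tuple(stack)
        # Assumed hash: sum of the addresses. The bucket number is the stack id.
        stack_id = sum(stack) % self.n_buckets
        current = self.buckets[stack_id]
        if current is None or current == stack:
            # Empty bucket, or the same stack seen again: no conflict.
            self.buckets[stack_id] = stack
            return stack_id
        if reuse_stackid:
            # Collision with reuse enabled: the later stack overwrites the earlier one.
            self.buckets[stack_id] = stack
            return stack_id
        # Collision with reuse disabled: the earlier stack is kept, the new one is refused.
        return -errno.EEXIST


# Two different stacks that collide under the assumed hash.
STACK_A = (0x10, 0x20)
STACK_B = (0x20, 0x10)


def demo(reuse_stackid):
    """Record STACK_A then STACK_B; return both ids and what bucket A now holds."""
    stack_map = ToyStackMap(16)
    id_a = stack_map.get_stackid(STACK_A, reuse_stackid)
    id_b = stack_map.get_stackid(STACK_B, reuse_stackid)
    return id_a, id_b, stack_map.buckets[id_a]
```

With `demo(True)` both stacks receive the same id and the bucket ends up holding `STACK_B`, so any count aggregated under `STACK_A`'s id would later be printed with the wrong trace. With `demo(False)` the second call returns `-errno.EEXIST` and the bucket still holds `STACK_A`.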

This PR changes the tool to not use BPF_F_REUSE_STACKID. Unfortunately, this means a lot of stacks will be missing due to hash collisions (-EEXIST; the current hash function performs poorly, and @4ast, @yonghong-song and I are already working on it from the kernel side). Hence, this PR also adds placeholders for these errors, to make it clear to users that those samples / stacks exist and that the tool simply missed them.
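The placeholder idea can be sketched on the output side roughly as follows. This is an illustrative sketch, not necessarily the code in this PR: the helper names, the placeholder text, and the choice to treat -EFAULT as "no stack available" rather than as a miss are assumptions made for the example.

```python
import errno


def stack_id_err(stack_id):
    # Assumption: -EFAULT means no stack was available for this sample,
    # so only the other negative ids (such as -EEXIST) count as misses.
    return stack_id < 0 and stack_id != -errno.EFAULT


def render_stack(stack_id, stack_table, label):
    """Return printable lines for one stack id (hypothetical helper).

    stack_table is a plain dict of stack id -> tuple of addresses,
    standing in for the real stack map.
    """
    if stack_id_err(stack_id):
        # The sample exists, but its stack was lost; say so explicitly.
        return ["[Missed %s Stack]" % label]
    if stack_id < 0:
        # No stack available for this sample; print nothing.
        return []
    return ["0x%x" % addr for addr in stack_table[stack_id]]
```

For example, `render_stack(-errno.EEXIST, {}, "Kernel")` yields a single placeholder line, so the sample's off-CPU time still shows up in the output instead of silently disappearing.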

@brendangregg
Member

Thanks, this looks like the right approach.

One small thing to fix, probably for older llvm versions:

bgregg-xenial-bpf-v000@~> ./offcputime.py
/virtual/main.c:17:36: error: expected ';' after top level declarator
BPF_STACK_TRACE(stack_traces, 1024)
                                   ^
                                   ;
1 error generated.

@palmtenor
Member Author

@brendangregg I tried both LLVM 4.0.1 and 5.0.1 but couldn't repro the issue. I think we changed the semicolon uniformly in #1568. Which LLVM version are you using?

@brendangregg
Member

That was an old system with 3.7.

@yonghong-song
Collaborator

LGTM. Thanks.

@palmtenor
Member Author

@brendangregg I think you might be using a pre-#1568 runtime with the new script, which would leave the pre-processed code with a duplicated semicolon. Can you check?


3 participants