Refactor offcputime stack id error handling #1692

palmtenor · 2018-04-19T19:34:23Z

offcputime.py profiles the system for context switches over a period of time, aggregates the results in a hash map, and outputs them after profiling. This means we only read the stack trace addresses by stack id at the end of the run. However, the tool currently uses BPF_F_REUSE_STACKID, which means that if two different stack traces have the same hash signature, they get the same stack id (i.e., bucket number in the stack map), and the content stored in the stack map is the later stack trace. This leads to incorrect or misleading results.
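To make the collision behavior concrete, here is a toy model in plain Python. It is only a sketch, not the kernel implementation: the `ToyStackMap` class, its deliberately weak `sum`-based hash, and the bucket count are all assumptions made for illustration. It shows how a colliding trace overwrites the earlier one when reuse is enabled, and is rejected with -EEXIST when it is not.

```python
import errno


class ToyStackMap:
    """Toy model of a stack map where the stack id is the bucket index.

    Hypothetical illustration only; the real map lives in the kernel and
    uses a different hash function.
    """

    def __init__(self, n_buckets, reuse_stackid):
        self.buckets = [None] * n_buckets
        self.reuse_stackid = reuse_stackid

    def get_stackid(self, trace):
        trace = tuple(trace)
        # Deliberately weak hash so that collisions are easy to trigger.
        bucket = sum(trace) % len(self.buckets)
        stored = self.buckets[bucket]
        if stored is None or stored == trace:
            # Empty bucket, or the same trace seen again: store and reuse the id.
            self.buckets[bucket] = trace
            return bucket
        if self.reuse_stackid:
            # Collision with reuse enabled: the later trace replaces the
            # earlier one, but both callers hold the same stack id.
            self.buckets[bucket] = trace
            return bucket
        # Collision with reuse disabled: keep the earlier trace, report an error.
        return -errno.EEXIST


first = (1, 2, 3)
second = (3, 2, 1)  # different trace, same toy hash

with_reuse = ToyStackMap(4, reuse_stackid=True)
id_first = with_reuse.get_stackid(first)
id_second = with_reuse.get_stackid(second)
# Both ids are equal, and reading id_first at the end of the run
# now returns the second trace.

without_reuse = ToyStackMap(4, reuse_stackid=False)
kept = without_reuse.get_stackid(first)
missed = without_reuse.get_stackid(second)  # -errno.EEXIST
```

Because the tool only resolves stack ids after profiling ends, the overwritten bucket in the first case is attributed to the wrong stack, whereas the second case at least surfaces the loss as an error.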

This PR changes the tool to not use BPF_F_REUSE_STACKID. Unfortunately, this means a lot of stacks will be missing due to hash collisions (-EEXIST; the current hash function performs poorly, and @4ast, @yonghong-song and I are already working on it from the kernel side). Hence, this PR also adds placeholders for these errors to make it clear to users that those samples / stacks exist and that the tool simply missed them.
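The placeholder idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `render_stack` and `report`, the placeholder text, and the dictionary-based inputs are all hypothetical. The only assumption about the kernel side is that a failed lookup yields a negative errno value as the stack id.

```python
import errno

MISSED = "[Missed Stack]"  # hypothetical placeholder text


def render_stack(stack_id, traces):
    """Return printable frames for stack_id, or a placeholder if the lookup failed."""
    if stack_id < 0:
        # Negative ids are -errno values, e.g. -EEXIST on a hash collision.
        name = errno.errorcode.get(-stack_id, "unknown error")
        return ["%s (%s)" % (MISSED, name)]
    return list(traces[stack_id])


def report(counts, traces):
    """Build output lines: the frames of each stack followed by its total."""
    lines = []
    for stack_id, total in counts.items():
        lines.extend("    " + frame for frame in render_stack(stack_id, traces))
        lines.append("        %d" % total)
    return lines


# Hypothetical data: one resolved stack and one that collided.
traces = {2: ("schedule", "do_nanosleep")}
counts = {2: 1500, -errno.EEXIST: 300}
output = report(counts, traces)
```

With this shape, the 300 units of time recorded against the missed stack still appear in the output, so totals stay honest even though the frames themselves are unavailable.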

brendangregg · 2018-04-19T21:32:57Z

Thanks, this looks like the right approach.

One small thing to fix, probably for older llvm versions:

bgregg-xenial-bpf-v000@~> ./offcputime.py
/virtual/main.c:17:36: error: expected ';' after top level declarator
BPF_STACK_TRACE(stack_traces, 1024)
                                   ^
                                   ;
1 error generated.

palmtenor · 2018-04-19T22:38:23Z

@brendangregg I tried both LLVM 4.0.1 and 5.0.1 but couldn't repro the issue. I think we changed the semicolon uniformly in #1568. Which LLVM version are you using?

brendangregg · 2018-04-19T22:43:44Z

That was an old system with 3.7.

yonghong-song · 2018-04-19T23:08:17Z

LGTM. Thanks.

palmtenor · 2018-04-20T00:51:57Z

@brendangregg I think you might be using a pre-#1568 runtime with the new script, which would leave the pre-processed code with a duplicated semicolon. Can you check?

palmtenor requested review from brendangregg and goldshtn as code owners April 19, 2018 19:34

Refactor offcputime stack id error handling

ea72805

palmtenor force-pushed the stack_flag branch from 44dd00b to ea72805 April 19, 2018 21:14

palmtenor mentioned this pull request Apr 20, 2018

Fix offwaketime PID / TGID handling #1693

Merged

yonghong-song merged commit 7f22495 into iovisor:master Apr 20, 2018

palmtenor deleted the stack_flag branch April 20, 2018 17:59

This was referenced Apr 24, 2018

Refactor offwaketime stack id error handling #1704

Merged

Refactor profile.py stack id error handling #1705

Merged
