Skip to content

Commit

Permalink
stackcount: do per-pid statistics only if -P is provided (iovisor#2313)
Browse files Browse the repository at this point in the history
  • Loading branch information
boat0 authored and yonghong-song committed Apr 14, 2019
1 parent 89dc38e commit 2a9436d
Show file tree
Hide file tree
Showing 3 changed files with 107 additions and 29 deletions.
13 changes: 11 additions & 2 deletions man/man8/stackcount.8
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,8 @@
.SH NAME
stackcount \- Count function calls and their stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B stackcount [\-h] [\-p PID] [\-c CPU] [\-i INTERVAL] [\-D DURATION] [\-T] [\-r]
[\-s] [\-P] [\-K] [\-U] [\-v] [\-d] [\-f] pattern
.B stackcount [\-h] [\-p PID] [\-c CPU] [\-i INTERVAL] [\-D DURATION] [\-T]
[\-r] [\-s] [\-P] [\-K] [\-U] [\-v] [\-d] [\-f] [\-\-debug] pattern
.SH DESCRIPTION
stackcount traces functions and frequency counts them with their entire
stack trace, kernel stack and user stack, summarized in-kernel for efficiency.
Expand Down Expand Up @@ -31,6 +31,15 @@ wildcards only.
\-s
Show address offsets.
.TP
\-P
Display stacks separately for each process.
.TP
\-K
Show kernel stack only.
.TP
\-U
Show user stack only.
.TP
\-T
Include a timestamp with interval output.
.TP
Expand Down
8 changes: 3 additions & 5 deletions tools/stackcount.py
Original file line number Diff line number Diff line change
Expand Up @@ -163,16 +163,14 @@ def load(self):

trace_count_text = trace_count_text.replace('FILTER', '\n '.join(filter_text))

# We need per-pid statistics when tracing a user-space process, because
# the meaning of the symbols depends on the pid. We also need them if
# per-pid statistics were requested with -P, or for user stacks.
if self.per_pid or not self.is_kernel_probe() or self.user_stack:
# Do per-pid statistics iff -P is provided
if self.per_pid:
trace_count_text = trace_count_text.replace('GET_TGID',
'bpf_get_current_pid_tgid() >> 32')
trace_count_text = trace_count_text.replace('STORE_COMM',
'bpf_get_current_comm(&key.name, sizeof(key.name));')
else:
# kernel stacks only. skip splitting on PID so these aggregate
# skip splitting on PID so these aggregate
# together, and don't store the process name.
trace_count_text = trace_count_text.replace(
'GET_TGID', '0xffffffff')
Expand Down
115 changes: 93 additions & 22 deletions tools/stackcount_example.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,78 @@ block device I/O:

# ./stackcount submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
submit_bh
journal_submit_commit_record.isra.13
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
1

submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__writeback_single_inode
writeback_sb_inodes
__writeback_inodes_wb
2

submit_bio
__block_write_full_page.constprop.39
block_write_full_page
blkdev_writepage
__writepage
write_cache_pages
generic_writepages
do_writepages
__filemap_fdatawrite_range
filemap_fdatawrite
fdatawrite_one_bdev
36

submit_bio
submit_bh
jbd2_journal_commit_transaction
kjournald2
kthread
ret_from_fork
mb_cache_list
38

submit_bio
ext4_writepages
do_writepages
__filemap_fdatawrite_range
filemap_flush
ext4_alloc_da_blocks
ext4_rename
ext4_rename2
vfs_rename
sys_rename
entry_SYSCALL_64_fastpath
79

Detaching...

The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, ext4_rename(), and filemap_flush(): looks like an
application issued file rename has caused back end disk I/O due to ext4
block allocation and a filemap_flush().


Now adding the -P option to display stacks separately for each process:

# ./stackcount -P submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
ext4_writepages
Expand Down Expand Up @@ -64,15 +136,14 @@ Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.

Detaching...

The output shows unique stack traces, in order from leaf (on-CPU) to root,
followed by their occurrence count. The last stack trace in the above output
shows syscall handling, sys_read(), vfs_read(), and then "readahead" functions:
looks like an application issued file read has triggered read ahead. The
application can be seen after the stack trace, in this case, "tar [15069]"
for the "tar" command, PID 15069.
The last stack trace in the above output shows syscall handling, sys_read(),
vfs_read(), and then "readahead" functions: looks like an application issued
file read has triggered read ahead. With "-P", the application can be seen
after the stack trace, in this case, "tar [15069]" for the "tar" command,
PID 15069.

The order of printed stack traces is from least to most frequent. The most
frequent in this case, the ext4_rename() stack, was taken 113 times during
frequent in this case, the ext4_readpages() stack, was taken 113 times during
tracing.

The "[unknown]" frames are from user-level, since this simple workload is
Expand All @@ -82,7 +153,7 @@ walkers. Similar broken stacks will be seen by other profilers and debuggers
that use frame pointers. Hopefully your application preserves them so that
the user-level stack trace is visible. So how does one get frame pointers, if
your application doesn't have them to start with? For the current bcc (until
it supports other stack walkers), you need to be running a application binaries
it supports other stack walkers), you need to be running an application binaries
that preserves frame pointers, eg, using gcc's -fno-omit-frame-pointer. That's
about all I'll say here: this is a big topic that is not bcc/BPF specific.

Expand All @@ -92,7 +163,7 @@ disk IOPS. These could have in-kernel origins (eg, background scrub).

Now adding the -d option to delimit kernel and user stacks:

# ./stackcount -d submit_bio
# ./stackcount -P -d submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.
^C
submit_bio
Expand Down Expand Up @@ -181,7 +252,7 @@ A "--" is printed between the kernel and user stacks.

As a different example, here is the kernel function hrtimer_init_sleeper():

# ./stackcount.py -d hrtimer_init_sleeper
# ./stackcount.py -P -d hrtimer_init_sleeper
Tracing 1 functions for "hrtimer_init_sleeper"... Hit Ctrl-C to end.
^C
hrtimer_init_sleeper
Expand Down Expand Up @@ -294,7 +365,7 @@ JIT symbol translation). dockerd and containerd don't have frame pointers

Here's another kernel function, ip_output():

# ./stackcount.py -d ip_output
# ./stackcount.py -P -d ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
ip_output
Expand Down Expand Up @@ -391,7 +462,7 @@ was the same.

Here is just the user stacks, fetched during the kernel function ip_output():

# ./stackcount.py -U ip_output
# ./stackcount.py -P -U ip_output
Tracing 1 functions for "ip_output"... Hit Ctrl-C to end.
^C
[unknown]
Expand All @@ -416,7 +487,7 @@ User-space functions can also be traced if a library name is provided. For
example, to quickly identify code locations that allocate heap memory for
PID 4902 (using -p), by tracing malloc from libc ("c:malloc"):

# ./stackcount -p 4902 c:malloc
# ./stackcount -P -p 4902 c:malloc
Tracing 1 functions for "malloc"... Hit Ctrl-C to end.
^C
malloc
Expand Down Expand Up @@ -444,12 +515,12 @@ without debuginfo.


In addition to kernel and user-space functions, kernel tracepoints and USDT
tracepoints are also supported.
tracepoints are also supported.

For example, to determine where threads are being created in a particular
For example, to determine where threads are being created in a particular
process, use the pthread_create USDT tracepoint:

# ./stackcount -p $(pidof parprimes) u:pthread:pthread_create
# ./stackcount -P -p $(pidof parprimes) u:pthread:pthread_create
Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
^C

Expand All @@ -463,10 +534,10 @@ Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end.
You can use "readelf -n file" to see if it has USDT tracepoints.


Similarly, to determine where context switching is happening in the kernel,
Similarly, to determine where context switching is happening in the kernel,
use the sched:sched_switch kernel tracepoint:

# ./stackcount t:sched:sched_switch
# ./stackcount -P t:sched:sched_switch
__schedule
schedule
worker_thread
Expand Down Expand Up @@ -518,7 +589,7 @@ use the sched:sched_switch kernel tracepoint:
A -i option can be used to set an output interval, and -T to include a
timestamp. For example:

# ./stackcount.py -Tdi 1 submit_bio
# ./stackcount.py -P -Tdi 1 submit_bio
Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end.

06:05:13
Expand Down Expand Up @@ -705,7 +776,7 @@ did not span block device I/O.
The -s output prints the return instruction offset for each function (aka
symbol offset). Eg:

# ./stackcount.py -s tcp_sendmsg
# ./stackcount.py -P -s tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
tcp_sendmsg+0x1
Expand Down Expand Up @@ -738,7 +809,7 @@ offset can help you locate the lines of code from a disassembly dump.

The -v output is verbose, and shows raw addresses:

./stackcount.py -v tcp_sendmsg
./stackcount.py -P -v tcp_sendmsg
Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end.
^C
ffffffff817b05c1 tcp_sendmsg
Expand Down Expand Up @@ -825,7 +896,7 @@ Use -r to allow regular expressions.
The -f option will emit folded output, which can be used as input to other
tools including flame graphs. For example, with delimiters as well:

# ./stackcount.py -df t:sched:sched_switch
# ./stackcount.py -P -df t:sched:sched_switch
^Csnmp-pass;[unknown];[unknown];[unknown];[unknown];[unknown];-;entry_SYSCALL_64_fastpath;SyS_select;core_sys_select;do_select;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule 1
kworker/7:0;-;ret_from_fork;kthread;worker_thread;schedule;__schedule 1
watchdog/0;-;ret_from_fork;kthread;smpboot_thread_fn;schedule;__schedule 1
Expand Down

0 comments on commit 2a9436d

Please sign in to comment.