Skip to content

Commit

Permalink
profile: exclude CPU idle stacks by default (iovisor#2166)
Browse files Browse the repository at this point in the history
profile: exclude CPU idle stacks by default
  • Loading branch information
brendangregg authored and yonghong-song committed Jan 27, 2019
1 parent 77f4f66 commit 81baae4
Show file tree
Hide file tree
Showing 3 changed files with 47 additions and 61 deletions.
3 changes: 3 additions & 0 deletions man/man8/profile.8
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,9 @@ Show stacks from user space only (no kernel space stacks).
\-K
Show stacks from kernel space only (no user space stacks).
.TP
\-I
Include CPU idle stacks (by default these are excluded).
.TP
\-\-stack-storage-size COUNT
The maximum number of unique stack traces that the kernel will count (default
16384). If the sampled count exceeds this, a warning will be printed.
Expand Down
14 changes: 14 additions & 0 deletions tools/profile.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
# counting there. Only the unique stacks and counts are passed to user space
# at the end of the profile, greatly reducing the kernel<->user transfer.
#
# By default CPU idle stacks are excluded by simply excluding PID 0.
#
# REQUIRES: Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). Under tools/old is
# a version of this tool that may work on Linux 4.6 - 4.8.
#
Expand All @@ -22,6 +24,7 @@
#
# 15-Jul-2016 Brendan Gregg Created this.
# 20-Oct-2016 " " Switched to use the new 4.9 support.
# 26-Jan-2019 " " Changed to exclude CPU idle by default.

from __future__ import print_function
from bcc import BPF, PerfType, PerfSWConfig
Expand Down Expand Up @@ -93,6 +96,8 @@ def stack_id_err(stack_id):
help="insert delimiter between kernel/user stacks")
parser.add_argument("-a", "--annotations", action="store_true",
help="add _[k] annotations to kernel frames")
parser.add_argument("-I", "--include-idle", action="store_true",
help="include CPU idle stacks")
parser.add_argument("-f", "--folded", action="store_true",
help="output folded format, one line per stack (for flame graphs)")
parser.add_argument("--stack-storage-size", default=16384,
Expand Down Expand Up @@ -141,6 +146,9 @@ def stack_id_err(stack_id):
int do_perf_event(struct bpf_perf_event_data *ctx) {
u32 pid = bpf_get_current_pid_tgid() >> 32;
if (IDLE_FILTER)
return 0;
if (!(THREAD_FILTER))
return 0;
Expand Down Expand Up @@ -184,6 +192,12 @@ def stack_id_err(stack_id):
}
"""

# set idle filter
idle_filter = "pid == 0"
if args.include_idle:
idle_filter = "0"
bpf_text = bpf_text.replace('IDLE_FILTER', idle_filter)

# set thread filter
thread_context = ""
perf_filter = "-a"
Expand Down
91 changes: 30 additions & 61 deletions tools/profile_example.txt
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,27 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- func_ab (13549)
5

The output was long; I truncated some lines ("[...]").

This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.

The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).

The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.


By default, CPU idle stacks are excluded. They can be included with -I:

# ./profile -I

[...]

native_safe_halt
Expand All @@ -64,32 +85,16 @@ Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
- swapper/1 (0)
75

The output was long; I truncated some lines ("[...]").

This default output prints stack traces, followed by a line to describe the
process (a dash, the process name, and a PID in parenthesis), and then an
integer count of how many times this stack trace was sampled.

The output above shows the most frequent stack was from the "swapper/1"
process (PID 0), running the native_safe_halt() function, which was called
by default_idle(), which was called by arch_cpu_idle(), and so on. This is
the idle thread. Stacks can be read top-down, to follow ancestry: child,
parent, grandparent, etc.

The func_ab process is running the func_a() function, called by main(),
called by __libc_start_main(), and called by "[unknown]" with what looks
like a bogus address (1st column). That's evidence of a broken stack trace.
It's common for user-level software that hasn't been compiled with frame
pointers (in this case, libc).

The dd process has called read(), and then enters the kernel via
entry_SYSCALL_64_fastpath(), calling sys_read(), and so on. Yes, I'm now
reading it bottom up. That way follows the code flow.


The dd process is actually "dd if=/dev/zero of=/dev/null": it's a simple
workload to analyze that just moves bytes from /dev/zero to /dev/null.
Profiling just that process:
The dd process profiled ealrier is actually "dd if=/dev/zero of=/dev/null":
it's a simple workload to analyze that just moves bytes from /dev/zero to
/dev/null. Profiling just that process:

# ./profile -p 25036
Sampling at 49 Hertz of PID 25036 by user + kernel stack... Hit Ctrl-C to end.
Expand Down Expand Up @@ -539,6 +544,8 @@ You can increase or decrease the sample frequency. Eg, sampling at 9 Hertz:
# ./profile -F 9
Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
[...]

func_b
main
__libc_start_main
Expand All @@ -548,27 +555,6 @@ Sampling at 9 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.

[...]

native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
start_secondary
- swapper/3 (0)
8

native_safe_halt
default_idle
arch_cpu_idle
default_idle_call
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
- swapper/0 (0)
8


You can also restrict profiling to just kernel stacks (-K) or user stacks (-U).
For example, just user stacks:
Expand Down Expand Up @@ -707,24 +693,6 @@ Sampling at 49 Hertz of all threads by user stack... Hit Ctrl-C to end.
- dd (2931)
14

- swapper/7 (0)
46

- swapper/0 (0)
46

- swapper/2 (0)
46

- swapper/1 (0)
46

- swapper/3 (0)
46

- swapper/4 (0)
46


If there are too many unique stack traces for the kernel to save, a warning
will be printed. Eg:
Expand All @@ -739,8 +707,8 @@ Run ./profile -h to see the default.
USAGE message:

# ./profile -h
usage: profile [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-f] [--stack-storage-size STACK_STORAGE_SIZE]
usage: profile.py [-h] [-p PID] [-U | -K] [-F FREQUENCY | -c COUNT] [-d] [-a]
[-I] [-f] [--stack-storage-size STACK_STORAGE_SIZE] [-C CPU]
[duration]

Profile CPU stack traces at a timed interval
Expand All @@ -763,11 +731,12 @@ optional arguments:
sample period, number of events
-d, --delimited insert delimiter between kernel/user stacks
-a, --annotations add _[k] annotations to kernel frames
-I, --include-idle include CPU idle stacks
-f, --folded output folded format, one line per stack (for flame
graphs)
--stack-storage-size STACK_STORAGE_SIZE
the number of unique stack traces that can be stored
and displayed (default 2048)
and displayed (default 16384)
-C CPU, --cpu CPU cpu number to run profile on

examples:
Expand Down

0 comments on commit 81baae4

Please sign in to comment.