Update profile.py to use new perf support (iovisor#776)
* profile.py to use new perf support

* Minor adjustments to llcstat docs
brendangregg authored and 4ast committed Oct 21, 2016
1 parent be294db commit 715f7e6
Showing 8 changed files with 1,198 additions and 150 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -100,6 +100,7 @@ Examples:
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[slabratetop](tools/slabratetop.py): Kernel SLAB/SLUB memory cache allocation rate top. [Examples](tools/slabratetop_example.txt).
+- tools/[llcstat](tools/llcstat.py): Summarize CPU cache references and misses by process. [Examples](tools/llcstat_example.txt).
- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
- tools/[mysqld_qslower](tools/mysqld_qslower.py): Trace MySQL server queries slower than a threshold. [Examples](tools/mysqld_qslower_example.txt).
- tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_example.txt).
8 changes: 4 additions & 4 deletions man/man8/llcstat.8
@@ -1,12 +1,12 @@
.TH llcstat 8 "2015-08-18" "USER COMMANDS"
.SH NAME
-llcstat \- Trace cache references and cache misses. Uses Linux eBPF/bcc.
+llcstat \- Summarize CPU cache references and misses by process. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B llcstat [\-h] [\-c SAMPLE_PERIOD] [duration]
.SH DESCRIPTION
-llcstat traces cache references and cache misses system-side, and summarizes
-them by PID and CPU. These events have different meanings on different
-architecture. For x86-64, they mean misses and references to LLC.
+llcstat instruments CPU cache references and cache misses system-side, and
+summarizes them by PID and CPU. These events have different meanings on
+different architecture. For x86-64, they mean misses and references to LLC.
This can be useful to locate and debug performance issues
caused by cache hit rate.

20 changes: 3 additions & 17 deletions man/man8/profile.8
@@ -3,7 +3,7 @@
profile \- Profile CPU usage by sampling stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B profile [\-adfh] [\-p PID] [\-U | \-k] [\-F FREQUENCY]
-.B [\-\-stack\-storage\-size COUNT] [\-S FRAMES] [duration]
+.B [\-\-stack\-storage\-size COUNT] [duration]
.SH DESCRIPTION
This is a CPU profiler. It works by taking samples of stack traces at timed
intervals. It will help you understand and quantify CPU usage: which code is
@@ -17,17 +17,11 @@ This is also an efficient profiler, as stack traces are frequency counted in
kernel context, rather than passing each stack to user space for frequency
counting there. Only the unique stacks and counts are passed to user space
at the end of the profile, greatly reducing the kernel<->user transfer.
-
-Note: if another perf-based sampling or tracing session is active, the output
-may become polluted with their events. This will be fixed for Linux 4.9.
.SH REQUIREMENTS
CONFIG_BPF and bcc.

-This also requires Linux 4.6+ (BPF_MAP_TYPE_STACK_TRACE support), and the
-perf_misc_flags() function symbol to exist. The latter may or may not
-exist depending on your kernel build, and if it doesn't exist, this tool
-will not work. Linux 4.9 provides a proper solution to this (this tool will
-be updated).
+This also requires Linux 4.9+ (BPF_PROG_TYPE_PERF_EVENT support). See tools/old
+for an older version that may work on Linux 4.6 - 4.8.
.SH OPTIONS
.TP
\-h
@@ -57,14 +51,6 @@ Show stacks from kernel space only (no user space stacks).
The maximum number of unique stack traces that the kernel will count (default
2048). If the sampled count exceeds this, a warning will be printed.
.TP
-\-S FRAMES
-A fixed number of kernel frames to skip. By default, extra registers are
-recorded so that the interrupt framework stack can be identified and excluded
-from the output. If this isn't working on your architecture, or, if you'd
-like to improve performance a tiny amount, then you can specify a fixed count
-to skip. Note for debugging that the IP address is printed as the first frame,
-followed by the captured stack.
-.TP
duration
Duration to trace, in seconds.
.SH EXAMPLES
14 changes: 14 additions & 0 deletions tools/llcstat_example.txt
@@ -1,7 +1,9 @@
Demonstrations of llcstat.


llcstat traces cache reference and cache miss events system-wide, and summarizes
them by PID and CPU.

These events, defined in uapi/linux/perf_event.h, have different meanings on
different architecture. For x86-64, they mean misses and references to LLC.

@@ -25,6 +27,18 @@ Total References: 518920000 Total Misses: 90265000 Hit Rate: 82.61%

This shows each PID's cache hit rate during the 20 seconds run period.

+A count of 5000 was used in this example, which means that one in every 5,000
+events will trigger an in-kernel counter to be incremented. This is refactored
+on the output, which is why it is always in multiples of 5,000.
+
+We don't instrument every single event since the overhead would be prohibitive,
+nor do we need to: this is a type of sampling profiler. Because of this, the
+processes that trigger the 5,000'th cache reference or misses can happen to
+some degree by chance. Overall it should make sense. But for low counts,
+you might find a case where -- by chance -- a process has been tallied with
+more misses than references, which would seem impossible.
+
+
USAGE message:

# ./llcstat.py --help
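
The sampling note added to tools/llcstat_example.txt above can be illustrated with a small standalone sketch. This is not code from llcstat.py: the helper names `scale_samples` and `hit_rate` are hypothetical, and clamping misses to references is an assumption made here so that the "more misses than references" case does not produce a negative rate. The totals are the ones shown in the hunk header.

```python
SAMPLE_PERIOD = 5000  # one in-kernel counter increment per 5,000 events


def scale_samples(sampled, sample_period=SAMPLE_PERIOD):
    """Scale a sampled counter back up to an estimated event count."""
    return sampled * sample_period


def hit_rate(references, misses):
    """Return the cache hit rate as a percentage.

    Assumption for this sketch: misses are clamped to references, since
    sampling can by chance tally more misses than references for a process.
    """
    if references <= 0:
        return 0.0
    misses = min(misses, references)
    return (1.0 - misses / references) * 100.0


# Totals from the example output: both are multiples of 5,000.
total_refs = scale_samples(103784)   # 518920000
total_miss = scale_samples(18053)    # 90265000
print("Hit Rate: %.2f%%" % hit_rate(total_refs, total_miss))
```

Because every reported count is a sampled tally multiplied by the period, low counts (one or two samples) carry large relative error, which is why per-process rates for rarely sampled PIDs should be read with caution.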