Add runqslower tool (iovisor#1728)

* Add runqslower tool
* Remove mentions of obsolete enqueue_task_* in tools/runq*
* Use u32 for pid field in runqslower
Limnotation · May 9, 2018 · 5c48a3f
1 parent ad2d0d9
commit 5c48a3f
Showing 7 changed files with 405 additions and 6 deletions.
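
For reviewers, the measurement described in the new man page can be modeled in plain Python. This is only a sketch of the logic (record a timestamp when a task becomes runnable, compute the delay when it is switched in, report it if it exceeds `min_us`), not the BPF program added by this commit. The class and method names (`RunqTracker`, `on_wakeup`, `on_switch`) are hypothetical and do not appear in the tool.

```python
# Hypothetical pure-Python model of the run queue latency logic described in
# runqslower(8). It is NOT the tool's BPF code; all names are illustrative.

class RunqTracker:
    def __init__(self, min_us=10000, pid=None):
        self.min_us = min_us   # report delays longer than this (microseconds)
        self.pid = pid         # optional PID filter, like -p PID
        self.start = {}        # pid -> timestamp (ns) when it became runnable

    def _wanted(self, pid):
        return self.pid is None or pid == self.pid

    def on_wakeup(self, pid, ts_ns):
        """Case 1: a task is enqueued on a run queue (wakeup or new task)."""
        if self._wanted(pid):
            self.start[pid] = ts_ns

    def on_switch(self, prev_pid, prev_runnable, next_pid, ts_ns):
        """A context switch from prev_pid to next_pid at time ts_ns.

        Returns (pid, delay_us) when the delay exceeds min_us, else None.
        """
        # Case 2: prev was switched out involuntarily and is still runnable,
        # so its wait for the CPU starts now.
        if prev_runnable and self._wanted(prev_pid):
            self.start[prev_pid] = ts_ns

        t0 = self.start.pop(next_pid, None)
        if t0 is None:
            return None        # no enqueue seen, or filtered out by PID
        delay_us = (ts_ns - t0) // 1000
        if delay_us > self.min_us:
            return (next_pid, delay_us)
        return None
```

Under this model, a call such as `RunqTracker(min_us=1000, pid=123)` corresponds to the `runqslower -p 123 1000` example in the man page. Filtering by PID before storing the timestamp mirrors the man page's note that the filter is applied in the kernel for efficiency.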
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 # BPF Compiler Collection (BCC)
 
 BCC is a toolkit for creating efficient kernel tracing and manipulation
-programs, and includes several useful tools and examples. It makes use of 
+programs, and includes several useful tools and examples. It makes use of
 extended BPF (Berkeley Packet Filters), formally known as eBPF, a new feature
 that was first added to Linux 3.15. Much of what BCC uses requires Linux 4.1
 and above.
@@ -23,7 +23,7 @@ power-of-2 histogram of the I/O size. For efficiency, only the histogram
 summary is returned to user-level.
 
 ```Shell
-# ./bitehist.py 
+# ./bitehist.py
 Tracing... Hit Ctrl-C to end.
 ^C
  kbytes : count distribution
@@ -130,6 +130,7 @@ pair of .c and .py files, and some are directories of files.
 - tools/[reset-trace](tools/reset-trace.sh): Reset the state of tracing. Maintenance tool only. [Examples](tools/reset-trace_example.txt).
 - tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
 - tools/[runqlen](tools/runqlen.py): Run queue length as a histogram. [Examples](tools/runqlen_example.txt).
+- tools/[runqslower](tools/runqslower.py): Trace long process scheduling delays. [Examples](tools/runqslower_example.txt).
 - tools/[slabratetop](tools/slabratetop.py): Kernel SLAB/SLUB memory cache allocation rate top. [Examples](tools/slabratetop_example.txt).
 - tools/[softirqs](tools/softirqs.py): Measure soft IRQ (soft interrupt) event time. [Examples](tools/softirqs_example.txt).
 - tools/[solisten](tools/solisten.py): Trace TCP socket listen. [Examples](tools/solisten_example.txt).

diff --git a/man/man8/runqlat.8 b/man/man8/runqlat.8
@@ -13,7 +13,8 @@ wait its turn.
 This tool measures two types of run queue latency:
 
 1. The time from a task being enqueued on a run queue to its context switch
-and execution. This traces enqueue_task_*() -> finish_task_switch(),
+and execution. This traces ttwu_do_wakeup(), wake_up_new_task() ->
+finish_task_switch() with either raw tracepoints (if supported) or kprobes
 and instruments the run queue latency after a voluntary context switch.
 
 2. The time from when a task was involuntary context switched and still
@@ -109,4 +110,4 @@ Unstable - in development.
 .SH AUTHOR
 Brendan Gregg
 .SH SEE ALSO
-runqlen(8), pidstat(1)
+runqlen(8), runqslower(8), pidstat(1)
diff --git a/man/man8/runqlen.8 b/man/man8/runqlen.8
@@ -83,4 +83,4 @@ Unstable - in development.
 .SH AUTHOR
 Brendan Gregg
 .SH SEE ALSO
-runqlat(8), pidstat(1)
+runqlat(8), runqslower(8), pidstat(1)
diff --git a/man/man8/runqslower.8 b/man/man8/runqslower.8
@@ -0,0 +1,86 @@
+.TH runqslower 8 "2016-02-07" "USER COMMANDS"
+.SH NAME
+runqslower \- Trace long process scheduling delays.
+.SH SYNOPSIS
+.B runqslower [\-p PID] [min_us]
+.SH DESCRIPTION
+This measures the time a task spends waiting on a run queue (or equivalent
+scheduler data structure) for a turn on-CPU, and shows each occurrence that
+exceeds the given threshold. This time should be small, but a task may need
+to wait its turn due to CPU load. The higher the CPU load, the longer a task
+will generally need to wait its turn.
+
+This tool measures two types of run queue latency:
+
+1. The time from a task being enqueued on a run queue to its context switch
+and execution. This traces ttwu_do_wakeup(), wake_up_new_task() ->
+finish_task_switch() with either raw tracepoints (if supported) or kprobes
+and instruments the run queue latency after a voluntary context switch.
+
+2. The time from when a task was involuntarily context switched and still
+in the runnable state, to when it next executed. This is instrumented
+from finish_task_switch() alone.
+
+The overhead of this tool may become significant for some workloads:
+see the OVERHEAD section.
+
+This works by tracing various kernel scheduler functions using dynamic tracing,
+and will need updating to match any changes to these functions.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH OPTIONS
+.TP
+\-h
+Print usage message.
+.TP
+\-p PID
+Only show this PID (filtered in kernel for efficiency).
+.TP
+min_us
+Minimum scheduling delay in microseconds to output.
+.SH EXAMPLES
+.TP
+Show scheduling delays longer than 10ms:
+#
+.B runqslower
+.TP
+Show scheduling delays longer than 1ms for process with PID 123:
+#
+.B runqslower -p 123 1000
+.SH FIELDS
+.TP
+TIME
+Time when the scheduling event occurred.
+.TP
+COMM
+Process name.
+.TP
+PID
+Process ID.
+.TP
+LAT(us)
+Scheduling latency, in microseconds, from the time the task was ready to run
+to the time it was assigned to a CPU.
+.SH OVERHEAD
+This traces scheduler functions, which can become very frequent. While eBPF
+has very low overhead, and this tool uses in-kernel maps for efficiency, the
+frequency of scheduler events for some workloads may be high enough that the
+overhead of this tool becomes significant. Measure in a lab environment
+to quantify the overhead before use.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Ivan Babrou
+.SH SEE ALSO
+runqlen(8), runqlat(8), pidstat(1)
diff --git a/tools/runqlat.py b/tools/runqlat.py
@@ -12,7 +12,8 @@
 #
 # This measures two types of run queue latency:
 # 1. The time from a task being enqueued on a run queue to its context switch
-# and execution. This traces enqueue_task_*() -> finish_task_switch(),
+# and execution. This traces ttwu_do_wakeup(), wake_up_new_task() ->
+# finish_task_switch() with either raw tracepoints (if supported) or kprobes
 # and instruments the run queue latency after a voluntary context switch.
 # 2. The time from when a task was involuntary context switched and still
 # in the runnable state, to when it next executed. This is instrumented