add slabratetop (iovisor#759)
brendangregg authored and 4ast committed Oct 18, 2016
1 parent 4725a72 commit 203b4c9
Showing 4 changed files with 344 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -99,6 +99,7 @@ Examples:
- tools/[gethostlatency](tools/gethostlatency.py): Show latency for getaddrinfo/gethostbyname[2] calls. [Examples](tools/gethostlatency_example.txt).
- tools/[hardirqs](tools/hardirqs.py): Measure hard IRQ (hard interrupt) event time. [Examples](tools/hardirqs_example.txt).
- tools/[killsnoop](tools/killsnoop.py): Trace signals issued by the kill() syscall. [Examples](tools/killsnoop_example.txt).
- tools/[slabratetop](tools/slabratetop.py): Kernel SLAB/SLUB memory cache allocation rate top. [Examples](tools/slabratetop_example.txt).
- tools/[mdflush](tools/mdflush.py): Trace md flush events. [Examples](tools/mdflush_example.txt).
- tools/[mysqld_qslower](tools/mysqld_qslower.py): Trace MySQL server queries slower than a threshold. [Examples](tools/mysqld_qslower_example.txt).
- tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_example.txt).
76 changes: 76 additions & 0 deletions man/man8/slabratetop.8
@@ -0,0 +1,76 @@
.TH slabratetop 8 "2016-10-17" "USER COMMANDS"
.SH NAME
slabratetop \- Kernel SLAB/SLUB memory cache allocation rate top.
Uses Linux BPF/bcc.
.SH SYNOPSIS
.B slabratetop [\-h] [\-C] [\-r MAXROWS] [interval] [count]
.SH DESCRIPTION
This is top for the rate of kernel SLAB/SLUB memory allocations.
It works by tracing kmem_cache_alloc() calls, a commonly used interface for
kernel memory allocation (SLAB or SLUB). It summarizes the number of calls and
the total bytes allocated per interval: the allocation activity. Compare this to
slabtop(1), which shows the current static volume of the caches.

This tool uses kernel dynamic tracing of the kmem_cache_alloc() function.

Since this uses BPF, only the root user can use this tool.
.SH REQUIREMENTS
CONFIG_BPF and bcc.
.SH OPTIONS
.TP
\-C
Don't clear the screen.
.TP
\-r MAXROWS
Maximum number of rows to print. Default is 20.
.TP
interval
Interval between updates, seconds.
.TP
count
Number of interval summaries.
.SH EXAMPLES
.TP
Summarize active kernel SLAB/SLUB calls (kmem_cache_alloc()), showing the top 20 caches every second:
#
.B slabratetop
.TP
Don't clear the screen, and top 8 rows only:
#
.B slabratetop -Cr 8
.TP
5 second summaries, 10 times only:
#
.B slabratetop 5 10
.SH FIELDS
.TP
loadavg:
The contents of /proc/loadavg
.TP
CACHE
Kernel cache name.
.TP
ALLOCS
Allocations (number of calls).
.TP
BYTES
Total bytes allocated.
.SH OVERHEAD
If kmem_cache_alloc() is called at a high rate (eg, >100k/second) the overhead
of this tool might begin to be measurable. The rate can be seen in the ALLOCS
column of the output.
.SH SOURCE
This is from bcc.
.IP
https://github.com/iovisor/bcc
.PP
Also look in the bcc distribution for a companion _examples.txt file containing
example usage, output, and commentary for this tool.
.SH OS
Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
.SH SEE ALSO
slabtop(1)
133 changes: 133 additions & 0 deletions tools/slabratetop.py
@@ -0,0 +1,133 @@
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# slabratetop Summarize kmem_cache_alloc() calls.
# For Linux, uses BCC, eBPF.
#
# USAGE: slabratetop [-h] [-C] [-r MAXROWS] [interval] [count]
#
# This uses in-kernel BPF maps to store cache summaries for efficiency.
#
# SEE ALSO: slabtop(1), which shows the cache volumes.
#
# Copyright 2016 Netflix, Inc.
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 15-Oct-2016 Brendan Gregg Created this.

from __future__ import print_function
from bcc import BPF
from time import sleep, strftime
import argparse
import signal
from subprocess import call

# arguments
examples = """examples:
    ./slabratetop            # kmem_cache_alloc() top, 1 second refresh
    ./slabratetop -C         # don't clear the screen
    ./slabratetop 5          # 5 second summaries
    ./slabratetop 5 10       # 5 second summaries, 10 times only
"""
parser = argparse.ArgumentParser(
    description="Kernel SLAB/SLUB memory cache allocation rate top",
    formatter_class=argparse.RawDescriptionHelpFormatter,
    epilog=examples)
parser.add_argument("-C", "--noclear", action="store_true",
    help="don't clear the screen")
parser.add_argument("-r", "--maxrows", default=20,
    help="maximum rows to print, default 20")
parser.add_argument("interval", nargs="?", default=1,
    help="output interval, in seconds")
parser.add_argument("count", nargs="?", default=99999999,
    help="number of outputs")
args = parser.parse_args()
interval = int(args.interval)
countdown = int(args.count)
maxrows = int(args.maxrows)
clear = not int(args.noclear)
debug = 0

# linux stats
loadavg = "/proc/loadavg"

# signal handler
def signal_ignore(signal, frame):
    print()

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/slub_def.h>
#define CACHE_NAME_SIZE 32
// the key for the output summary
struct info_t {
    char name[CACHE_NAME_SIZE];
};
// the value of the output summary
struct val_t {
    u64 count;
    u64 size;
};
BPF_HASH(counts, struct info_t, struct val_t);
int kprobe__kmem_cache_alloc(struct pt_regs *ctx, struct kmem_cache *cachep)
{
    struct info_t info = {};
    bpf_probe_read(&info.name, sizeof(info.name), (void *)cachep->name);
    struct val_t *valp, zero = {};
    valp = counts.lookup_or_init(&info, &zero);
    valp->count++;
    valp->size += cachep->size;
    return 0;
}
"""
if debug:
    print(bpf_text)

# initialize BPF
b = BPF(text=bpf_text)

print('Tracing... Output every %d secs. Hit Ctrl-C to end' % interval)

# output
exiting = 0
while 1:
    try:
        sleep(interval)
    except KeyboardInterrupt:
        exiting = 1

    # header
    if clear:
        call("clear")
    else:
        print()
    with open(loadavg) as stats:
        print("%-8s loadavg: %s" % (strftime("%H:%M:%S"), stats.read()))
    print("%-32s %6s %10s" % ("CACHE", "ALLOCS", "BYTES"))

    # by-cache output, sorted by total bytes (largest first)
    counts = b.get_table("counts")
    line = 0
    for k, v in reversed(sorted(counts.items(),
                                key=lambda counts: counts[1].size)):
        print("%-32s %6d %10d" % (k.name, v.count, v.size))

        line += 1
        if line >= maxrows:
            break
    counts.clear()

    countdown -= 1
    if exiting or countdown == 0:
        print("Detaching...")
        exit()
134 changes: 134 additions & 0 deletions tools/slabratetop_example.txt
@@ -0,0 +1,134 @@
Demonstrations of slabratetop, the Linux eBPF/bcc version.


slabratetop shows the rate of allocations and total bytes from the kernel
memory allocation caches (SLAB or SLUB), in a top-like display that refreshes.
For example:

# ./slabratetop
<screen clears>
07:01:35 loadavg: 0.38 0.21 0.12 1/342 13297

CACHE                            ALLOCS      BYTES
kmalloc-4096                       3554   14557184
kmalloc-256                        2382     609536
cred_jar                           2568     493056
anon_vma_chain                     2007     128448
anon_vma                            972      77760
sighand_cache                        24      50688
mm_struct                            49      50176
RAW                                  52      49920
proc_inode_cache                     59      38232
signal_cache                         24      26112
dentry                              135      25920
sock_inode_cache                     29      18560
files_cache                          24      16896
inode_cache                          13       7696
TCP                                   2       3840
pid                                  24       3072
sigqueue                             17       2720
ext4_inode_cache                      2       2160
buffer_head                          16       1664
xfs_trans                             5       1160

By default the screen refreshes every one second, and only the top 20 caches
are shown. These can be tuned with options: see USAGE (-h).
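
The ranking behind this view can be sketched in plain Python, without BPF.
This is only an illustration, assuming the per-cache totals are already
available as a dictionary. The top_caches() helper is hypothetical and not
part of the tool; the values are copied from the output above.

```python
def top_caches(totals, maxrows=20):
    """Return (name, allocs, bytes) rows ranked by bytes, largest first.

    `totals` maps a cache name to an (allocs, bytes) pair. This helper is
    hypothetical: it mirrors the sort-and-truncate step, not the BPF tracing.
    """
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(name, allocs, nbytes) for name, (allocs, nbytes) in ranked[:maxrows]]


# A few rows taken from the sample output above.
sample = {
    "cred_jar": (2568, 493056),
    "kmalloc-4096": (3554, 14557184),
    "dentry": (135, 25920),
    "kmalloc-256": (2382, 609536),
}

rows = top_caches(sample, maxrows=3)
for name, allocs, nbytes in rows:
    print("%-32s %6d %10d" % (name, allocs, nbytes))
```

Rows are ranked by BYTES rather than ALLOCS, which is why kmalloc-256 appears
above cred_jar in the output even though cred_jar had more allocations.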

The output above showed that the kmalloc-4096 cache allocated the most, about
14 Mbytes during this interval. This is a generic cache; other caches have
more meaningful names ("dentry", "TCP", "pid", etc).
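
Dividing BYTES by ALLOCS gives the average size charged per allocation, which
helps when reading rows like these. The sketch below uses values copied from
the output above; the avg_size() helper is hypothetical. A result can land
slightly off a round object size, as it does for kmalloc-256 here. One
possible explanation, which this commit does not confirm, is that the count
and size fields are updated without atomic operations.

```python
def avg_size(allocs, nbytes):
    """Average bytes charged per allocation for one output row."""
    return nbytes / float(allocs)


# (cache name, ALLOCS, BYTES) copied from the sample output above.
rows = [
    ("kmalloc-4096", 3554, 14557184),
    ("cred_jar", 2568, 493056),
    ("anon_vma_chain", 2007, 128448),
    ("kmalloc-256", 2382, 609536),
]

averages = {name: avg_size(allocs, nbytes) for name, allocs, nbytes in rows}
for name, allocs, nbytes in rows:
    print("%-16s %8.1f bytes/alloc" % (name, averages[name]))
```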

slabtop(1) is a similar tool that shows the current static volume and usage
of the caches. slabratetop shows the active call rates and total size of the
allocations.


Since "kmalloc-4096" isn't very descriptive, I'm interested in seeing the
kernel stacks that led to this allocation. In the future (maybe by now) the
bcc trace tool could do this. As I'm writing this, it can't, so I'll use my
older ftrace-based kprobe tool as a workaround. This is from my perf-tools
collection: https://github.com/brendangregg/perf-tools.

# ./perf-tools/bin/kprobe -s 'p:kmem_cache_alloc name=+0(+96(%di)):string' 'name == "kmalloc-4096"' | head -100
Tracing kprobe kmem_cache_alloc. Ctrl-C to end.
kprobe-3892 [002] d... 7888274.478331: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478333: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
kprobe-3892 [002] d... 7888274.478340: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478341: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
kprobe-3892 [002] d... 7888274.478345: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478346: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
kprobe-3892 [002] d... 7888274.478350: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478351: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
kprobe-3892 [002] d... 7888274.478355: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478355: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
kprobe-3892 [002] d... 7888274.478359: kmem_cache_alloc: (kmem_cache_alloc+0x0/0x1b0) name="kmalloc-4096"
kprobe-3892 [002] d... 7888274.478359: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
=> SYSC_newstat
=> SyS_newstat
=> entry_SYSCALL_64_fastpath
[...]

This is just an example so that you can see it's possible to dig further.
Please don't copy-n-paste that kprobe command, as it's unlikely to work (the
"+0(+96(%di))" text is specific to a kernel version and architecture).

So these allocations are coming from user_path_at_empty(), which calls other
functions (not seen in the stack: I suspect it's a tail-call compiler
optimization).
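
With a longer capture, reading every stack by eye gets tedious. One way to
summarize it is to tally identical stacks with a short script. This is a
rough sketch, assuming the text has the shape shown above (frames on lines
beginning with "=>"). The count_stacks() helper is hypothetical, and the
sample is a shortened copy of the trace above.

```python
from collections import Counter


def count_stacks(text):
    """Count identical stack traces in ftrace-style text.

    A stack is taken to be a run of consecutive lines starting with "=>".
    """
    stacks = Counter()
    frames = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("=>"):
            frames.append(line[2:].strip())
        elif frames:
            # A non-frame line ends the current stack.
            stacks[tuple(frames)] += 1
            frames = []
    if frames:  # flush a stack that ends the input
        stacks[tuple(frames)] += 1
    return stacks


# Shortened sample, for illustration only.
sample = """\
kprobe-3892 [002] d... 7888274.478333: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
kprobe-3892 [002] d... 7888274.478341: <stack trace>
=> kmem_cache_alloc
=> user_path_at_empty
=> vfs_fstatat
"""

counts = count_stacks(sample)
for stack, n in counts.most_common():
    print(n, " <- ".join(stack))
```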


USAGE:

# ./slabratetop -h
usage: slabratetop [-h] [-C] [-r MAXROWS] [interval] [count]

Kernel SLAB/SLUB memory cache allocation rate top

positional arguments:
  interval              output interval, in seconds
  count                 number of outputs

optional arguments:
  -h, --help            show this help message and exit
  -C, --noclear         don't clear the screen
  -r MAXROWS, --maxrows MAXROWS
                        maximum rows to print, default 20

examples:
    ./slabratetop            # kmem_cache_alloc() top, 1 second refresh
    ./slabratetop -C         # don't clear the screen
    ./slabratetop 5          # 5 second summaries
    ./slabratetop 5 10       # 5 second summaries, 10 times only
