Skip to content

Commit

Permalink
stackcount: Support uprobes, tracepoints, and USDT (iovisor#730)
Browse files Browse the repository at this point in the history
* stackcount: Support user-space functions

Add support for user-space functions in `stackcount` by taking an additional
`-l` command-line parameter specifying the name of the user-space library.
When a user-space library is specified, `stackcount` attaches to a specific
process and traces a user-space function with user-space stacks only.
Regex support for uprobes (similar to what is available for kprobes) is
not currently provided.

Also add a couple of functions to the `BPF` object for consistency.

* bcc: Support regex in attach_uprobe

attach_kprobe allows a regular expression for the function name,
while attach_uprobe does not. Add support in libccc for enumerating
all the function symbols in a binary, and use that in the BPF module
to attach uprobes according to a regular expression. For example:

```python
bpf = BPF(text="...")
bpf.attach_uprobe(name="c", sym_re=".*write$", fn_name="probe")
```

* python: Support regex in attach_tracepoint

Modify attach_tracepoint to take a regex argument, in which case
it enumerates all tracepoints matching that regex and attaches to
all of them. The logic for enumerating tracepoints should eventually
belong in libccc and be shared across all the tools (tplist, trace
and so on).

* cc: Fix termination condition bug in symbol enumeration

bcc_elf would not terminate the enumeration correctly when the
user-provided callback returned -1 but there were still more
sections remaining in the ELF to be enumerated.

* stackcount: Support uprobes and tracepoints

Refactored stackcount and added support for uprobes and tracepoints,
which also required changes to the BPF module. USDT support still
pending.

* bcc: Refactor symbol listing to use foreach-style

Refactor symbol listing from paging style to foreach-style with a
callback function per-symbol. Even though we're now performing a
callback from C to Python for each symbol, this is preferable to the
paging approach because we need all the symbols in the current use
case.

Also refactored `stackcount` slightly; only missing support for USDT
probes now.

* stackcount: Support per-process displays

For user-space functions, or when requested for kernel-space
functions or tracepoints, group the output by process. Toggled
with the -P switch, off by default (except for user-space).

* Fix rebase issues, print pid only when there is one

* stackcount: Add USDT support

Now, stackcount supports USDT tracepoints in addition to
kernel functions, user functions, and kernel tracepoints.
The format is the same as with the other general-purpose
tools (argdist, trace):

```
stackcount -p $(pidof node) u:node:gc*
stackcount -p 185 u:pthread:pthread_create
```

* stackcount: Update examples and man page

Add examples and man page documentation for kernel
tracepoints, USDT tracepoints, and other features.

* stackcount: Change printing format slightly

When -p is specified, don't print the comm and pid. Also,
when -P is specified for kernel probes (kprobes and
tracepoints), use -1 for symbol resolution so that we
don't try to resolve kernel functions as user symbols.
Finally, print the comm and pid at the end of the stack
output and not at the beginning.
  • Loading branch information
goldshtn authored and 4ast committed Oct 5, 2016
1 parent ba404cf commit 07175d0
Show file tree
Hide file tree
Showing 8 changed files with 514 additions and 147 deletions.
32 changes: 25 additions & 7 deletions man/man8/stackcount.8
Original file line number Diff line number Diff line change
@@ -1,13 +1,15 @@
.TH stackcount 8 "2016-01-14" "USER COMMANDS"
.SH NAME
stackcount \- Count kernel function calls and their stack traces. Uses Linux eBPF/bcc.
stackcount \- Count function calls and their stack traces. Uses Linux eBPF/bcc.
.SH SYNOPSIS
.B stackcount [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-r] pattern
.B stackcount [\-h] [\-p PID] [\-i INTERVAL] [\-T] [\-r] [\-s]
[\-P] [\-v] [\-d] pattern
.SH DESCRIPTION
stackcount traces kernel functions and frequency counts them with their entire
kernel stack trace, summarized in-kernel for efficiency. This allows higher
stackcount traces functions and frequency counts them with their entire
stack trace, summarized in-kernel for efficiency. This allows higher
frequency events to be studied. The output consists of unique stack traces,
and their occurrence counts.
and their occurrence counts. In addition to kernel and user functions, kernel
tracepoints and USDT tracepoint are also supported.

The pattern is a string with optional '*' wildcards, similar to file globbing.
If you'd prefer to use regular expressions, use the \-r option.
Expand Down Expand Up @@ -35,14 +37,18 @@ Include a timestamp with interval output.
\-v
Show raw addresses.
.TP
\-d
Print the source of the BPF program when loading it (for debugging purposes).
.TP
\-i interval
Summary interval, in seconds.
.TP
\-p PID
Trace this process ID only (filtered in-kernel).
.TP
.TP
pattern
A kernel function name, or a search pattern. Can include wildcards ("*"). If the
A function name, or a search pattern. Can include wildcards ("*"). If the
\-r option is used, can include regular expressions.
.SH EXAMPLES
.TP
Expand Down Expand Up @@ -77,6 +83,18 @@ Output every 5 seconds, with timestamps:
Only count stacks when PID 185 is on-CPU:
#
.B stackcount -p 185 ip_output
.TP
Count user stacks for dynamic heap allocations with malloc in PID 185:
#
.B stackcount -p 185 c:malloc
.TP
Count user stacks for thread creation (USDT tracepoint) in PID 185:
#
.B stackcount -p 185 u:pthread:pthread_create
.TP
Count kernel stacks for context switch events using a kernel tracepoint:
#
.B stackcount t:sched:sched_switch
.SH OVERHEAD
This summarizes unique stack traces in-kernel for efficiency, allowing it to
trace a higher rate of function calls than methods that post-process in user
Expand All @@ -99,6 +117,6 @@ Linux
.SH STABILITY
Unstable - in development.
.SH AUTHOR
Brendan Gregg
Brendan Gregg, Sasha Goldshtein
.SH SEE ALSO
stacksnoop(8), funccount(8)
12 changes: 8 additions & 4 deletions src/cc/bcc_elf.c
Original file line number Diff line number Diff line change
Expand Up @@ -165,7 +165,7 @@ static int list_in_scn(Elf *e, Elf_Scn *section, size_t stridx, size_t symsize,
continue;

if (callback(name, sym.st_value, sym.st_size, sym.st_info, payload) < 0)
break;
return 1; // signal termination to caller
}
}

Expand All @@ -184,9 +184,13 @@ static int listsymbols(Elf *e, bcc_elf_symcb callback, void *payload) {
if (header.sh_type != SHT_SYMTAB && header.sh_type != SHT_DYNSYM)
continue;

if (list_in_scn(e, section, header.sh_link, header.sh_entsize, callback,
payload) < 0)
return -1;
int rc = list_in_scn(e, section, header.sh_link, header.sh_entsize,
callback, payload);
if (rc == 1)
break; // callback signaled termination

if (rc < 0)
return rc;
}

return 0;
Expand Down
26 changes: 26 additions & 0 deletions src/cc/bcc_syms.cc
Original file line number Diff line number Diff line change
Expand Up @@ -270,6 +270,32 @@ int bcc_find_symbol_addr(struct bcc_symbol *sym) {
return bcc_elf_foreach_sym(sym->module, _find_sym, sym);
}

struct sym_search_t {
struct bcc_symbol *syms;
int start;
int requested;
int *actual;
};

// see <elf.h>
#define ELF_TYPE_IS_FUNCTION(flags) (((flags) & 0xf) == 2)

static int _list_sym(const char *symname, uint64_t addr, uint64_t end,
int flags, void *payload) {
if (!ELF_TYPE_IS_FUNCTION(flags) || addr == 0)
return 0;

SYM_CB cb = (SYM_CB) payload;
return cb(symname, addr);
}

int bcc_foreach_symbol(const char *module, SYM_CB cb) {
if (module == 0 || cb == 0)
return -1;

return bcc_elf_foreach_sym(module, _list_sym, (void *)cb);
}

int bcc_resolve_symname(const char *module, const char *symname,
const uint64_t addr, struct bcc_symbol *sym) {
uint64_t load_addr;
Expand Down
3 changes: 3 additions & 0 deletions src/cc/bcc_syms.h
Original file line number Diff line number Diff line change
Expand Up @@ -29,13 +29,16 @@ struct bcc_symbol {
uint64_t offset;
};

typedef int(* SYM_CB)(const char *symname, uint64_t addr);

void *bcc_symcache_new(int pid);
int bcc_symcache_resolve(void *symcache, uint64_t addr, struct bcc_symbol *sym);
int bcc_symcache_resolve_name(void *resolver, const char *name, uint64_t *addr);
void bcc_symcache_refresh(void *resolver);

int bcc_resolve_global_addr(int pid, const char *module, const uint64_t address,
uint64_t *global);
int bcc_foreach_symbol(const char *module, SYM_CB cb);
int bcc_find_symbol_addr(struct bcc_symbol *sym);
int bcc_resolve_symname(const char *module, const char *symname,
const uint64_t addr, struct bcc_symbol *sym);
Expand Down
101 changes: 93 additions & 8 deletions src/python/bcc/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
import sys
basestring = (unicode if sys.version_info[0] < 3 else str)

from .libbcc import lib, _CB_TYPE, bcc_symbol
from .libbcc import lib, _CB_TYPE, bcc_symbol, _SYM_CB_TYPE
from .table import Table
from .perf import Perf
from .usyms import ProcessSymbols
Expand Down Expand Up @@ -531,21 +531,50 @@ def find_library(libname):
res = lib.bcc_procutils_which_so(libname.encode("ascii"))
return res if res is None else res.decode()

def attach_tracepoint(self, tp="", fn_name="", pid=-1, cpu=0, group_fd=-1):
"""attach_tracepoint(tp="", fn_name="", pid=-1, cpu=0, group_fd=-1)
def _get_tracepoints(self, tp_re):
results = []
events_dir = os.path.join(TRACEFS, "events")
for category in os.listdir(events_dir):
cat_dir = os.path.join(events_dir, category)
if not os.path.isdir(cat_dir):
continue
for event in os.listdir(cat_dir):
evt_dir = os.path.join(cat_dir, event)
if os.path.isdir(evt_dir):
tp = ("%s:%s" % (category, event))
if re.match(tp_re, tp):
results.append(tp)
return results

def attach_tracepoint(self, tp="", tp_re="", fn_name="", pid=-1,
cpu=0, group_fd=-1):
"""attach_tracepoint(tp="", tp_re="", fn_name="", pid=-1,
cpu=0, group_fd=-1)
Run the bpf function denoted by fn_name every time the kernel tracepoint
specified by 'tp' is hit. The optional parameters pid, cpu, and group_fd
can be used to filter the probe. The tracepoint specification is simply
the tracepoint category and the tracepoint name, separated by a colon.
For example: sched:sched_switch, syscalls:sys_enter_bind, etc.
Instead of a tracepoint name, a regular expression can be provided in
tp_re. The program will then attach to tracepoints that match the
provided regular expression.
To obtain a list of kernel tracepoints, use the tplist tool or cat the
file /sys/kernel/debug/tracing/available_events.
Example: BPF(text).attach_tracepoint("sched:sched_switch", "on_switch")
Examples:
BPF(text).attach_tracepoint(tp="sched:sched_switch", fn_name="on_switch")
BPF(text).attach_tracepoint(tp_re="sched:.*", fn_name="on_switch")
"""

if tp_re:
for tp in self._get_tracepoints(tp_re):
self.attach_tracepoint(tp=tp, fn_name=fn_name, pid=pid,
cpu=cpu, group_fd=group_fd)
return

fn = self.load_func(fn_name, BPF.TRACEPOINT)
(tp_category, tp_name) = tp.split(':')
res = lib.bpf_attach_tracepoint(fn.fd, tp_category.encode("ascii"),
Expand Down Expand Up @@ -586,16 +615,40 @@ def _del_uprobe(self, name):
del self.open_uprobes[name]
_num_open_probes -= 1

def attach_uprobe(self, name="", sym="", addr=None,
def _get_user_functions(self, name, sym_re):
"""
We are returning addresses here instead of symbol names because it
turns out that the same name may appear multiple times with different
addresses, and the same address may appear multiple times with the same
name. We can't attach a uprobe to the same address more than once, so
it makes sense to return the unique set of addresses that are mapped to
a symbol that matches the provided regular expression.
"""
addresses = []
def sym_cb(sym_name, addr):
if re.match(sym_re, sym_name) and addr not in addresses:
addresses.append(addr)
return 0

res = lib.bcc_foreach_symbol(name, _SYM_CB_TYPE(sym_cb))
if res < 0:
raise Exception("Error %d enumerating symbols in %s" % (res, name))
return addresses

def attach_uprobe(self, name="", sym="", sym_re="", addr=None,
fn_name="", pid=-1, cpu=0, group_fd=-1):
"""attach_uprobe(name="", sym="", addr=None, fn_name=""
"""attach_uprobe(name="", sym="", sym_re="", addr=None, fn_name=""
pid=-1, cpu=0, group_fd=-1)
Run the bpf function denoted by fn_name every time the symbol sym in
the library or binary 'name' is encountered. The real address addr may
be supplied in place of sym. Optional parameters pid, cpu, and group_fd
can be used to filter the probe.
Instead of a symbol name, a regular expression can be provided in
sym_re. The uprobe will then attach to symbols that match the provided
regular expression.
Libraries can be given in the name argument without the lib prefix, or
with the full path (/usr/lib/...). Binaries can be given only with the
full path (/bin/sh).
Expand All @@ -605,6 +658,14 @@ def attach_uprobe(self, name="", sym="", addr=None,
"""

name = str(name)

if sym_re:
for sym_addr in self._get_user_functions(name, sym_re):
self.attach_uprobe(name=name, addr=sym_addr,
fn_name=fn_name, pid=pid, cpu=cpu,
group_fd=group_fd)
return

(path, addr) = BPF._check_path_symbol(name, sym, addr)

self._check_probe_quota(1)
Expand Down Expand Up @@ -798,6 +859,17 @@ def sym(addr, pid):
name, _ = BPF._sym_cache(pid).resolve(addr)
return name

@staticmethod
def symaddr(addr, pid):
"""symaddr(addr, pid)
Translate a memory address into a function name plus the instruction
offset as a hexadecimal number, which is returned as a string.
A pid of less than zero will access the kernel symbol cache.
"""
name, offset = BPF._sym_cache(pid).resolve(addr)
return "%s+0x%x" % (name, offset)

@staticmethod
def ksym(addr):
"""ksym(addr)
Expand All @@ -815,8 +887,7 @@ def ksymaddr(addr):
instruction offset as a hexidecimal number, which is returned as a
string.
"""
name, offset = BPF._sym_cache(-1).resolve(addr)
return "%s+0x%x" % (name, offset)
return BPF.symaddr(addr, -1)

@staticmethod
def ksymname(name):
Expand All @@ -835,6 +906,20 @@ def num_open_kprobes(self):
"""
return len([k for k in self.open_kprobes.keys() if isinstance(k, str)])

def num_open_uprobes(self):
"""num_open_uprobes()
Get the number of open U[ret]probes.
"""
return len(self.open_uprobes)

def num_open_tracepoints(self):
"""num_open_tracepoints()
Get the number of open tracepoints.
"""
return len(self.open_tracepoints)

def kprobe_poll(self, timeout = -1):
"""kprobe_poll(self)
Expand Down
4 changes: 4 additions & 0 deletions src/python/bcc/libbcc.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,10 @@ class bcc_symbol(ct.Structure):
lib.bcc_resolve_symname.argtypes = [
ct.c_char_p, ct.c_char_p, ct.c_ulonglong, ct.POINTER(bcc_symbol)]

_SYM_CB_TYPE = ct.CFUNCTYPE(ct.c_int, ct.c_char_p, ct.c_ulonglong)
lib.bcc_foreach_symbol.restype = ct.c_int
lib.bcc_foreach_symbol.argtypes = [ct.c_char_p, _SYM_CB_TYPE]

lib.bcc_symcache_new.restype = ct.c_void_p
lib.bcc_symcache_new.argtypes = [ct.c_int]

Expand Down
Loading

0 comments on commit 07175d0

Please sign in to comment.