Skip to content

Commit

Permalink
Merge branch 'master' into master
Browse files Browse the repository at this point in the history
  • Loading branch information
affansyed committed Sep 13, 2015
2 parents 7c83a34 + 8ddb79d commit d96f6ca
Show file tree
Hide file tree
Showing 27 changed files with 835 additions and 314 deletions.
107 changes: 82 additions & 25 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,86 @@
![BCC Logo](images/logo2.png)
# BPF Compiler Collection (BCC)

This directory contains source code for BCC, a toolkit for creating small
programs that can be dynamically loaded into a Linux kernel.
BCC is a toolkit for creating efficient kernel tracing and manipulation
programs, and includes several useful tools and examples. It makes use of eBPF
(Extended Berkeley Packet Filters), a new feature that was first added to
Linux 3.15. Much of what BCC uses requires Linux 4.1 and above.

eBPF was [described by](https://lkml.org/lkml/2015/4/14/232) Ingo Molnár as:

> One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively.
BCC makes eBPF programs easier to write, with kernel instrumentation in C
and a front-end in Python. It is suited for many tasks, including performance
analysis and network traffic control.

## Screenshot

This example traces a disk I/O kernel function, and populates an in-kernel
power-of-2 histogram of the I/O size. For efficiency, only the histogram
summary is returned to user-level.

```Shell
# ./bitehist.py
Tracing... Hit Ctrl-C to end.
^C
value : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 211 |********** |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 800 |**************************************|
```

The above output shows a bimodal distribution, where the largest mode of
800 I/O was between 128 and 255 Kbytes in size.

The compiler relies upon eBPF (Extended Berkeley Packet Filters), which is a
feature in Linux kernels starting from 3.15. Currently, this compiler leverages
features which are mostly available in Linux 4.1 and above.
See the source: [bitehist.c](examples/bitehist.c) and
[bitehist.py](examples/bitehist.py). What this traces, what this stores, and how
the data is presented, can be entirely customized. This shows only some of
many possible capabilities.

## Installing

See [INSTALL.md](INSTALL.md) for installation steps on your platform.

## Contents

Some of these are single files that contain both C and Python, others have a
pair of .c and .py files, and some are directories of files.

### Tracing

Examples:

- examples/[bitehist.py](examples/bitehist.py) examples/[bitehist.c](examples/bitehist.c): Block I/O size histogram. [Examples](examples/bitehist_example.txt).
- examples/[disksnoop.py](examples/disksnoop.py) examples/[disksnoop.c](examples/disksnoop.c): Trace block device I/O latency. [Examples](examples/disksnoop_example.txt).
- examples/[hello_world.py](examples/hello_world.py): Prints "Hello, World!" for new processes.
- examples/[trace_fields.py](examples/trace_fields.py): Simple example of printing fields from traced events.
- examples/[vfsreadlat.py](examples/vfsreadlat.py) examples/[vfsreadlat.c](examples/vfsreadlat.c): VFS read latency distribution. [Examples](examples/vfsreadlat_example.txt).

Tools:

- tools/[funccount](tools/funccount): Count kernel function calls. [Examples](tools/funccount_example.txt).
- tools/[pidpersec](tools/pidpersec): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
- tools/[syncsnoop](tools/syncsnoop): Trace sync() syscall. [Examples](tools/syncsnoop_example.txt).
- tools/[vfscount](tools/vfscount) tools/[vfscount.c](tools/vfscount.c): Count VFS calls. [Examples](tools/vfscount_example.txt).
- tools/[vfsstat](tools/vfsstat) tools/[vfsstat.c](tools/vfsstat.c): Count some VFS calls, with column output. [Examples](tools/vfsstat_example.txt).

### Networking

Examples:

- examples/[distributed_bridge/](examples/distributed_bridge): Distributed bridge example.
- examples/[simple_tc.py](examples/simple_tc.py): Simple traffic control example.
- examples/[simulation.py](examples/simulation.py): Simulation helper.
- examples/[tc_neighbor_sharing.py](examples/tc_neighbor_sharing.py) examples/[tc_neighbor_sharing.c](examples/tc_neighbor_sharing.c): Per-IP classification and rate limiting.
- examples/[tunnel_monitor/](examples/tunnel_monitor): Efficiently monitor traffic flows. [Example video](https://www.youtube.com/watch?v=yYy3Cwce02k).
- examples/[vlan_learning.py](examples/vlan_learning.py) examples/[vlan_learning.c](examples/vlan_learning.c): Demux Ethernet traffic into worker veth+namespaces.

## Motivation

BPF guarantees that the programs loaded into the kernel cannot crash, and
Expand Down Expand Up @@ -46,11 +115,11 @@ The features of this toolkit include:
In the future, more bindings besides python will likely be supported. Feel free
to add support for the language of your choice and send a pull request!

## Examples
## Tutorial

This toolchain is currently composed of two parts: a C wrapper around LLVM, and
a Python API to interact with the running program. Later, we will go into more
detail of how this all works.
The BCC toolchain is currently composed of two parts: a C wrapper around LLVM,
and a Python API to interact with the running program. Later, we will go into
more detail of how this all works.

### Hello, World

Expand All @@ -65,32 +134,20 @@ The BPF program always takes at least one argument, which is a pointer to the
context for this type of program. Different program types have different calling
conventions, but for this one we don't care so `void *` is fine.
```python
prog = """
int hello(void *ctx) {
bpf_trace_printk("Hello, World!\\n");
return 0;
};
"""
b = BPF(text=prog)
BPF(text='void kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); }').trace_print()
```

For this example, we will call the program every time `fork()` is called by a
userspace process. Underneath the hood, fork translates to the `clone` syscall,
so we will attach our program to the kernel symbol `sys_clone`.
```python
b.attach_kprobe(event="sys_clone", fn_name="hello")
```
userspace process. Underneath the hood, fork translates to the `clone` syscall.
BCC recognizes prefix `kprobe__`, and will auto attach our program to the kernel symbol `sys_clone`.

The python process will then print the trace printk circular buffer until ctrl-c
is pressed. The BPF program is removed from the kernel when the userspace
process that loaded it closes the fd (or exits).
```python
b.trace_print()
```

Output:
```
bcc/examples$ sudo python hello_world.py
bcc/examples$ sudo python hello_world.py
python-7282 [002] d... 3757.488508: : Hello, World!
```

Expand Down
2 changes: 1 addition & 1 deletion examples/bitehist.c
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ static unsigned int log2l(unsigned long v)
return log2(v) + 1;
}

int do_request(struct pt_regs *ctx, struct request *req)
int kprobe__blk_start_request(struct pt_regs *ctx, struct request *req)
{
int index = log2l(req->__data_len / 1024);
u64 *leaf = dist.lookup(&index);
Expand Down
84 changes: 5 additions & 79 deletions examples/bitehist.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@
#
# Written as a basic example of using a histogram to show a distribution.
#
# USAGE: bitehist.py [interval [count]]
#
# The default interval is 5 seconds. A Ctrl-C will print the partially
# gathered histogram then exit.
#
Expand All @@ -16,90 +14,18 @@
# 15-Aug-2015 Brendan Gregg Created this.

from bcc import BPF
from ctypes import c_ushort, c_int, c_ulonglong
from time import sleep
from sys import argv

def usage():
print("USAGE: %s [interval [count]]" % argv[0])
exit()

# arguments
interval = 5
count = -1
if len(argv) > 1:
try:
interval = int(argv[1])
if interval == 0:
raise
if len(argv) > 2:
count = int(argv[2])
except: # also catches -h, --help
usage()

# load BPF program
b = BPF(src_file = "bitehist.c")
b.attach_kprobe(event="blk_start_request", fn_name="do_request")
dist_max = 64

# header
print("Tracing... Hit Ctrl-C to end.")

# functions
stars_max = 38
def stars(val, val_max, width):
i = 0
text = ""
while (1):
if (i > (width * val / val_max) - 1) or (i > width - 1):
break
text += "*"
i += 1
if val > val_max:
text = text[:-1] + "+"
return text

def print_log2_hist(dist, val_type):
idx_max = -1
val_max = 0
for i in range(1, dist_max + 1):
try:
val = dist[c_int(i)].value
if (val > 0):
idx_max = i
if (val > val_max):
val_max = val
except:
break
if idx_max > 0:
print(" %-15s : count distribution" % val_type);
for i in range(1, idx_max + 1):
low = (1 << i) >> 1
high = (1 << i) - 1
if (low == high):
low -= 1
try:
val = dist[c_int(i)].value
print("%8d -> %-8d : %-8d |%-*s|" % (low, high, val,
stars_max, stars(val, val_max, stars_max)))
except:
break

# output
loop = 0
do_exit = 0
while (1):
if count > 0:
loop += 1
if loop > count:
exit()
try:
sleep(interval)
except KeyboardInterrupt:
pass; do_exit = 1

try:
sleep(99999999)
except KeyboardInterrupt:
print
print_log2_hist(b["dist"], "kbytes")
b["dist"].clear()
if do_exit:
exit()

b["dist"].print_log2_hist()
73 changes: 14 additions & 59 deletions examples/bitehist_example.txt
Original file line number Diff line number Diff line change
@@ -1,70 +1,25 @@
Demonstrations of bitehist.py, the Linux eBPF/bcc version.

This prints a power-of-2 histogram to show the block I/O size distribution.
By default, a summary is printed every five seconds:
A summary is printed after Ctrl-C is hit.

# ./bitehist.py
# ./bitehist.py
Tracing... Hit Ctrl-C to end.

kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 26 |************* |
8 -> 15 : 3 |* |
16 -> 31 : 5 |** |
32 -> 63 : 6 |*** |
64 -> 127 : 7 |*** |
128 -> 255 : 75 |**************************************|

kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 83 |**************************************|
8 -> 15 : 2 | |
16 -> 31 : 6 |** |
32 -> 63 : 6 |** |
64 -> 127 : 5 |** |
128 -> 255 : 21 |********* |
^C
kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 8 |**************************************|

The first output shows a bimodal distribution. The largest mode of 75 I/O were
between 128 and 255 Kbytes in size, and another mode of 26 I/O were between 4
and 7 Kbytes in size.

The next output summary shows the workload is doing more 4 - 7 Kbyte I/O.

The final output is partial, showing what was measured until Ctrl-C was hit.


For an output interval of one second, and three summaries:

# ./bitehist.py 1 3
Tracing... Hit Ctrl-C to end.

kbytes : count distribution
0 -> 1 : 0 | |
value : count distribution
0 -> 1 : 3 | |
2 -> 3 : 0 | |
4 -> 7 : 4 |**************************************|

kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 5 |**************************************|
4 -> 7 : 211 |********** |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 1 |******* |

kbytes : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 4 |**************************************|

32 -> 63 : 0 | |
64 -> 127 : 1 | |
128 -> 255 : 800 |**************************************|

Full usage:
This output shows a bimodal distribution. The largest mod of 800 I/O were
between 128 and 255 Kbytes in size, and another mode of 211 I/O were between
4 and 7 Kbytes in size.

# ./bitehist.py -h
USAGE: ./bitehist.py [interval [count]]
Understanding this distribution is useful for characterizing workloads and
understanding performance. The existance of this distribution is not visible
from averages alone.
27 changes: 7 additions & 20 deletions examples/disksnoop.c
Original file line number Diff line number Diff line change
Expand Up @@ -13,36 +13,23 @@
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

struct key_t {
struct request *req;
};
BPF_TABLE("hash", struct key_t, u64, start, 10240);

int do_request(struct pt_regs *ctx, struct request *req) {
struct key_t key = {};
u64 ts;
BPF_HASH(start, struct request *);

void kprobe__blk_start_request(struct pt_regs *ctx, struct request *req) {
// stash start timestamp by request ptr
ts = bpf_ktime_get_ns();
key.req = req;
start.update(&key, &ts);
u64 ts = bpf_ktime_get_ns();

return 0;
start.update(&req, &ts);
}

int do_completion(struct pt_regs *ctx, struct request *req) {
struct key_t key = {};
void kprobe__blk_update_request(struct pt_regs *ctx, struct request *req) {
u64 *tsp, delta;

key.req = req;
tsp = start.lookup(&key);

tsp = start.lookup(&req);
if (tsp != 0) {
delta = bpf_ktime_get_ns() - *tsp;
bpf_trace_printk("%d %x %d\n", req->__data_len,
req->cmd_flags, delta / 1000);
start.delete(&key);
start.delete(&req);
}

return 0;
}
2 changes: 0 additions & 2 deletions examples/disksnoop.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,8 +17,6 @@

# load BPF program
b = BPF(src_file="disksnoop.c")
b.attach_kprobe(event="blk_start_request", fn_name="do_request")
b.attach_kprobe(event="blk_update_request", fn_name="do_completion")

# header
print("%-18s %-2s %-7s %8s" % ("TIME(s)", "T", "BYTES", "LAT(ms)"))
Expand Down
Loading

0 comments on commit d96f6ca

Please sign in to comment.