forked from iovisor/bcc
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
e548dd5
commit 561a71e
Showing
4 changed files
with
242 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
.TH readahead 8 "2020-08-20" "USER COMMANDS" | ||
.SH NAME | ||
readahead \- Show performance of read-ahead cache | ||
.SH SYNOPSIS | ||
.B readahead [-d DURATION] | ||
.SH DESCRIPTION | ||
The tool shows the performance of read-ahead caching on the system under a given load to investigate any | ||
caching issues. It shows a count of unused pages in the cache and also prints a histogram showing how | ||
long they have remained there. | ||
|
||
This tool traces the \fB__do_page_cache_readahead()\fR kernel function to track entry and exit in the | ||
readahead mechanism in the kernel and then uses \fB__page_cache_alloc()\fR and \fBmark_page_accessed()\fR | ||
functions to calculate the age of the page in the cache as well as see how many are left unaccessed. | ||
|
||
Since this uses BPF, only the root user can use this tool. | ||
.SS NOTE ON KPROBES USAGE | ||
Since the tool uses Kprobes, depending on your linux kernel's compilation, these functions may be inlined | ||
and hence not available for Kprobes. To see whether you have the functions available, check \fBvmlinux\fR | ||
source and binary to confirm whether inlining is happening or not. You can also check \fB/proc/kallsyms\fR | ||
on the host and verify if the target functions are present there before using this. | ||
.SH REQUIREMENTS | ||
CONFIG_BPF, bcc | ||
.SH OPTIONS | ||
\-h | ||
Print usage message | ||
.TP | ||
\-d DURATION | ||
Trace the read-ahead caching system for DURATION seconds | ||
.SH EXAMPLES | ||
.TP | ||
Trace for 30 seconds and show histogram of page age (ms) in read-ahead cache along with unused page count: | ||
# | ||
.B readahead -d 30 | ||
.SH OVERHEAD | ||
The kernel functions instrumented by this program could be high-frequency depending on the profile of the | ||
application (for example sequential IO). We advise the users to measure and monitor the overhead before leaving | ||
this turned on in production environments. | ||
.SH SOURCE | ||
This originated as a bpftrace tool from the book "BPF Performance Tools", | ||
published by Addison Wesley (2019): | ||
.IP | ||
https://www.brendangregg.com/bpf-performance-tools-book.html | ||
.PP | ||
See the book for more documentation on this tool. | ||
.PP | ||
This version is in the BCC repository: | ||
.IP | ||
https://github.com/iovisor/bcc | ||
.PP | ||
Also look in the bcc distribution for a companion _examples.txt file containing | ||
example usage, output, and commentary for this tool. | ||
.SH OS | ||
Linux | ||
.SH STABILITY | ||
Unstable - in development. | ||
.SH AUTHOR | ||
Suchakra Sharma | ||
.SH SEE ALSO | ||
readahead(2), madvise(2) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
#!/usr/bin/python | ||
# @lint-avoid-python-3-compatibility-imports | ||
# | ||
# readahead Show performance of read-ahead cache | ||
# For Linux, uses BCC, eBPF | ||
# | ||
# Copyright (c) 2020 Suchakra Sharma <[email protected]> | ||
# Licensed under the Apache License, Version 2.0 (the "License") | ||
# This was originally created for the BPF Performance Tools book | ||
# published by Addison Wesley. ISBN-13: 9780136554820 | ||
# When copying or porting, include this comment. | ||
# | ||
# 20-Aug-2020 Suchakra Sharma Ported from bpftrace to BCC | ||
|
||
from __future__ import print_function | ||
from bcc import BPF | ||
from time import sleep | ||
import ctypes as ct | ||
import argparse | ||
|
||
# arguments | ||
examples = """examples: | ||
./readahead -d 20 # monitor for 10 seconds and generate stats | ||
""" | ||
|
||
parser = argparse.ArgumentParser( | ||
description="Monitor performance of read ahead cache", | ||
formatter_class=argparse.RawDescriptionHelpFormatter, | ||
epilog=examples) | ||
parser.add_argument("-d", "--duration", type=int, | ||
help="total duration to monitor for, in seconds") | ||
args = parser.parse_args() | ||
if not args.duration: | ||
args.duration = 99999999 | ||
|
||
# BPF program | ||
program = """ | ||
#include <uapi/linux/ptrace.h> | ||
#include <linux/mm_types.h> | ||
BPF_HASH(flag, u32, u8); // used to track if we are in do_page_cache_readahead() | ||
BPF_HASH(birth, struct page*, u64); // used to track timestamps of cache alloc'ed page | ||
BPF_ARRAY(pages); // increment/decrement readahead pages | ||
BPF_HISTOGRAM(dist); | ||
int entry__do_page_cache_readahead(struct pt_regs *ctx) { | ||
u32 pid; | ||
u8 one = 1; | ||
pid = bpf_get_current_pid_tgid(); | ||
flag.update(&pid, &one); | ||
return 0; | ||
} | ||
int exit__do_page_cache_readahead(struct pt_regs *ctx) { | ||
u32 pid; | ||
u8 zero = 0; | ||
pid = bpf_get_current_pid_tgid(); | ||
flag.update(&pid, &zero); | ||
return 0; | ||
} | ||
int exit__page_cache_alloc(struct pt_regs *ctx) { | ||
u32 pid; | ||
u64 ts; | ||
struct page *retval = (struct page*) PT_REGS_RC(ctx); | ||
u32 zero = 0; // static key for accessing pages[0] | ||
pid = bpf_get_current_pid_tgid(); | ||
u8 *f = flag.lookup(&pid); | ||
if (f != NULL && *f == 1) { | ||
ts = bpf_ktime_get_ns(); | ||
birth.update(&retval, &ts); | ||
u64 *count = pages.lookup(&zero); | ||
if (count) (*count)++; // increment read ahead pages count | ||
} | ||
return 0; | ||
} | ||
int entry_mark_page_accessed(struct pt_regs *ctx) { | ||
u64 ts, delta; | ||
struct page *arg0 = (struct page *) PT_REGS_PARM1(ctx); | ||
u32 zero = 0; // static key for accessing pages[0] | ||
u64 *bts = birth.lookup(&arg0); | ||
if (bts != NULL) { | ||
delta = bpf_ktime_get_ns() - *bts; | ||
dist.increment(bpf_log2l(delta/1000000)); | ||
u64 *count = pages.lookup(&zero); | ||
if (count) (*count)--; // decrement read ahead pages count | ||
birth.delete(&arg0); // remove the entry from hashmap | ||
} | ||
return 0; | ||
} | ||
""" | ||
|
||
b = BPF(text=program) | ||
b.attach_kprobe(event="__do_page_cache_readahead", fn_name="entry__do_page_cache_readahead") | ||
b.attach_kretprobe(event="__do_page_cache_readahead", fn_name="exit__do_page_cache_readahead") | ||
b.attach_kretprobe(event="__page_cache_alloc", fn_name="exit__page_cache_alloc") | ||
b.attach_kprobe(event="mark_page_accessed", fn_name="entry_mark_page_accessed") | ||
|
||
# header | ||
print("Tracing... Hit Ctrl-C to end.") | ||
|
||
def print_stats(): | ||
print() | ||
print("Read-ahead unused pages: %d" % (b["pages"][ct.c_ulong(0)].value)) | ||
print("Histogram of read-ahead used page age (ms):") | ||
print("") | ||
b["dist"].print_log2_hist("age (ms)") | ||
b["dist"].clear() | ||
b["pages"].clear() | ||
|
||
while True: | ||
try: | ||
sleep(args.duration) | ||
print_stats() | ||
except KeyboardInterrupt: | ||
print_stats() | ||
break |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
Demonstration of readahead, the Linux eBPF/bcc version | ||
|
||
Read-ahead mechanism is used by operation sytems to optimize sequential operations | ||
by reading ahead some pages to avoid more expensive filesystem operations. This tool | ||
shows the performance of the read-ahead caching on the system under a given load to | ||
investigate any caching issues. It shows a count for unused pages in the cache and | ||
also prints a histogram showing how long they have remianed there. | ||
|
||
Usage Scenario | ||
============== | ||
|
||
Consider that you are developing a React Native application which performs aggressive | ||
reads while re-encoding a video in local-storage. Usually such an app would be multi- | ||
layered and have transitional library dependencies. The actual read may be performed | ||
by some unknown native library which may or may not be using hints to the OS, such as | ||
madvise(p, LEN, MADV_SEQUENTIAL). If high IOPS is observed in such an app, running | ||
readahead may pin the issue much faster in this case as the developer digs deeper | ||
into what may be causing this. | ||
|
||
An example where such an issue can surface is: https://github.com/boltdb/bolt/issues/691 | ||
|
||
# readahead -d 30 | ||
Tracing... Hit Ctrl-C to end. | ||
^C | ||
Read-ahead unused pages: 6765 | ||
Histogram of read-ahead used page age (ms): | ||
|
||
age (ms) : count distribution | ||
0 -> 1 : 4236 |****************************************| | ||
2 -> 3 : 394 |*** | | ||
4 -> 7 : 1670 |*************** | | ||
8 -> 15 : 2132 |******************** | | ||
16 -> 31 : 401 |*** | | ||
32 -> 63 : 1256 |*********** | | ||
64 -> 127 : 2352 |********************** | | ||
128 -> 255 : 357 |*** | | ||
256 -> 511 : 369 |*** | | ||
512 -> 1023 : 366 |*** | | ||
1024 -> 2047 : 181 |* | | ||
2048 -> 4095 : 439 |**** | | ||
4096 -> 8191 : 188 |* | | ||
|
||
In the example above, we recorded system-wide stats for 30 seconds. We can observe that | ||
while most of the pages stayed in the readahead cache for quite less time, after 30 | ||
seconds 6765 pages still remained in the cache, yet unaccessed. | ||
|
||
Note on Kprobes Usage | ||
===================== | ||
|
||
This tool uses Kprobes on the following kernel functions: | ||
|
||
__do_page_cache_readahead() | ||
__page_cache_alloc() | ||
mark_page_accessed() | ||
|
||
Since the tool uses Kprobes, depending on your linux kernel's compilation, these | ||
functions may be inlined and hence not available for Kprobes. To see whether you have | ||
the functions available, check vmlinux source and binary to confirm whether inlining is | ||
happening or not. You can also check /proc/kallsyms on the host and verify if the target | ||
functions are present there before using this tool. |