Demonstrations of nfsdist, the Linux eBPF/bcc version. nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their latency as a power-of-2 histogram. For example: ./nfsdist.py Tracing NFS operation latency... Hit Ctrl-C to end. operation = read usecs : count distribution 0 -> 1 : 4 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 7107 |************** | 16 -> 31 : 19864 |****************************************| 32 -> 63 : 1494 |*** | 64 -> 127 : 491 | | 128 -> 255 : 1810 |*** | 256 -> 511 : 6356 |************ | 512 -> 1023 : 4860 |********* | 1024 -> 2047 : 3070 |****** | 2048 -> 4095 : 1853 |*** | 4096 -> 8191 : 921 |* | 8192 -> 16383 : 122 | | 16384 -> 32767 : 15 | | 32768 -> 65535 : 5 | | 65536 -> 131071 : 2 | | 131072 -> 262143 : 1 | | operation = write usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 1 | | 16 -> 31 : 0 | | 32 -> 63 : 9 | | 64 -> 127 : 19491 |****************************************| 128 -> 255 : 3064 |****** | 256 -> 511 : 940 |* | 512 -> 1023 : 365 | | 1024 -> 2047 : 312 | | 2048 -> 4095 : 119 | | 4096 -> 8191 : 31 | | 8192 -> 16383 : 84 | | 16384 -> 32767 : 31 | | 32768 -> 65535 : 5 | | 65536 -> 131071 : 3 | | 131072 -> 262143 : 0 | | 262144 -> 524287 : 1 | | operation = getattr usecs : count distribution 0 -> 1 : 27 |****************************************| 2 -> 3 : 2 |** | 4 -> 7 : 3 |**** | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 2 |** | 512 -> 1023 : 2 |** | operation = open usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 0 | | 4 -> 7 : 0 | | 8 -> 15 : 0 | | 16 -> 31 : 0 | | 32 -> 63 : 0 | | 64 -> 127 : 0 | | 128 -> 255 : 0 | | 256 -> 511 : 2 |****************************************| In this example you can see that the read traffic is rather bi-modal, with about 26K reads falling within 8 - 30 usecs and about 18K reads spread between 128 - 8191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket. The faster read traffic is probably coming from a filesystem cache and the slower traffic from disk. The reason why the writes are so consistently fast is because this example test was run on a couple of VM's and I believe the hypervisor was caching all the write traffic to memory. This "latency" is measured from when the operation was issued from the VFS interface to the file system, to when it completed. This spans everything: RPC latency, network latency, file system CPU cycles, file system locks, run queue latency, etc. This is a better measure of the latency suffered by applications reading from a NFS share and can better expose problems experienced by NFS clients. Note that this only traces the common NFS operations (read, write, open and getattr). I chose to include getattr as a significant percentage of NFS traffic end up being getattr calls and are a good indicator of problems with an NFS server. An optional interval and a count can be provided, as well as -m to show the distributions in milliseconds. For example: ./nfsdist -m 1 5 Tracing NFS operation latency... Hit Ctrl-C to end. 11:02:39: operation = write msecs : count distribution 0 -> 1 : 1 | | 2 -> 3 : 24 |******** | 4 -> 7 : 114 |****************************************| 8 -> 15 : 9 |*** | 16 -> 31 : 1 | | 32 -> 63 : 1 | | 11:02:40: operation = write msecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 11 |*** | 4 -> 7 : 111 |****************************************| 8 -> 15 : 13 |**** | 16 -> 31 : 1 | | 11:02:41: operation = write msecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 21 |****** | 4 -> 7 : 137 |****************************************| 8 -> 15 : 3 | | This shows a write workload, with writes hovering primarily in the 4-7ms range. USAGE message: ./nfsdist -h usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count] Summarize NFS operation latency positional arguments: interval output interval, in seconds count number of outputs optional arguments: -h, --help show this help message and exit -T, --notimestamp don't include timestamp on interval output -m, --milliseconds output in milliseconds -p PID, --pid PID trace this PID only examples: ./nfsdist # show operation latency as a histogram ./nfsdist -p 181 # trace PID 181 only ./nfsdist 1 10 # print 1 second summaries, 10 times ./nfsdist -m 5 # 5s summaries, milliseconds