Demonstrations of biolatency, the Linux eBPF/bcc version.


biolatency traces block device I/O (disk I/O), and records the distribution
of I/O latency (time), printing this as a histogram when Ctrl-C is hit.
For example:

# ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs             : count     distribution
       0 -> 1          : 0        |                                      |
       2 -> 3          : 0        |                                      |
       4 -> 7          : 0        |                                      |
       8 -> 15         : 0        |                                      |
      16 -> 31         : 0        |                                      |
      32 -> 63         : 0        |                                      |
      64 -> 127        : 1        |                                      |
     128 -> 255        : 12       |********                              |
     256 -> 511        : 15       |**********                            |
     512 -> 1023       : 43       |*******************************       |
    1024 -> 2047       : 52       |**************************************|
    2048 -> 4095       : 47       |**********************************    |
    4096 -> 8191       : 52       |**************************************|
    8192 -> 16383      : 36       |**************************            |
   16384 -> 32767      : 15       |**********                            |
   32768 -> 65535      : 2        |*                                     |
   65536 -> 131071     : 2        |*                                     |

The latency of the disk I/O is measured from the issue to the device to its
completion. A -Q option can be used to include time queued in the kernel.

This example output shows a large mode of latency from about 128 microseconds
to about 32767 microseconds (33 milliseconds). The bulk of the I/O was
between 1 and 8 ms, which is the expected block device latency for
rotational storage devices.

The highest latency seen while tracing was between 65 and 131 milliseconds:
the last row printed, for which there were 2 I/O.

For efficiency, biolatency uses an in-kernel eBPF map to store timestamps
with requests, and another in-kernel map to store the histogram (the "count"
column), which is copied to user-space only when output is printed. These
methods lower the performance overhead when tracing is performed.


In the following example, the -m option is used to print a histogram using
milliseconds as the units (which eliminates the first several rows), -T to
print timestamps with the output, and the arguments "1 5" to print 1 second
summaries 5 times:

# ./biolatency -mT 1 5
Tracing block device I/O... Hit Ctrl-C to end.

06:20:16
     msecs             : count     distribution
       0 -> 1          : 36       |**************************************|
       2 -> 3          : 1        |*                                     |
       4 -> 7          : 3        |***                                   |
       8 -> 15         : 17       |*****************                     |
      16 -> 31         : 33       |**********************************    |
      32 -> 63         : 7        |*******                               |
      64 -> 127        : 6        |******                                |

06:20:17
     msecs             : count     distribution
       0 -> 1          : 96       |************************************  |
       2 -> 3          : 25       |*********                             |
       4 -> 7          : 29       |***********                           |
       8 -> 15         : 62       |***********************               |
      16 -> 31         : 100      |**************************************|
      32 -> 63         : 62       |***********************               |
      64 -> 127        : 18       |******                                |

06:20:18
     msecs             : count     distribution
       0 -> 1          : 68       |*************************             |
       2 -> 3          : 76       |****************************          |
       4 -> 7          : 20       |*******                               |
       8 -> 15         : 48       |*****************                     |
      16 -> 31         : 103      |**************************************|
      32 -> 63         : 49       |******************                    |
      64 -> 127        : 17       |******                                |

06:20:19
     msecs             : count     distribution
       0 -> 1          : 522      |*************************************+|
       2 -> 3          : 225      |****************                      |
       4 -> 7          : 38       |**                                    |
       8 -> 15         : 8        |                                      |
      16 -> 31         : 1        |                                      |

06:20:20
     msecs             : count     distribution
       0 -> 1          : 436      |**************************************|
       2 -> 3          : 106      |*********                             |
       4 -> 7          : 34       |**                                    |
       8 -> 15         : 19       |*                                     |
      16 -> 31         : 1        |                                      |

This shows how the I/O latency distribution changes over time.


The -Q option begins measuring I/O latency from when the request was first
queued in the kernel, and includes queuing latency:

# ./biolatency -Q
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs             : count     distribution
       0 -> 1          : 0        |                                      |
       2 -> 3          : 0        |                                      |
       4 -> 7          : 0        |                                      |
       8 -> 15         : 0        |                                      |
      16 -> 31         : 0        |                                      |
      32 -> 63         : 0        |                                      |
      64 -> 127        : 0        |                                      |
     128 -> 255        : 3        |*                                     |
     256 -> 511        : 37       |**************                        |
     512 -> 1023       : 30       |***********                           |
    1024 -> 2047       : 18       |*******                               |
    2048 -> 4095       : 22       |********                              |
    4096 -> 8191       : 14       |*****                                 |
    8192 -> 16383      : 48       |*******************                   |
   16384 -> 32767      : 96       |**************************************|
   32768 -> 65535      : 31       |************                          |
   65536 -> 131071     : 26       |**********                            |
  131072 -> 262143     : 12       |****                                  |

This better reflects the latency suffered by the application (if it is
synchronous I/O), whereas the default mode without kernel queueing better
reflects the performance of the device.

Note that the storage device (and storage device controller) usually have
queues of their own, which are always included in the latency, with or
without -Q.


The -D option will print a histogram per disk. For example:

# ./biolatency -D
Tracing block device I/O... Hit Ctrl-C to end.
^C

Bucket disk = 'xvdb'
     usecs             : count     distribution
       0 -> 1          : 0        |                                        |
       2 -> 3          : 0        |                                        |
       4 -> 7          : 0        |                                        |
       8 -> 15         : 0        |                                        |
      16 -> 31         : 0        |                                        |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 0        |                                        |
     128 -> 255        : 1        |                                        |
     256 -> 511        : 33       |**********************                  |
     512 -> 1023       : 36       |************************                |
    1024 -> 2047       : 58       |****************************************|
    2048 -> 4095       : 51       |***********************************     |
    4096 -> 8191       : 21       |**************                          |
    8192 -> 16383      : 2        |*                                       |

Bucket disk = 'xvdc'
     usecs             : count     distribution
       0 -> 1          : 0        |                                        |
       2 -> 3          : 0        |                                        |
       4 -> 7          : 0        |                                        |
       8 -> 15         : 0        |                                        |
      16 -> 31         : 0        |                                        |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 0        |                                        |
     128 -> 255        : 1        |                                        |
     256 -> 511        : 38       |***********************                 |
     512 -> 1023       : 42       |*************************               |
    1024 -> 2047       : 66       |****************************************|
    2048 -> 4095       : 40       |************************                |
    4096 -> 8191       : 14       |********                                |

Bucket disk = 'xvda1'
     usecs             : count     distribution
       0 -> 1          : 0        |                                        |
       2 -> 3          : 0        |                                        |
       4 -> 7          : 0        |                                        |
       8 -> 15         : 0        |                                        |
      16 -> 31         : 0        |                                        |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 0        |                                        |
     128 -> 255        : 0        |                                        |
     256 -> 511        : 18       |**********                              |
     512 -> 1023       : 67       |*************************************   |
    1024 -> 2047       : 35       |*******************                     |
    2048 -> 4095       : 71       |****************************************|
    4096 -> 8191       : 65       |************************************    |
    8192 -> 16383      : 65       |************************************    |
   16384 -> 32767      : 20       |***********                             |
   32768 -> 65535      : 7        |***                                     |

This output shows that xvda1 has much higher latency, usually between 0.5 ms
and 32 ms, whereas xvdc is usually between 0.2 ms and 4 ms.


The -F option prints a separate histogram for each unique set of request
flags. For example:

# ./biolatency.py -Fm
Tracing block device I/O... Hit Ctrl-C to end.
^C

flags = Read
     msecs             : count     distribution
       0 -> 1          : 180      |*************                           |
       2 -> 3          : 519      |****************************************|
       4 -> 7          : 60       |****                                    |
       8 -> 15         : 123      |*********                               |
      16 -> 31         : 68       |*****                                   |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 2        |                                        |
     128 -> 255        : 12       |                                        |
     256 -> 511        : 0        |                                        |
     512 -> 1023       : 1        |                                        |

flags = Sync-Write
     msecs             : count     distribution
       0 -> 1          : 5        |****************************************|

flags = Flush
     msecs             : count     distribution
       0 -> 1          : 2        |****************************************|

flags = Metadata-Read
     msecs             : count     distribution
       0 -> 1          : 3        |****************************************|
       2 -> 3          : 2        |**************************              |
       4 -> 7          : 0        |                                        |
       8 -> 15         : 1        |*************                           |
      16 -> 31         : 1        |*************                           |

flags = Write
     msecs             : count     distribution
       0 -> 1          : 103      |*******************************         |
       2 -> 3          : 106      |********************************        |
       4 -> 7          : 130      |****************************************|
       8 -> 15         : 79       |************************                |
      16 -> 31         : 5        |*                                       |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 0        |                                        |
     128 -> 255        : 0        |                                        |
     256 -> 511        : 1        |                                        |

flags = NoMerge-Read
     msecs             : count     distribution
       0 -> 1          : 0        |                                        |
       2 -> 3          : 5        |****************************************|
       4 -> 7          : 0        |                                        |
       8 -> 15         : 0        |                                        |
      16 -> 31         : 1        |********                                |

flags = NoMerge-Write
     msecs             : count     distribution
       0 -> 1          : 30       |**                                      |
       2 -> 3          : 293      |********************                    |
       4 -> 7          : 564      |****************************************|
       8 -> 15         : 463      |********************************        |
      16 -> 31         : 21       |*                                       |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 0        |                                        |
     128 -> 255        : 0        |                                        |
     256 -> 511        : 5        |                                        |

flags = Priority-Metadata-Read
     msecs             : count     distribution
       0 -> 1          : 1        |****************************************|
       2 -> 3          : 0        |                                        |
       4 -> 7          : 1        |****************************************|
       8 -> 15         : 1        |****************************************|

flags = ForcedUnitAccess-Metadata-Sync-Write
     msecs             : count     distribution
       0 -> 1          : 2        |****************************************|

flags = ReadAhead-Read
     msecs             : count     distribution
       0 -> 1          : 15       |***************************             |
       2 -> 3          : 22       |****************************************|
       4 -> 7          : 14       |*************************               |
       8 -> 15         : 8        |**************                          |
      16 -> 31         : 1        |*                                       |

flags = Priority-Metadata-Write
     msecs             : count     distribution
       0 -> 1          : 9        |****************************************|

These can be handled differently by the storage device, and this mode lets us
examine their performance in isolation.


The -e option shows an extension summary (total, average). For example:

# ./biolatency.py -e
^C
     usecs             : count     distribution
       0 -> 1          : 0        |                                        |
       2 -> 3          : 0        |                                        |
       4 -> 7          : 0        |                                        |
       8 -> 15         : 0        |                                        |
      16 -> 31         : 0        |                                        |
      32 -> 63         : 0        |                                        |
      64 -> 127        : 4        |***********                             |
     128 -> 255        : 2        |*****                                   |
     256 -> 511        : 4        |***********                             |
     512 -> 1023       : 14       |****************************************|
    1024 -> 2047       : 0        |                                        |
    2048 -> 4095       : 1        |**                                      |

avg = 663 usecs, total: 16575 usecs, count: 25

Sometimes a range such as 512 -> 1023 usecs is too coarse for throughput
tuning, especially when looking for a small performance regression. With this
extension, we can see that the average latency is about 663 usecs, which
falls inside that log2 range.


The -j option prints a dictionary of the histogram.
For example:

# ./biolatency.py -j
^C
{'ts': '2020-12-30 14:33:03', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 2}, {'interval-start': 64, 'interval-end': 127, 'count': 75}, {'interval-start': 128, 'interval-end': 255, 'count': 7}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 6}, {'interval-start': 1024, 'interval-end': 2047, 'count': 3}, {'interval-start': 2048, 'interval-end': 4095, 'count': 31}]}

The key `data` is the list of the log2 histogram intervals. The
`interval-start` and `interval-end` define the latency bucket and `count` is
the number of I/Os that lie in that latency range.

# ./biolatency.py -jF
^C
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 1}, {'interval-start': 32, 'interval-end': 63, 'count': 1}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 0}, {'interval-start': 1024, 'interval-end': 2047, 'count': 2}], 'flags': 'Sync-Write'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 2}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 2}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}], 'flags': 'Unknown'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 0}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}], 'flags': 'Write'}
{'ts': '2020-12-30 14:37:59', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 0}, {'interval-start': 64, 'interval-end': 127, 'count': 0}, {'interval-start': 128, 'interval-end': 255, 'count': 0}, {'interval-start': 256, 'interval-end': 511, 'count': 0}, {'interval-start': 512, 'interval-end': 1023, 'count': 4}], 'flags': 'Flush'}

The -j option used with -F prints a histogram dictionary per set of I/O flags.

# ./biolatency.py -jD
^C
{'ts': '2020-12-30 14:40:00', 'val_type': 'usecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 0}, {'interval-start': 2, 'interval-end': 3, 'count': 0}, {'interval-start': 4, 'interval-end': 7, 'count': 0}, {'interval-start': 8, 'interval-end': 15, 'count': 0}, {'interval-start': 16, 'interval-end': 31, 'count': 0}, {'interval-start': 32, 'interval-end': 63, 'count': 1}, {'interval-start': 64, 'interval-end': 127, 'count': 1}, {'interval-start': 128, 'interval-end': 255, 'count': 1}, {'interval-start': 256, 'interval-end': 511, 'count': 1}, {'interval-start': 512, 'interval-end': 1023, 'count': 6}, {'interval-start': 1024, 'interval-end': 2047, 'count': 1}, {'interval-start': 2048, 'interval-end': 4095, 'count': 3}], 'Bucket ptr': b'sda'}

The -j option used with -D prints a histogram dictionary per disk device.

# ./biolatency.py -jm
^C
{'ts': '2020-12-30 14:42:03', 'val_type': 'msecs', 'data': [{'interval-start': 0, 'interval-end': 1, 'count': 11}, {'interval-start': 2, 'interval-end': 3, 'count': 3}]}

The -j option used with -m prints a millisecond histogram dictionary. The
`val_type` key is set to msecs.
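
The dictionaries printed by -j in the examples above use Python literal
syntax (single-quoted strings, and a bytes value such as b'sda' with -D), so
a strict JSON parser would likely reject them as shown. The following is a
minimal sketch, not part of biolatency, that assumes each dictionary arrives
as one line of text and reads it with ast.literal_eval. The helper names
parse_hist_line, total_count, busiest_bucket and hist_label are hypothetical.

```python
import ast


def parse_hist_line(line):
    """Turn one printed dictionary line into a Python dict."""
    # literal_eval accepts Python literals only, so single quotes and
    # bytes values such as b'sda' are handled without executing any code.
    return ast.literal_eval(line.strip())


def total_count(hist):
    """Sum the per-bucket counts to get the number of I/Os recorded."""
    return sum(bucket["count"] for bucket in hist["data"])


def busiest_bucket(hist):
    """Return the bucket dict with the highest count (the mode)."""
    return max(hist["data"], key=lambda bucket: bucket["count"])


def hist_label(hist):
    """Return the -F flags string or the -D disk name, if either is present."""
    if "flags" in hist:
        return hist["flags"]
    if "Bucket ptr" in hist:
        # The -D example above shows the disk name as a bytes value.
        return hist["Bucket ptr"].decode()
    return None


# Sample line copied from the -jm example above.
sample = ("{'ts': '2020-12-30 14:42:03', 'val_type': 'msecs', 'data': "
          "[{'interval-start': 0, 'interval-end': 1, 'count': 11}, "
          "{'interval-start': 2, 'interval-end': 3, 'count': 3}]}")

hist = parse_hist_line(sample)
mode = busiest_bucket(hist)
print("%d I/Os, mode %d -> %d %s" % (
    total_count(hist), mode["interval-start"], mode["interval-end"],
    hist["val_type"]))
```

For the sample line this prints "14 I/Os, mode 0 -> 1 msecs". The same
helpers could be applied line by line to -jF or -jD output, using hist_label
to tell the dictionaries apart.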


USAGE message:

# ./biolatency -h
usage: biolatency.py [-h] [-T] [-Q] [-m] [-D] [-F] [-e] [-j] [-d DISK]
                     [interval] [count]

Summarize block device I/O latency as a histogram

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -T, --timestamp     include timestamp on output
  -Q, --queued        include OS queued time in I/O time
  -m, --milliseconds  millisecond histogram
  -D, --disks         print a histogram per disk device
  -F, --flags         print a histogram per set of I/O flags
  -e, --extension     summarize average/total value
  -j, --json          json output
  -d DISK, --disk DISK
                      Trace this disk only

examples:
    ./biolatency                    # summarize block I/O latency as a histogram
    ./biolatency 1 10               # print 1 second summaries, 10 times
    ./biolatency -mT 1              # 1s summaries, milliseconds, and timestamps
    ./biolatency -Q                 # include OS queued time in I/O time
    ./biolatency -D                 # show each disk device separately
    ./biolatency -F                 # show I/O flags separately
    ./biolatency -j                 # print a dictionary
    ./biolatency -e                 # show extension summary(total, average)
    ./biolatency -d sdc             # Trace sdc only
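

As a closing illustration of how the power-of-two rows in the histograms
above relate to individual latencies, the following sketch maps a latency
value to the row it would be counted in, and reproduces the average reported
in the -e example (16575 usecs total over 25 I/O). It is based only on the
row boundaries shown in this file and is not biolatency's in-kernel code;
bucket_bounds is a hypothetical helper.

```python
def bucket_bounds(value):
    """Return (low, high) of the power-of-two row that contains value."""
    if value < 0:
        raise ValueError("latency must be non-negative")
    if value <= 1:
        # The first printed row covers both 0 and 1.
        return (0, 1)
    bits = value.bit_length()
    # A value with N significant bits lies in 2**(N-1) .. 2**N - 1.
    return (1 << (bits - 1), (1 << bits) - 1)


# Figures taken from the -e example: total latency and I/O count.
avg = 16575 // 25
print(avg, bucket_bounds(avg))
```

This prints "663 (512, 1023)": the average from the -e example sits in the
512 -> 1023 row, which is also the mode of that histogram.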