Demonstrations of compactsnoop, the Linux eBPF/bcc version.


compactsnoop traces zone compaction system-wide and prints various details.
Example output (compaction triggered manually with
echo 1 > /proc/sys/vm/compact_memory):

# ./compactsnoop
COMM           PID    NODE ZONE         ORDER MODE    LAT(ms)  STATUS
zsh            23685  0    ZONE_DMA     -1    SYNC      0.025  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      3.925  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.975  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC      81.57  complete
zsh            23685  0    ZONE_DMA     -1    SYNC       0.02  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      4.631  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.975  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.647  complete
zsh            23685  0    ZONE_DMA     -1    SYNC      0.020  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      3.367  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC     115.18  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC     81.766  complete
zsh            23685  0    ZONE_DMA     -1    SYNC      0.025  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      4.346  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC    114.570  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.820  complete
zsh            23685  0    ZONE_DMA     -1    SYNC      0.026  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      4.611  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.993  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.928  complete
zsh            23685  0    ZONE_DMA     -1    SYNC       0.02  complete
zsh            23685  0    ZONE_DMA32   -1    SYNC      3.889  complete
zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.776  complete
zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.727  complete
^C

While tracing, processes allocated pages while memory was too fragmented to
satisfy requests for contiguous memory, so zone compaction events occurred.
Each event adds to the time the allocating process has to wait.

compactsnoop can be useful when the compact_stall counter in /proc/vmstat
keeps increasing and you want to find out whether critical processes are
the ones stalling.
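Before running compactsnoop, it can help to confirm that compact_stall is
actually climbing. The following Python sketch samples /proc/vmstat twice and
reports the difference. It is not part of compactsnoop: the helper names
parse_vmstat and compact_stall_delta, and the 5 second interval, are
assumptions made for illustration.

```python
import time


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def compact_stall_delta(before, after):
    """Return how much compact_stall grew between two parsed samples."""
    return after.get("compact_stall", 0) - before.get("compact_stall", 0)


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    interval = 5  # seconds between samples (arbitrary choice)
    first = read_vmstat()
    time.sleep(interval)
    second = read_vmstat()
    print("compact_stall grew by %d in %d s"
          % (compact_stall_delta(first, second), interval))
```

If the delta stays above zero across several samples, tracing with compactsnoop
shows which processes are paying for the stalls.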

The STATUS values include (CentOS 7.6's kernel):

    compact_status = {
        # COMPACT_SKIPPED: compaction didn't start as it was not possible or
        # direct reclaim was more suitable
        0: "skipped",
        # COMPACT_CONTINUE: compaction should continue to another pageblock
        1: "continue",
        # COMPACT_PARTIAL: direct compaction partially compacted a zone and
        # there are suitable pages
        2: "partial",
        # COMPACT_COMPLETE: The full zone was compacted
        3: "complete",
    }

or (kernel 4.7 and above):

    compact_status = {
        # COMPACT_NOT_SUITABLE_ZONE: For more detailed tracepoint output -
        # internal to compaction
        0: "not_suitable_zone",
        # COMPACT_SKIPPED: compaction didn't start as it was not possible or
        # direct reclaim was more suitable
        1: "skipped",
        # COMPACT_DEFERRED: compaction didn't start as it was deferred due to
        # past failures
        2: "deferred",
        # COMPACT_NO_SUITABLE_PAGE: For more detailed tracepoint output -
        # internal to compaction
        3: "no_suitable_page",
        # COMPACT_CONTINUE: compaction should continue to another pageblock
        4: "continue",
        # COMPACT_COMPLETE: The full zone was scanned but compaction did not
        # produce suitable pages
        5: "complete",
        # COMPACT_PARTIAL_SKIPPED: direct compaction has scanned part of the
        # zone but did not produce suitable pages
        6: "partial_skipped",
        # COMPACT_CONTENDED: compaction terminated prematurely due to lock
        # contentions
        7: "contended",
        # COMPACT_SUCCESS: direct compaction terminated after concluding that
        # the allocation should now succeed
        8: "success",
    }


The -p option can be used to filter on a PID, which is filtered in-kernel. Here
I've used it with -T to print timestamps:

# ./compactsnoop -Tp 24376
TIME(s)        COMM           PID    NODE ZONE         ORDER MODE    LAT(ms)  STATUS
101.364115000  zsh            24376  0    ZONE_DMA     -1    SYNC      0.025  complete
101.364555000  zsh            24376  0    ZONE_DMA32   -1    SYNC      3.925  complete
^C

This shows the zsh process allocating pages and triggering zone compaction
events; the added latency is small.


A maximum tracing duration can be set with the -d option.
For example, to trace for 2 seconds:

# ./compactsnoop -d 2
COMM           PID    NODE ZONE         ORDER MODE    LAT(ms)  STATUS
zsh            26385  0    ZONE_DMA     -1    SYNC   0.025444  complete
^C


The -e option prints extra columns:

# ./compactsnoop -e
COMM           PID    NODE ZONE         ORDER MODE  FRAGIDX  MIN    LOW    HIGH   FREE     LAT(ms)  STATUS
summ           28276  1    ZONE_NORMAL  3     ASYNC   0.728  11284  14105  16926  14193       3.58  partial
summ           28276  0    ZONE_NORMAL  2     ASYNC  -1.000  11043  13803  16564  14479        0.0  complete
summ           28276  1    ZONE_NORMAL  2     ASYNC  -1.000  11284  14105  16926  14785      0.019  complete
summ           28276  0    ZONE_NORMAL  2     ASYNC  -1.000  11043  13803  16564  15199      0.006  partial
summ           28276  1    ZONE_NORMAL  2     ASYNC  -1.000  11284  14105  16926  17360      0.030  complete
summ           28276  0    ZONE_NORMAL  2     ASYNC  -1.000  11043  13803  16564  15443      0.024  complete
summ           28276  1    ZONE_NORMAL  2     ASYNC  -1.000  11284  14105  16926  15634      0.018  complete
summ           28276  1    ZONE_NORMAL  3     ASYNC   0.832  11284  14105  16926  15301      0.006  partial
summ           28276  0    ZONE_NORMAL  2     ASYNC  -1.000  11043  13803  16564  14774      0.005  partial
summ           28276  1    ZONE_NORMAL  3     ASYNC   0.733  11284  14105  16926  19888      0.012  partial
^C

FRAGIDX is short for fragmentation index, which only makes sense if an
allocation of the requested size would fail. If that is true, the
fragmentation index indicates whether external fragmentation or a lack of
memory was the problem. The value can be used to determine whether page
reclaim or compaction should be used.

The index is between 0 and 1 and is shown to 3 decimal places:

    0 => allocation would fail due to lack of memory
    1 => allocation would fail due to fragmentation

The fragmentation index for every zone and order of the buddy allocator can be
read from /sys/kernel/debug/extfrag/extfrag_index.

MIN, LOW and HIGH show the watermarks of the zone, which can also be read from
/proc/zoneinfo. FREE is nr_free_pages (also found in /proc/zoneinfo).
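
As a rough illustration of how the index behaves, the Python sketch below
mirrors the arithmetic of the kernel's __fragmentation_index() in mm/vmstat.c
as I understand it; treat it as an approximation, not the kernel's or
compactsnoop's code. The function name and its input (a list of free block
counts per order, in the style of /proc/buddyinfo) are assumptions made for
the example. In this model the result is -1 when a free block of the requested
order already exists, which would account for the -1.000 rows above.

```python
def fragmentation_index(free_counts, order):
    """Approximate the fragmentation index for an allocation of 2**order pages.

    free_counts[i] is the number of free blocks of order i in the zone
    (hypothetical input, shaped like one row of /proc/buddyinfo).
    """
    requested = 1 << order
    free_pages = sum(n << o for o, n in enumerate(free_counts))
    free_blocks_total = sum(free_counts)
    # Blocks at or above the requested order, counted in units of the request.
    free_blocks_suitable = sum(
        n << (o - order) for o, n in enumerate(free_counts) if o >= order
    )

    if free_blocks_total == 0:
        return 0.0
    # The index is only meaningful when the request would fail.
    if free_blocks_suitable:
        return -1.0

    # Integer arithmetic in thousandths, then scaled to 3 decimal places.
    milli = 1000 - (1000 + free_pages * 1000 // requested) // free_blocks_total
    return milli / 1000.0


if __name__ == "__main__":
    # 1024 free single pages, order-3 request: plenty of memory, all fragmented.
    print(fragmentation_index([1024], 3))        # 0.875
    # One free order-3 block: the request can be satisfied.
    print(fragmentation_index([0, 0, 0, 1], 3))  # -1.0
```

Values closer to 1 mean the free memory exists but is scattered in small
blocks, which is the case where compaction rather than reclaim can help.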


The -K option prints the kernel stack for each event:

# ./compactsnoop -K -e
summ           28276  0    ZONE_NORMAL  3     ASYNC   0.528  11043  13803  16564  22654     13.258  partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44

summ           28276  1    ZONE_NORMAL  3     ASYNC  -1.000  11284  14105  16926  22074      0.008  partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44

summ           28276  0    ZONE_NORMAL  3     ASYNC   0.527  11043  13803  16564  25653      9.812  partial
    kretprobe_trampoline+0x0
    try_to_compact_pages+0x121
    __alloc_pages_direct_compact+0xac
    __alloc_pages_slowpath+0x3e9
    __alloc_pages_nodemask+0x404
    alloc_pages_current+0x98
    new_slab+0x2c5
    ___slab_alloc+0x3ac
    __slab_alloc+0x40
    kmem_cache_alloc_node+0x8b
    copy_process+0x18e
    do_fork+0x91
    sys_clone+0x16
    stub_clone+0x44


# ./compactsnoop -h
usage: compactsnoop.py [-h] [-T] [-p PID] [-d DURATION] [-K] [-e]

Trace compact zone

optional arguments:
  -h, --help            show this help message and exit
  -T, --timestamp       include timestamp on output
  -p PID, --pid PID     trace this PID only
  -d DURATION, --duration DURATION
                        total duration of trace in seconds
  -K, --kernel-stack    output kernel stack trace
  -e, --extended_fields
                        show system memory state

examples:
    ./compactsnoop          # trace all compact stall
    ./compactsnoop -T       # include timestamps
    ./compactsnoop -d 10    # trace for 10 seconds only
    ./compactsnoop -K       # output kernel stack trace
    ./compactsnoop -e       # show extended fields