Demonstrations of stackcount, the Linux eBPF/bcc version. This program traces functions and frequency counts them with their entire stack trace, summarized in-kernel for efficiency. For example, counting stack traces that led to submit_bio(), which creates block device I/O: # ./stackcount submit_bio Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. ^C submit_bio submit_bh journal_submit_commit_record.isra.13 jbd2_journal_commit_transaction kjournald2 kthread ret_from_fork mb_cache_list 1 submit_bio __block_write_full_page.constprop.39 block_write_full_page blkdev_writepage __writepage write_cache_pages generic_writepages do_writepages __writeback_single_inode writeback_sb_inodes __writeback_inodes_wb 2 submit_bio __block_write_full_page.constprop.39 block_write_full_page blkdev_writepage __writepage write_cache_pages generic_writepages do_writepages __filemap_fdatawrite_range filemap_fdatawrite fdatawrite_one_bdev 36 submit_bio submit_bh jbd2_journal_commit_transaction kjournald2 kthread ret_from_fork mb_cache_list 38 submit_bio ext4_writepages do_writepages __filemap_fdatawrite_range filemap_flush ext4_alloc_da_blocks ext4_rename ext4_rename2 vfs_rename sys_rename entry_SYSCALL_64_fastpath 79 Detaching... The output shows unique stack traces, in order from leaf (on-CPU) to root, followed by their occurrence count. The last stack trace in the above output shows syscall handling, ext4_rename(), and filemap_flush(): looks like an application issued file rename has caused back end disk I/O due to ext4 block allocation and a filemap_flush(). I'd have to browse the code to those functions to confirm! The order of printed stack traces is from least to most frequent. The most frequent in this case, the ext4_rename() stack, was taken 79 times during tracing. It can be useful to trace the path to submit_bio to explain unusual rates of disk IOPS. These could have in-kernel origins (eg, background scrub). As another example, here are the code paths that led to ip_output(), which sends a packet at the IP level: # ./stackcount ip_output Tracing 1 functions for "ip_output"... Hit Ctrl-C to end. ^C ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit tcp_tsq_handler.part.32 tcp_tasklet_func tasklet_action __do_softirq do_softirq_own_stack do_softirq __local_bh_enable_ip 1 ip_output ip_queue_xmit tcp_transmit_skb tcp_send_ack tcp_send_delayed_ack __tcp_ack_snd_check tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_local_deliver_finish ip_local_deliver 1 ip_output ip_queue_xmit tcp_transmit_skb tcp_send_ack tcp_send_delayed_ack __tcp_ack_snd_check tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_local_deliver_finish ip_local_deliver 1 ip_output ip_queue_xmit tcp_transmit_skb tcp_send_ack tcp_delack_timer_handler tcp_delack_timer call_timer_fn run_timer_softirq __do_softirq irq_exit xen_evtchn_do_upcall 1 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_rcv_established tcp_v4_do_rcv release_sock tcp_sendmsg inet_sendmsg sock_sendmsg 3 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit tcp_tsq_handler.part.32 tcp_tasklet_func tasklet_action __do_softirq run_ksoftirqd smpboot_thread_fn kthread 3 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_local_deliver_finish ip_local_deliver ip_rcv_finish 4 ip_output ip_queue_xmit tcp_transmit_skb tcp_send_ack tcp_cleanup_rbuf tcp_recvmsg inet_recvmsg sock_recvmsg sock_read_iter __vfs_read vfs_read 5 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_rcv_established tcp_v4_do_rcv release_sock tcp_sendmsg inet_sendmsg sock_sendmsg 9 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_push tcp_sendmsg inet_sendmsg sock_sendmsg sock_write_iter __vfs_write 51 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_local_deliver_finish ip_local_deliver ip_rcv_finish 171 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit __tcp_push_pending_frames tcp_rcv_established tcp_v4_do_rcv tcp_v4_rcv ip_local_deliver_finish ip_local_deliver ip_rcv_finish 994 ip_output ip_queue_xmit tcp_transmit_skb tcp_write_xmit tcp_tsq_handler.part.32 tcp_tasklet_func tasklet_action __do_softirq irq_exit xen_evtchn_do_upcall xen_do_hypervisor_callback 1002 Detaching... The last two most frequent stack traces are via tcp_transmit_skb(). Note the send last begins with ip_rcv_finish(), for a local receive, which then becomes a transmit: likely pushing more frames when processing the newly received ones. As may be obvious, this is a great tool for quickly understanding kernel code flow. User-space functions can also be traced if a library name is provided. For example, to quickly identify code locations that allocate heap memory: # ./stackcount -l c -p 4902 malloc Tracing 1 functions for "malloc"... Hit Ctrl-C to end. ^C malloc rbtree_new main [unknown] 12 malloc _rbtree_node_new_internal _rbtree_node_insert rbtree_insert main [unknown] 1189 Detaching... Note that user-space uses of stackcount can be somewhat more limited because a lot of user-space libraries and binaries are compiled without debuginfo, or with frame-pointer omission (-fomit-frame-pointer), which makes it impossible to reliably obtain the stack trace. In addition to kernel and user-space functions, kernel tracepoints and USDT tracepoints are also supported. For example, to determine where threads are being created in a particular process, use the pthread_create USDT tracepoint: # ./stackcount -p $(pidof parprimes) u:pthread:pthread_create Tracing 1 functions for "u:pthread:pthread_create"... Hit Ctrl-C to end. ^C parprimes [11923] pthread_create@@GLIBC_2.2.5 main __libc_start_main [unknown] 7 Similarly, to determine where context switching is happening in the kernel, use the sched:sched_switch kernel tracepoint: # ./stackcount t:sched:sched_switch ... (omitted for brevity) __schedule schedule schedule_hrtimeout_range_clock schedule_hrtimeout_range poll_schedule_timeout do_select core_sys_select SyS_select entry_SYSCALL_64_fastpath 40 __schedule schedule schedule_preempt_disabled cpu_startup_entry start_secondary 85 A -i option can be used to set an output interval, and -T to include a timestamp. For example: # ./stackcount -Ti 1 submit_bio Tracing 1 functions for "submit_bio"... Hit Ctrl-C to end. 01:11:37 submit_bio submit_bh journal_submit_commit_record.isra.13 jbd2_journal_commit_transaction kjournald2 kthread ret_from_fork mb_cache_list 1 submit_bio ext4_writepages do_writepages __filemap_fdatawrite_range filemap_flush ext4_alloc_da_blocks ext4_rename ext4_rename2 vfs_rename sys_rename entry_SYSCALL_64_fastpath 20 submit_bio submit_bh jbd2_journal_commit_transaction kjournald2 kthread ret_from_fork mb_cache_list 39 01:11:38 submit_bio ext4_writepages do_writepages __filemap_fdatawrite_range filemap_flush ext4_alloc_da_blocks ext4_rename ext4_rename2 vfs_rename sys_rename entry_SYSCALL_64_fastpath 20 01:11:39 submit_bio ext4_writepages do_writepages __filemap_fdatawrite_range filemap_flush ext4_alloc_da_blocks ext4_rename ext4_rename2 vfs_rename sys_rename entry_SYSCALL_64_fastpath 20 ^C 01:11:39 submit_bio ext4_writepages do_writepages __filemap_fdatawrite_range filemap_flush ext4_alloc_da_blocks ext4_rename ext4_rename2 vfs_rename sys_rename entry_SYSCALL_64_fastpath 20 Detaching... The -s output prints the return instruction offset for each function (aka symbol offset). Eg: # ./stackcount -s tcp_sendmsg Tracing 1 functions for "tcp_sendmsg"... Hit Ctrl-C to end. ^C tcp_sendmsg0x1 sock_sendmsg0x38 sock_write_iter0x78 __vfs_write0xaa vfs_write0xa9 sys_write0x46 entry_SYSCALL_64_fastpath0x16 29 Detaching... If it wasn't clear how one function called another, knowing the instruction offset can help you locate the lines of code from a disassembly dump. A wildcard can also be used. Eg, all functions beginning with "tcp_send": # ./stackcount -s 'tcp_send*' Tracing 14 functions for "tcp_send*"... Hit Ctrl-C to end. ^C tcp_send_delayed_ack0x1 tcp_rcv_established0x3b1 tcp_v4_do_rcv0x130 tcp_v4_rcv0x8e0 ip_local_deliver_finish0x9f ip_local_deliver0x51 ip_rcv_finish0x8a ip_rcv0x29d __netif_receive_skb_core0x637 __netif_receive_skb0x18 netif_receive_skb_internal0x23 1 tcp_send_delayed_ack0x1 tcp_rcv_established0x222 tcp_v4_do_rcv0x130 tcp_v4_rcv0x8e0 ip_local_deliver_finish0x9f ip_local_deliver0x51 ip_rcv_finish0x8a ip_rcv0x29d __netif_receive_skb_core0x637 __netif_receive_skb0x18 netif_receive_skb_internal0x23 4 tcp_send_mss0x1 inet_sendmsg0x67 sock_sendmsg0x38 sock_write_iter0x78 __vfs_write0xaa vfs_write0xa9 sys_write0x46 entry_SYSCALL_64_fastpath0x16 7 tcp_sendmsg0x1 sock_sendmsg0x38 sock_write_iter0x78 __vfs_write0xaa vfs_write0xa9 sys_write0x46 entry_SYSCALL_64_fastpath0x16 7 Detaching... Use -r to allow regular expressions. USAGE message: # ./stackcount -h usage: stackcount [-h] [-p PID] [-i INTERVAL] [-T] [-r] [-s] [-l LIBRARY] [-v] [-d] pattern Count function calls and their stack traces positional arguments: pattern search expression for functions optional arguments: -h, --help show this help message and exit -p PID, --pid PID trace this PID only -i INTERVAL, --interval INTERVAL summary interval, seconds -T, --timestamp include timestamp on output -r, --regexp use regular expressions. Default is "*" wildcards only. -s, --offset show address offsets -l, --library trace user-space functions from this library or executable -v, --verbose show raw addresses -d, --debug print BPF program before starting (for debugging purposes) examples: ./stackcount submit_bio # count kernel stack traces for submit_bio ./stackcount ip_output # count kernel stack traces for ip_output ./stackcount -s ip_output # show symbol offsets ./stackcount -sv ip_output # show offsets and raw addresses (verbose) ./stackcount 'tcp_send*' # count stacks for funcs matching tcp_send* ./stackcount -r '^tcp_send.*' # same as above, using regular expressions ./stackcount -Ti 5 ip_output # output every 5 seconds, with timestamps ./stackcount -p 185 ip_output # count ip_output stacks for PID 185 only ./stackcount -p 185 -l c malloc # count stacks for malloc in PID 185 ./stackcount t:sched:sched_fork # count stacks for the sched_fork tracepoint ./stackcount -p 185 u:node:* # count stacks for all USDT probes in node