Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

alebsys · 2024-04-04T14:58:23Z

According to the discussion in #370:

Updated instrumented probes for operation on kernels < 5.16;
Added commentary for adapting the example to work on 5.16+ kernels;
Removed kprobe instrumentation in favor of kfunc and tracepoint;
Replaced ksym decoder with enum structure and static_map;
Updated label names to be more human readable.

examples/cachestat.bpf.c

bobrik · 2024-04-08T20:19:25Z

With CI running a newer kernel, you need to update this as well:

https://github.com/cloudflare/ebpf_exporter/blob/master/Makefile#L25-L30

When testing locally, I see that cache_writes and page_mark_dirties are identical:

ivan@vm:~$ curl -s https://ip6-localhost:9435/metrics | grep cache_ops
# HELP ebpf_exporter_page_cache_ops_total Page cache operation counters by type
# TYPE ebpf_exporter_page_cache_ops_total counter
ebpf_exporter_page_cache_ops_total{operation="cache_access"} 98
ebpf_exporter_page_cache_ops_total{operation="cache_writes"} 2930
ebpf_exporter_page_cache_ops_total{operation="page_mark_dirties"} 2930

Here are the stack traces for both on v6.9-rc3:

        __traceiter_writeback_dirty_folio+76
        __traceiter_writeback_dirty_folio+76
        __folio_mark_dirty+560
        mark_buffer_dirty+188
        __block_commit_write+192
        block_page_mkwrite+284
        ext4_page_mkwrite+1160
        do_page_mkwrite+100
        do_wp_page+368
        __handle_mm_fault+2808
        handle_mm_fault+364
        do_page_fault+292
        do_mem_abort+80
        el0_da+48
        el0t_64_sync_handler+192
        el0t_64_sync+408

        mark_buffer_dirty+0
        block_page_mkwrite+284
        ext4_page_mkwrite+1160
        do_page_mkwrite+100
        do_wp_page+368
        __handle_mm_fault+2808
        handle_mm_fault+364
        do_page_fault+292
        do_mem_abort+80
        el0_da+48
        el0t_64_sync_handler+192
        el0t_64_sync+408

It looks like both are called unconditionally from block_page_mkwrite:

https://github.com/torvalds/linux/blob/v6.9-rc3/fs/buffer.c#L2623-L2625

I don't think we need two counters for one thing.

alebsys · 2024-04-09T07:38:04Z

@bobrik, interesting behavior!
Locally, both counters have different values for me:

$ uname -r
6.1.0-0.deb11.6-amd64

$ curl -s localhost:9435/metrics -s  | grep page_cache_ops_total
# HELP ebpf_exporter_page_cache_ops_total Page cache operation counters by type
# TYPE ebpf_exporter_page_cache_ops_total counter
ebpf_exporter_page_cache_ops_total{operation="cache_access"} 1.3989073e+07
ebpf_exporter_page_cache_ops_total{operation="cache_writes"} 472184
ebpf_exporter_page_cache_ops_total{operation="page_mark_dirties"} 295110

Stack traces:

$ bpftrace -e 'tracepoint:writeback:writeback_dirty_folio,kprobe:mark_buffer_dirty { @[kstack]=count();}
...
@[
    mark_buffer_dirty+1
    __block_write_begin_int+676
    block_page_mkwrite+227
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 943
@[
    mark_buffer_dirty+1
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 943
@[
    __folio_mark_dirty+504
    __folio_mark_dirty+504
    mark_buffer_dirty+256
    __block_write_begin_int+676
    block_page_mkwrite+227
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 944
@[
    __folio_mark_dirty+504
    __folio_mark_dirty+504
    mark_buffer_dirty+256
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_wp_page+534
    __handle_mm_fault+2524
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 1700
@[
    mark_buffer_dirty+1
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_wp_page+534
    __handle_mm_fault+2524
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 1713

As far as I understand, the main fork occurs in this condition - https://github.com/torvalds/linux/blob/4fe89d07dcc2804c8b562f6c7896a45643d34b2f/fs/buffer.c#L2526

(In the latest version of the kernel, this code was rewritten a little, but again, as far as I can tell, the logic has not changed - torvalds/linux@fe18137)

Therefore, I am not sure that both counters always duplicate each other. What do you think about it?

bobrik · 2024-04-15T23:33:03Z

It's probably that just my vm exhibits this behavior. I don't see it in production.

You should still update this before I can merge:

https://github.com/cloudflare/ebpf_exporter/blob/master/Makefile#L25-L30

alebsys · 2024-04-16T14:12:29Z

@bobrik, Thanks for your comments! I corrected Makefile.

bobrik reviewed Apr 4, 2024

View reviewed changes

examples/cachestat.bpf.c Outdated Show resolved Hide resolved

bobrik reviewed Apr 4, 2024

View reviewed changes

examples/cachestat.bpf.c Outdated Show resolved Hide resolved

Update cachestat examples for kernels < 5.16 and add comments for 5.16+

5af093c

alebsys force-pushed the update-cachestat-example branch from 8d4c1cd to 5af093c Compare April 8, 2024 19:45

Update cachestat examples for kernels < 5.16 and add comments for 5.16+

c7ad595

bobrik approved these changes Apr 16, 2024

View reviewed changes

bobrik merged commit 5cb1bd3 into cloudflare:master Apr 16, 2024
19 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

alebsys commented Apr 4, 2024

bobrik commented Apr 8, 2024

alebsys commented Apr 9, 2024

bobrik commented Apr 15, 2024

alebsys commented Apr 16, 2024

Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

Conversation

alebsys commented Apr 4, 2024

bobrik commented Apr 8, 2024

alebsys commented Apr 9, 2024

bobrik commented Apr 15, 2024

alebsys commented Apr 16, 2024