Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cachestat examples for kernels < 5.16 and add comments for 5.16+ #372

Merged
merged 2 commits into from
Apr 16, 2024

Conversation

alebsys
Copy link
Contributor

@alebsys alebsys commented Apr 4, 2024

According to the discussion in #370:

  • Updated instrumented probes for operation on kernels < 5.16;
  • Added commentary for adapting the example to work on 5.16+ kernels;
  • Removed kprobe instrumentation in favor of kfunc and tracepoint;
  • Replaced ksym decoder with enum structure and static_map;
  • Updated label names to be more human readable.

examples/cachestat.bpf.c Outdated Show resolved Hide resolved
examples/cachestat.bpf.c Outdated Show resolved Hide resolved
@bobrik
Copy link
Contributor

bobrik commented Apr 8, 2024

With CI running a newer kernel, you need to update this as well:

When testing locally, I see that cache_writes and page_mark_dirties are identical:

ivan@vm:~$ curl -s https://ip6-localhost:9435/metrics | grep cache_ops
# HELP ebpf_exporter_page_cache_ops_total Page cache operation counters by type
# TYPE ebpf_exporter_page_cache_ops_total counter
ebpf_exporter_page_cache_ops_total{operation="cache_access"} 98
ebpf_exporter_page_cache_ops_total{operation="cache_writes"} 2930
ebpf_exporter_page_cache_ops_total{operation="page_mark_dirties"} 2930

Here are the stack traces for both on v6.9-rc3:

        __traceiter_writeback_dirty_folio+76
        __traceiter_writeback_dirty_folio+76
        __folio_mark_dirty+560
        mark_buffer_dirty+188
        __block_commit_write+192
        block_page_mkwrite+284
        ext4_page_mkwrite+1160
        do_page_mkwrite+100
        do_wp_page+368
        __handle_mm_fault+2808
        handle_mm_fault+364
        do_page_fault+292
        do_mem_abort+80
        el0_da+48
        el0t_64_sync_handler+192
        el0t_64_sync+408
        mark_buffer_dirty+0
        block_page_mkwrite+284
        ext4_page_mkwrite+1160
        do_page_mkwrite+100
        do_wp_page+368
        __handle_mm_fault+2808
        handle_mm_fault+364
        do_page_fault+292
        do_mem_abort+80
        el0_da+48
        el0t_64_sync_handler+192
        el0t_64_sync+408

It looks like both are called unconditionally from block_page_mkwrite:

I don't think we need two counters for one thing.

@alebsys
Copy link
Contributor Author

alebsys commented Apr 9, 2024

@bobrik, interesting behavior!
Locally, both counters have different values for me:

$ uname -r
6.1.0-0.deb11.6-amd64

$ curl -s localhost:9435/metrics -s  | grep page_cache_ops_total
# HELP ebpf_exporter_page_cache_ops_total Page cache operation counters by type
# TYPE ebpf_exporter_page_cache_ops_total counter
ebpf_exporter_page_cache_ops_total{operation="cache_access"} 1.3989073e+07
ebpf_exporter_page_cache_ops_total{operation="cache_writes"} 472184
ebpf_exporter_page_cache_ops_total{operation="page_mark_dirties"} 295110

Stack traces:

$ bpftrace -e 'tracepoint:writeback:writeback_dirty_folio,kprobe:mark_buffer_dirty { @[kstack]=count();}
...
@[
    mark_buffer_dirty+1
    __block_write_begin_int+676
    block_page_mkwrite+227
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 943
@[
    mark_buffer_dirty+1
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 943
@[
    __folio_mark_dirty+504
    __folio_mark_dirty+504
    mark_buffer_dirty+256
    __block_write_begin_int+676
    block_page_mkwrite+227
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_fault+700
    __handle_mm_fault+1649
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 944
@[
    __folio_mark_dirty+504
    __folio_mark_dirty+504
    mark_buffer_dirty+256
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_wp_page+534
    __handle_mm_fault+2524
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 1700
@[
    mark_buffer_dirty+1
    __block_commit_write.constprop.0.isra.0+85
    block_page_mkwrite+345
    ext4_page_mkwrite+650
    do_page_mkwrite+79
    do_wp_page+534
    __handle_mm_fault+2524
    handle_mm_fault+221
    do_user_addr_fault+460
    exc_page_fault+113
    asm_exc_page_fault+34
]: 1713

As far as I understand, the main fork occurs in this condition - https://github.com/torvalds/linux/blob/4fe89d07dcc2804c8b562f6c7896a45643d34b2f/fs/buffer.c#L2526

(In the latest version of the kernel, this code was rewritten a little, but again, as far as I can tell, the logic has not changed - torvalds/linux@fe18137)

Therefore, I am not sure that both counters always duplicate each other. What do you think about it?

@bobrik
Copy link
Contributor

bobrik commented Apr 15, 2024

It's probably that just my vm exhibits this behavior. I don't see it in production.

You should still update this before I can merge:

@alebsys
Copy link
Contributor Author

alebsys commented Apr 16, 2024

@bobrik, Thanks for your comments! I corrected Makefile.

@bobrik bobrik merged commit 5cb1bd3 into cloudflare:master Apr 16, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants