Merge pull request iovisor#3691 from chenhengqi/add-mdflush #3

Merged
merged 30 commits into from
Jun 24, 2022

Conversation

fengjixuchui
Owner

No description provided.

chenhengqi and others added 30 commits January 19, 2022 20:28
Add the libbpf-tools version of mdflush as a replacement for BCC's mdflush.
This is part of the solution to #3650.
I have a long-running process that attaches BPF programs to containers,
and I want those programs to stay attached after the BPF object goes out
of scope. After some time, the open file descriptors pile up and my
process starts failing since it cannot open new files.

This close() method allows me to create a BPF object, attach my
function, then close the associated file descriptors without detaching
the program.

Signed-off-by: Antonio Terceiro <[email protected]>
/home/rongtao/Git/bcc/src/cc/bpf_module_rw_engine.cc: In member function ‘int ebpf::BPFModule::annotate()’:
/home/rongtao/Git/bcc/src/cc/bpf_module_rw_engine.cc:419:63: warning: ‘llvm::Type* llvm::PointerType::getElementType() const’ is deprecated: Pointer element types are deprecated. You can *temporarily* use Type::getPointerElementType() instead [-Wdeprecated-declarations]
  419 |       StructType *st = dyn_cast<StructType>(pt->getElementType());
      |                                             ~~~~~~~~~~~~~~~~~~^~
In file included from /usr/include/llvm/IR/DataLayout.h:27,
                 from /usr/include/llvm/ExecutionEngine/ExecutionEngine.h:24,
                 from /usr/include/llvm/ExecutionEngine/MCJIT.h:17,
                 from /home/rongtao/Git/bcc/src/cc/bpf_module_rw_engine.cc:20:
/usr/include/llvm/IR/DerivedTypes.h:675:9: note: declared here
  675 |   Type *getElementType() const {
      |         ^~~~~~~~~~~~~~

See llvm: [OpaquePtrs] Deprecate PointerType::getElementType()
llvm/llvm-project@184591a
Belongs to release/14.x branch.
Use task->real_parent instead of task->parent. When a process is being traced, the tracer becomes its parent, so task->real_parent is more accurate.
Unify PID column width (at most 7 chars) #3915: unify the PID/PPID/TID column width to at most 7 characters.
Sometimes I want to focus on one specified disk rather than all disks or a per-disk breakdown. Following libbpf-tools/biolatency, this patch adds disk filter support.
sync libbpf repo to
  4eb6485c0886  Makefile: add support for cross compilation

Signed-off-by: Yonghong Song <[email protected]>
When cross compiling, we need the target toolchain for the application code
and the host toolchain for bpftool.

Signed-off-by: Jie Wang <[email protected]>
When the output is redirected to a file, it is better to flush the
output after each iteration.
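
The flush-per-iteration idea can be sketched in Python as follows. This is only an illustration, not the tool's actual C code; the helper name `print_interval` is hypothetical. When stdout is redirected to a file it is typically block-buffered, so without an explicit flush a reader of the file may see nothing until the buffer fills.

```python
import sys


def print_interval(lines, out=sys.stdout):
    """Write one iteration's worth of output, then flush it.

    `print_interval` is a hypothetical helper for illustration only.
    Flushing once per iteration keeps a redirected output file up to
    date without paying for a flush on every single line.
    """
    for line in lines:
        out.write(line + "\n")
    out.flush()


if __name__ == "__main__":
    # Each call stands for one reporting interval.
    print_interval(["Caller  Avg Wait", "foo+0x10  1.0 us"])
```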
Instead of raw nanoseconds, it can show times with a unit, as below:

  $ sudo klockstat -n 5
  Tracing mutex/sem lock events...  Hit Ctrl-C to end
  ^C

                               Caller  Avg Wait    Count   Max Wait   Total Wait
      iwl_mvm_mac_sta_statistics+0x6b  703.9 us        4     2.5 ms       2.8 ms
                i915_vma_pin_ww+0x6ff    1.0 us     5224     1.8 ms       5.2 ms
                  do_epoll_wait+0x1d5    1.9 us     8569   651.3 us      16.0 ms
           kernfs_dop_revalidate+0x35    1.1 us     1176   540.2 us       1.3 ms
           kernfs_iop_permission+0x2a    1.0 us     1512   528.6 us       1.5 ms

                               Caller  Avg Hold    Count   Max Hold   Total Hold
                     __fdget_pos+0x42   13.4 ms      201    19.7 ms       2.7 s
                        genl_rcv+0x15    1.8 ms        8     6.3 ms      14.3 ms
                nl80211_pre_doit+0xdb    1.9 ms        4     6.2 ms       7.4 ms
           ieee80211_get_station+0x2a    1.8 ms        4     6.2 ms       7.1 ms
        bpf_tracing_prog_attach+0x264    2.9 ms       15     3.8 ms      43.4 ms
  Exiting trace of mutex/sem locks
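
A minimal Python sketch of this kind of unit scaling is shown below. It is not the tool's C implementation; the function name `format_duration` and the exact unit table (including the hour and minute rows) are assumptions chosen to reproduce values like "703.9 us" and "2.7 s" from the output above.

```python
def format_duration(nsec: int) -> str:
    """Render a nanosecond count with the largest unit that fits.

    Hypothetical helper: the unit table is an assumption that matches
    the style of the sample output (one decimal place, then the unit).
    """
    units = [
        (3600 * 10**9, "h"),
        (60 * 10**9, "m"),
        (10**9, "s"),
        (10**6, "ms"),
        (10**3, "us"),
    ]
    for base, unit in units:
        if nsec >= base:
            return f"{nsec / base:.1f} {unit}"
    # Anything below one microsecond stays in whole nanoseconds.
    return f"{nsec} ns"


if __name__ == "__main__":
    for value in (999, 703_900, 2_500_000, 2_700_000_000):
        print(f"{value:>13} ns -> {format_duration(value)}")
```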
The --per-thread option aggregates the lock stats per thread
instead of per call stack.  The result looks like this:

  $ sudo klockstat -n 5 -P
  Tracing mutex/sem lock events...  Hit Ctrl-C to end
  ^C

                Tid              Comm  Avg Wait    Count   Max Wait   Total Wait
             366434     kworker/u17:1  273.3 us       18     3.0 ms       4.9 ms
               4286   Chrome_ChildIOT    1.5 us      335    57.4 us     488.3 us
               4325   VizCompositorTh    1.1 us      751    20.5 us     817.0 us
               4324   Chrome_ChildIOT    1.3 us      332    18.2 us     443.7 us
              92900       Web Content    1.5 us       45    16.0 us      67.8 us

                Tid              Comm  Avg Hold    Count   Max Hold   Total Hold
               1056         in:imklog    4.0 ms      349    20.3 ms       1.4 s
             366519     kworker/u17:3  605.8 us       42    15.3 ms      25.4 ms
             368783         klockstat    1.0 ms      180     2.8 ms     184.3 ms
               4250   Chrome_IOThread    8.4 us      342     1.5 ms       2.9 ms
               2916       gnome-shell    2.6 us      773   383.4 us       2.0 ms
  Exiting trace of mutex/sem locks
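
The difference between the two modes comes down to the aggregation key. The user-space Python sketch below illustrates only that keying; the event layout (`tid`, `caller`, `wait_ns`) and the names `LockStat` and `aggregate` are hypothetical, and the real tool may well aggregate elsewhere (for example inside its BPF maps).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LockStat:
    """Running totals for one aggregation key (hypothetical layout)."""
    count: int = 0
    total_ns: int = 0
    max_ns: int = 0

    @property
    def avg_ns(self) -> float:
        return self.total_ns / self.count if self.count else 0.0


def aggregate(events, per_thread=False):
    """Group lock events by thread id or by caller.

    `events` is an iterable of dicts with the assumed keys
    "tid", "caller" and "wait_ns".
    """
    stats = defaultdict(LockStat)
    for ev in events:
        key = ev["tid"] if per_thread else ev["caller"]
        stat = stats[key]
        stat.count += 1
        stat.total_ns += ev["wait_ns"]
        stat.max_ns = max(stat.max_ns, ev["wait_ns"])
    return dict(stats)


if __name__ == "__main__":
    sample = [
        {"tid": 1, "caller": "a", "wait_ns": 100},
        {"tid": 1, "caller": "b", "wait_ns": 300},
        {"tid": 2, "caller": "a", "wait_ns": 50},
    ]
    print(aggregate(sample))                   # keyed by caller
    print(aggregate(sample, per_thread=True))  # keyed by tid
```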
The tools biosnoop and biostacks are broken due to a kernel change ([0]).
blk_account_io_{start, done} were renamed to __blk_account_io_{start, done},
and the old symbols are gone from vmlinux BTF. Fix the tools by checking symbol existence.

  [0]: torvalds/linux@be6bfe3

Signed-off-by: Hengqi Chen <[email protected]>
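
The "check symbol existence, then pick the name to attach to" idea can be sketched in Python as below. This is an illustration under assumptions, not the code in this commit: it assumes the symbol list comes from text in the /proc/kallsyms format (address, type, name), and the helpers `parse_kallsyms` and `pick_symbol` are hypothetical. The addresses in the demo are made up.

```python
def parse_kallsyms(text: str) -> set:
    """Collect symbol names from kallsyms-style text.

    Each line is expected to look like "<addr> <type> <name> [module]";
    lines with fewer than three fields are ignored.
    """
    names = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            names.add(parts[2])
    return names


def pick_symbol(available: set, candidates: list):
    """Return the first candidate present in `available`, else None."""
    for name in candidates:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # Made-up sample resembling a kernel after the rename.
    sample = (
        "ffffffff81000000 T __blk_account_io_start\n"
        "ffffffff81000100 T __blk_account_io_done\n"
    )
    syms = parse_kallsyms(sample)
    target = pick_symbol(
        syms, ["blk_account_io_start", "__blk_account_io_start"]
    )
    print(target)
```

On a live system the text could be read from /proc/kallsyms (usually requiring root to see real addresses), but the names alone are enough for this check.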
bcc: add method to close file descriptors
Using hash maps with BPF_F_NO_PREALLOC flag triggers a warning ([0]), and according
to kernel [commit 94dacdbd5d2d](torvalds/linux@94dacdbd5d2d),
this may cause deadlocks. Remove the flag from libbpf tools.

  [0]: https://github.com/torvalds/linux/blob/v5.18/kernel/bpf/verifier.c#L11972-L12000

Signed-off-by: Hengqi Chen <[email protected]>
Remove executable permission for 'funcinterval.8'.
ERROR:

 $ sudo ./xfsslower.py
 [...]
 80: (07) r4 += -104
 ; bpf_perf_event_output(ctx, bpf_pseudo_fd(1, -2), CUR_CPU_IDENTIFIER, &data, sizeof(data));
 81: (bf) r1 = r6
 82: (18) r3 = 0xffffffff
 84: (b7) r5 = 96
 85: (85) call bpf_perf_event_output#25
 invalid indirect read from stack R4 off -104+92 size 96
 processed 82 insns (limit 1000000) max_states_per_insn 0 total_states 4 peak_states 4 mark_read 3

 Traceback (most recent call last):
   File "/home/rongtao/Git/rtoax/bcc/tools/./xfsslower.py", line 271, in <module>
     b.attach_kretprobe(event="xfs_file_read_iter", fn_name="trace_read_return")
   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 868, in attach_kretprobe
     fn = self.load_func(fn_name, BPF.KPROBE)
   File "/usr/lib/python3.9/site-packages/bcc/__init__.py", line 522, in load_func
     raise Exception("Failed to load BPF program %s: %s" %
 Exception: Failed to load BPF program b'trace_read_return': Permission denied

Fixed following the approach in #2623.
With the latest clang and bpftool the build of tcpconnect fails as
follows (output patched a bit for readability):

    bpftool gen skeleton tcpconnect.bpf.o > tcpconnect.skel.h
    Error: Something is wrong for .rodata's variable #1: need offset 0, already at 4.

This happens because the filter_ports variable is declared using a ".rodata hack":

    SEC(".rodata") int filter_ports[MAX_PORTS]

This breaks with the recent clang, as the filter_ports variable is placed into
the '.rodata,aw' section. Older clang would put it into '.rodata,a' section
where all 'const volatile' variables are placed. The result is that the
filter_ports variable has a wrong offset of 0 in BTF_KIND_DATASEC.

To hack the hack we can declare the variable as

    SEC(".rodata") const int filter_ports[MAX_PORTS]

but, instead, we now can just declare it as

    const volatile int filter_ports[MAX_PORTS]

In fact, this was already done in a02663b ("libbpf-tools: update bpftool and
fix .rodata hack"), but a later commit f8ac3c6 ("libbpf-tools: fix tcpconnect
compile errors") reverted the change without any comments.

Signed-off-by: Anton Protopopov <[email protected]>
For example:

 Before:
 $ sudo ./funccount.py -i 1 'xfs_f*'
 cannot attach kprobe, Invalid argument
 Failed to attach BPF program b'trace_count_62' to kprobe b'xfs_fs_eofblocks_from_user'

 After:
 $ sudo ./funccount.py -i 1 'xfs_f*'
 cannot attach kprobe, Invalid argument
 Failed to attach BPF program b'trace_count_10' to kprobe b'xfs_fs_eofblocks_from_user', it's not traceable (either non-existing, inlined, or marked as "notrace")

 In kernel:
 static inline int
 xfs_fs_eofblocks_from_user(...)
sync libbpf repo to
    4cb6822 configs: Enable CONFIG_MODULE_SIG
Since
libbpf/libbpf@7e8d423,
this structure is gone from libbpf.

BCC uses it as the structure passed between `bcc_create_map_xattr` and
`libbpf_bpf_map_create`.
The alternative would be to modify both libbpf_bpf_map_create and
bcc_create_map_xattr to take each argument individually, which I am not sure
would be any better.
Renamed the struct from `bpf_create_map_xattr` to `bcc_create_map_xattr` to better reflect that this is a bcc-provided struct, no longer a bpf one.
sync with libbpf + backport `struct bpf_create_map_attr`
Add `tracepoint:skb:kfree_skb` support
tools/syscount: Beautify output of syscall list
@fengjixuchui fengjixuchui merged commit 22015af into fengjixuchui:master Jun 24, 2022
fengjixuchui pushed a commit that referenced this pull request Nov 15, 2022
…for -v option

Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)

Print backtrace of ip if it failed to get syms.

Before:
  # offcputime -v
    psiginfo
    vscanf
    __snprintf_chk
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    sd_event_exit
    sd_event_dispatch
    sd_event_run
    [unknown]
    __libc_start_main
    [unknown]
    -                systemd-journal (204)
        1

    xas_load
    xas_find
    filemap_map_pages
    __handle_mm_fault
    handle_mm_fault
    do_page_fault
    do_translation_fault
    do_mem_abort
    do_el0_ia_bp_hardening
    el0_ia
    xas_load
    --
failed to get syms
      -                PmLogCtl (138757)
        1

After:
  # offcputime -v
    #0  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    #1  0xffffffc01009a93c el0_svc_handler+0x34
    #2  0xffffffc010084a08 el0_svc+0x8
    #3  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    --
    #4  0x0000007fa0bffd14 clock_nanosleep+0x94 (/usr/lib/libc-2.31.so+0x9ed14)
    #5  0x0000007fa0c0530c nanosleep+0x1c (/usr/lib/libc-2.31.so+0xa430c)
    #6  0x0000007fa0c051e4 sleep+0x34 (/usr/lib/libc-2.31.so+0xa41e4)
    #7  0x000000558a5a9608 flb_loop+0x28 (/usr/bin/fluent-bit+0x52608)
    #8  0x000000558a59f1c4 flb_main+0xa84 (/usr/bin/fluent-bit+0x481c4)
    #9  0x0000007fa0b85124 __libc_start_main+0xe4 (/usr/lib/libc-2.31.so+0x24124)
    #10 0x000000558a59d828 _start+0x34 (/usr/bin/fluent-bit+0x46828)
    -                fluent-bit (1238)
        1

    #0  0xffffffc01027daa4 generic_copy_file_checks+0x334
    #1  0xffffffc0102ba634 __handle_mm_fault+0x8dc
    #2  0xffffffc0102baa20 handle_mm_fault+0x168
    #3  0xffffffc010ad23c0 do_page_fault+0x148
    #4  0xffffffc010ad27c0 do_translation_fault+0xb0
    #5  0xffffffc0100816b0 do_mem_abort+0x50
    #6  0xffffffc0100843b0 el0_da+0x1c
    #7  0xffffffc01027daa4 generic_copy_file_checks+0x334
    --
    #8  0x0000007f8dc12648 [unknown]
    #9  0x0000007f8dc0aef8 [unknown]
    #10 0x0000007f8dc1c990 [unknown]
    #11 0x0000007f8dc08b0c [unknown]
    #12 0x0000007f8dc08e48 [unknown]
    #13 0x0000007f8dc081c8 [unknown]
    -                PmLogCtl (2412)
        1
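
The frame layout described above, INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET), can be reproduced with a small Python sketch. This is not the tool's C code; `format_frame` and its parameters are hypothetical, and the field widths are inferred from the "After" sample output.

```python
def format_frame(index, addr, symbol=None, sym_off=0, module=None, mod_off=0):
    """Format one backtrace frame in the layout shown in the sample output.

    Hypothetical helper: the index is left-aligned to two digits, the
    address is zero-padded to 16 hex digits, a missing symbol prints as
    "[unknown]", and the module part is appended only when known.
    """
    line = f"#{index:<2} 0x{addr:016x}"
    if symbol is not None:
        line += f" {symbol}+0x{sym_off:x}"
    else:
        line += " [unknown]"
    if module is not None:
        line += f" ({module}+0x{mod_off:x})"
    return line


if __name__ == "__main__":
    print(format_frame(0, 0xFFFFFFC01018B7E8, "__arm64_sys_clock_nanosleep", 0x0))
    print(format_frame(10, 0x558A59D828, "_start", 0x34,
                       "/usr/bin/fluent-bit", 0x46828))
    print(format_frame(8, 0x7F8DC12648))
```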

Fixed: iovisor#3884
Signed-off-by: Eunseon Lee <[email protected]>
10 participants