Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support fentry/fexit to replace kprobe/kretprobe #143

Closed
wenlxie opened this issue Oct 19, 2022 · 6 comments
Closed

Support fentry/fexit to replace kprobe/kretprobe #143

wenlxie opened this issue Oct 19, 2022 · 6 comments

Comments

@wenlxie
Copy link
Contributor

wenlxie commented Oct 19, 2022

We have a requirement that need to use fentry/fexit as much as possible to replace kprobe,
But some of our nodes' kernel are still in old kernel and not support fentry.
That means we need to solve following issues:

  1. In node with old kernel, we need to use kprobe
  2. In node with new kernel, we can use fentry to replace kprobe
  3. Pods runs on these nodes with the same pod spec.

One of the solution I am thinking is:

  1. Add a kernel version limitation to the programs, if current program not met the requirements, then it will not be loaded
  2. We will provide two programs for the same requirements but with different implementation, one is by fentry and another is kprobe. Only one will be loaded because of kernel version limitation configured

@bobrik Any suggestion for this?
Thanks for your time.

@wenlxie wenlxie changed the title Support fentry/fentryexit to instead of kprobe Support fentry/fentryexit to replace kprobe Oct 19, 2022
@wenlxie wenlxie changed the title Support fentry/fentryexit to replace kprobe Support fentry/fentryexit to replace kprobe/kretprobe Oct 19, 2022
@wenlxie wenlxie changed the title Support fentry/fentryexit to replace kprobe/kretprobe Support fentry/fexit to replace kprobe/kretprobe Oct 19, 2022
@bobrik
Copy link
Contributor

bobrik commented Oct 19, 2022

It looks like fentry is also not yet available on arm64 even in 6.1-rc1. This patch is needed:

I added fentry benchmarks in #144 and it looks much better than kprobe (as expected).

@zuzzas has #136 with both graceful handling for programs that cannot be loaded and kernel constraints.

I'll try to find some time to think how to make this work with the least amount of friction.

@wenlxie
Copy link
Contributor Author

wenlxie commented Oct 19, 2022

The performance of fentry looks really amazing
@bobrik Many thanks for the infos.
Kernel constraints is a common requirements in Cloud. Patiently wait for the solutions. Thanks

@wenlxie
Copy link
Contributor Author

wenlxie commented Oct 20, 2022

Since kprobe is not a stable ABI, so another requirement is that we may need to check for whether the symbol exists or not before load the program.
This is a case that I met recently.
I want to trace for ipvs code, but ip_vs_in symbol changes in different release even the kernel version is the same.

    ffffffffc07695c0 t __ip_vs_init     [ip_vs]
    ffffffffc076a560 t ip_vs_in_icmp_v6.isra.0  [ip_vs]
    ffffffffc076b0d0 t ip_vs_in_icmp    [ip_vs]
    ffffffffc076be50 t ip_vs_in.part.0  [ip_vs]
    ffffffffc076c530 t ip_vs_in [ip_vs]
    ffffffffc076d730 t ip_vs_info_array [ip_vs]
    ffffffffc076d7f0 t ip_vs_info_seq_start     [ip_vs]
    ffffffffc076d820 t ip_vs_info_seq_next      [ip_vs]
    ffffffffc076d8d0 t ip_vs_info_seq_stop      [ip_vs]
    ffffffffc076dbf0 t ip_vs_info_seq_show      [ip_vs]
    ffffffffc0780a60 r ip_vs_info_seq_ops       [ip_vs]
    ffffffffc076a9a0 t ip_vs_init_hash_table    [ip_vs]
    ffffffffc0c87ff0 t __ip_vs_init     [ip_vs]
    ffffffffc0c895c0 t ip_vs_in_icmp_v6 [ip_vs]
    ffffffffc0ca0149 t ip_vs_in_icmp_v6.cold    [ip_vs]
    ffffffffc0c8a2d0 t ip_vs_in_icmp    [ip_vs]
    ffffffffc0ca022a t ip_vs_in_icmp.cold       [ip_vs]
    ffffffffc0c8aef0 t ip_vs_in.part.0  [ip_vs]
    ffffffffc0ca02d1 t ip_vs_in.part.0.cold     [ip_vs]
    ffffffffc0c8cb20 t ip_vs_info_array [ip_vs]
    ffffffffc0c8cbe0 t ip_vs_info_seq_start     [ip_vs]
    ffffffffc0c8cc10 t ip_vs_info_seq_next      [ip_vs]
    ffffffffc0c8ccc0 t ip_vs_info_seq_stop      [ip_vs]
    ffffffffc0c8cfe0 t ip_vs_info_seq_show      [ip_vs]
    ffffffffc0ca3c80 r ip_vs_info_seq_ops       [ip_vs]
    ffffffffc0c899e0 t ip_vs_init_hash_table    [ip_vs]

So maybe we can add a feature to check before load the program.
Program only can be loaded after pass the checks.

These checks can be:

  • kernel version check
    . kernel version > A.B.C
    . kernel version < A.B.C
  • Symbol check
    . Symbol A not exists
    . Symbol B exists
  • Other checks.

@bobrik
Copy link
Contributor

bobrik commented Oct 20, 2022

Is having the same config across the whole fleet the right idea to begin with?

What if somebody wants to load probes based on hardware, like NICs? What if there's something CPU model specific? We can't keep adding checks to ebpf_exporter to accommodate every situation.

I think there's a natural fit for external configuration management here. For physical hardware it might be whatever you use to configure the rest of it (we use Salt). For k8s you can write some sort of script that would do the equivalent work of checking any conditions you want for every program you have.

It can be as simple and as customizable as a python script:

#!/usr/bin/env python3

import yaml
import platform
import packaging.version


def min_kernel_version(min_version):
    min_version = packaging.version.parse(min_version)
    running_version = packaging.version.parse(platform.release().split("-")[0])
    return running_version >= min_version


PROGRAMS = {
    "timers": [],
    "cgroup": [min_kernel_version("5.15")],
}

if __name__ == "__main__":
    enabled = []
    for name, checks in PROGRAMS.items():
        if not all(checks):
            continue
        enabled.append(name)

    programs = []
    for name in enabled:
        with open(f"examples/{name}.yaml", "r") as f:
            config = yaml.load(f, Loader=yaml.SafeLoader)
            programs.append(config["programs"][0])

    print(yaml.dump({"programs": programs}))

@wenlxie
Copy link
Contributor Author

wenlxie commented Oct 20, 2022

@bobrik Thanks
You are right, we can't check for all conditions.
I can add such kind of checks into a startup script.

@bobrik
Copy link
Contributor

bobrik commented Oct 20, 2022

With #145 fentry should be supported. Let me know if you see any issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants