Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

annotate program tag #1332

Merged
merged 2 commits into from
Sep 8, 2017
Merged

annotate program tag #1332

merged 2 commits into from
Sep 8, 2017

Conversation

4ast
Copy link
Member

@4ast 4ast commented Sep 2, 2017

during debug of production systems it's difficult to trace back
the kernel reported 'bpf_prog_3196C6D60BBB8549' symbols to the source code
of the program, hence teach bcc to store the main function source
in the /var/tmp/bcc/bpf_prog_3196C6D60BBB8549/ directory.

This program tag is stable. Every time the script is called the tag
will be the same unless source code of the program changes.
During active development of bcc scripts the /var/tmp/bcc/ dir can
get a bunch of stale tags.
The users have to trim that dir manually.

Python scripts can be modified to use this feature too, but probably
need to be gated by the flag. For c++ api I think it makes sense
to store the source code always, since the cost is minimal and
c++ api is used by long running services.

Example:
$ ./examples/cpp/LLCStat
$ ls -l /var/tmp/bcc/bpf_prog_3196C6D60BBB8549/
total 16
-rw-r--r--. 1 root root 226 Sep 1 17:30 on_cache_miss.c
-rw-r--r--. 1 root root 487 Sep 1 17:30 on_cache_miss.rewritten.c
-rw-r--r--. 1 root root 224 Sep 1 17:30 on_cache_ref.c
-rw-r--r--. 1 root root 484 Sep 1 17:30 on_cache_ref.rewritten.c
$ cat /var/tmp/bcc/bpf_prog_3196C6D60BBB8549/on_cache_miss.c
int on_cache_miss(struct bpf_perf_event_data *ctx) {
struct event_t key = {};
get_key(&key);

u64 zero = 0, *val;
val = miss_count.lookup_or_init(&key, &zero);

...

Signed-off-by: Alexei Starovoitov [email protected]

@4ast
Copy link
Member Author

4ast commented Sep 2, 2017

cc @yonghong-song @palmtenor @drzaeus77 thoughts?

@drzaeus77
Copy link
Collaborator

Should the files survive a reboot? /var/tmp vs /var/run? Probably should be cmake configurable.

@4ast
Copy link
Member Author

4ast commented Sep 2, 2017

should definitely survive the reboot. both /var/run and /var/tmp on my system have this property. don't have strong opinion which one is better. /var/tmp has more clear meaning that it's temporary.
cmake flag is good idea. will do.

Copy link
Collaborator

@drzaeus77 drzaeus77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these can be two separate commits, with the tag computation coming first and the rewritten source file stuff second.


::snprintf(buf, sizeof(buf), "/var/tmp/bcc/bpf_prog_%llX/%s.c",
tag1, name.data());
int fd = open(buf, O_CREAT | O_WRONLY | O_TRUNC, 0644);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use FileDesc class here.

@@ -33,7 +33,7 @@ BLoader::~BLoader() {
}

int BLoader::parse(llvm::Module *mod, const string &filename, const string &proto_filename,
TableStorage &ts, const string &id) {
TableStorage &ts, const std::string &id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It is inconsistent with surrounding lines, and unnecessary in this particular .cc file.

std::map<std::string, SourceCode> funcs_;
public:
FuncSource() {}
const char * get_src(const std::string& name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer to return a const string & so that caller has option to use other string methods if they want, or data() otherwise. This is throwing away type information for no reason.

Also, the convention is to name the functions as src() and src_rewritten(), omitting the get_ prefix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning static string s = ""; return s; looked uglier to me than return "";
will drop get_

src/cc/libbpf.h Outdated
@@ -89,6 +89,10 @@ int bpf_close_perf_event_fd(int fd);

int bpf_obj_pin(int fd, const char *pathname);
int bpf_obj_get(const char *pathname);
int bpf_obj_get_info_by_fd(int prog_fd, void *info, int *info_len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does _by_fd need to be here? It is inconsistent with other bpf_xx functions, the fd is generally an implicit parameter.

.salg_type = "hash",
.salg_name = "sha1",
};
int shafd = socket(AF_ALG, SOCK_SEQPACKET, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, never knew about this one. Alternatively, there are some header-only implementations of sha1, but I assume you chose this approach for a reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which header only? was only aware of this one, since it's so simple.

@palmtenor
Copy link
Member

The Rewritten text would be super useful when debugging!

Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A high-level question. The current scheme is to write each function (before/after rewriter) into two files. Does it make sense to base on compilation unit (each source file is a compilation unit)? This way, macro's, functions to be inlined later, table definition, will be all in the same file, easy to see relationships.

This may simplify your implementation as well?

Later on, I do plan to implement a debug option to permit user to print out source code annotated byte code (implementation will be similar to llvm-objdump -D). So with verifier error, they can go back to see which source line is corresponding to and hopefully can find the error easily. This is source file based as well.

@4ast
Copy link
Member Author

4ast commented Sep 6, 2017

@yonghong-song
most bpf programs are written with meaningful function body and not just single call into always_inline.
Also single compilation typically has many bpf programs. If bcc keeps the whole file for all prog_tags, how the user will know which one is running?
also file won't have all the source either. kernel headers, bcc_helpers are external to the file.
Ideally we could preserve .c code after inlining, but that's not possible.
So original func src and rewritten one seems to be as a good start for typical bcc usage.
The upcoming CTF work and further extensions to BPF_OBJ_GET_INFO_BY_FD
will give us all the info about maps, so keeping full file just to see the maps is imo
unnecessary. The rewritten .c code has bpf_pseudo(FD_NUM) calls, so it's
easy for humans to see which map fd to look at for map details.
We can make bcc store map info in /var/bcc too, but imo it's not necessary
since obj_get_info can return that as well we just need to build the tools.
Storing annotated bytecode (as llvm-objdump) associated with prog_tag
also would be great!

@yonghong-song
Copy link
Collaborator

@4ast Right, I forgot to pay attention to prog tag is for bpf prog. Esp if you want to load one prog multiple times, prog tag should differentiate it. My compilation unit idea does not fit here.

Currently, the files are dumped after loading the functions into the kernel.
The files will be preserved when functions are unloaded or even reboot.
This is a good first step towards exposing kernel bpf states in user spaces.
Expecting more in the future to reflect the actual runtime bpf state on the host!

src/cc/libbpf.c Outdated
}
struct bpf_insn prog[prog_len / 8];
bool map_ld_seen = false;
for (int i = 0; i < prog_len / 8; i++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like old version of gcc (4.8.5) does not like "for (int i = 0; ...". You need additional flags to make gcc happy. So maybe it is a good idea to have " int i; for (i = 0; ...)".

Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach is good. A few additional comments beyond the inlined ones:
(1). the commit message probably outdated.
$ ls -l /var/tmp/bcc/bpf_prog_3196C6D60BBB8549/
total 16
-rw-r--r--. 1 root root 226 Sep 1 17:30 on_cache_miss.c
-rw-r--r--. 1 root root 487 Sep 1 17:30 on_cache_miss.rewritten.c
-rw-r--r--. 1 root root 224 Sep 1 17:30 on_cache_ref.c
-rw-r--r--. 1 root root 484 Sep 1 17:30 on_cache_ref.rewritten.c
Actually, for each function, there is a program tag. So the above four files should be under two different directories.
(2). How user knows which bpf_prog_#tag directory to look? This one probably needs another tool to enumerate all the prog info's on the host. One can manually enumerate as well through /proc//fdinfo/.
(3). Currently, /var/tmp/bcc will just contain a bunch of bpf_prog_ directories. It would be hard to find the one corresponding to an attachment point.
So either the prog enumeration tool should print out attachment point or we could add attachment point to the directory name. (I know you can go one more dir level to find the function names).

src/cc/libbpf.c Outdated
@@ -219,6 +220,115 @@ static void bpf_print_hints(char *log)
}
#define ROUND_UP(x, n) (((x) + (n) - 1u) & ~((n) - 1u))

int bpf_obj_get_info_by_fd(int prog_fd, void *info, int *info_len)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not used in this patch. I guess this API previously want to get prog_tag and now it directly accesses /proc//fdinfo/ to get this. Since we do not have a well defined structure for "info" yet,
maybe we can leave this API for now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are bpf_prog_info and bpf_map_info for info. The api takes void* since it will return whatever FD type is. I can split it into separate patch.

src/cc/libbpf.c Outdated
}
unsigned long long tag = 0;
sscanf(p + 1, "%llx", &tag);
*ptag = __builtin_bswap64(tag);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have a little problem for this.
We do a swap for the tag and later on use swapped tag as part of the file name.
This will cause a mismatch between /proc/self/fdinfo//prog_tag and file name,
e.g.,
prog_tag before swap is 0x2, then
directory name will be bpf_prog_0200000000000000.
Which is a little bit awkward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't follow. scanf + swap + printf %llx will do the same bpf_prog_xxx, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, just gave a concrete example. Tried one program with two kprobe attachments.
[root@localhost bpf]# cat /proc/20991/fdinfo/8
pos: 0
flags: 02000002
mnt_id: 12
prog_type: 2
prog_jited: 0
prog_tag: 1a8ee7e0aed58b39
memlock: 4096
[root@localhost bpf]# cat /proc/20991/fdinfo/4
pos: 0
flags: 02000002
mnt_id: 12
prog_type: 2
prog_jited: 0
prog_tag: c6fbcd6ef2ad039f
memlock: 4096

[root@localhost bpf]# ls -l /var/tmp/bcc
total 0
drwxr-xr-x. 2 root root 64 Aug 31 12:22 bpf_prog_398BD5AEE0E78E1A
drwxr-xr-x. 2 root root 60 Aug 31 12:06 bpf_prog_9F03ADF26ECDFBC6
[root@localhost bpf]#

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops. I guess I added that swap in the wrong place then. Will move it to bpf_prog_compute_tag().
Do you think there is an issue with scanf() or concern was for bswap only?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sscanf should be okay. The concern is only for bswap.

@4ast
Copy link
Member Author

4ast commented Sep 6, 2017

(1). the commit message probably outdated.
$ ls -l /var/tmp/bcc/bpf_prog_3196C6D60BBB8549/
total 16
-rw-r--r--. 1 root root 226 Sep 1 17:30 on_cache_miss.c
-rw-r--r--. 1 root root 487 Sep 1 17:30 on_cache_miss.rewritten.c
-rw-r--r--. 1 root root 224 Sep 1 17:30 on_cache_ref.c
-rw-r--r--. 1 root root 484 Sep 1 17:30 on_cache_ref.rewritten.c
Actually, for each function, there is a program tag. So the above four files should be under two different directories.

the commit is accurate. I picked that example to demonstrate that
two .c files with different names hash to the same prog_tag because
they have exactly the same bpf bytecode

(2). How user knows which bpf_prog_#tag directory to look? This one probably needs another tool to enumerate all the prog info's on the host. One can manually enumerate as well through /proc//fdinfo/.

yes. we need another tool that does prog_get_next_id() + get_info_by_fd().
It needs to be standalone tool. Ideally we'll use libbpf part of bcc
without llvm. It should probably be in pure C code, so we can eventually
put it into kernel tree or some place else.
cc @iamkafai

(3). Currently, /var/tmp/bcc will just contain a bunch of bpf_prog_ directories. It would be hard to find the one corresponding to an attachment point.
So either the prog enumeration tool should print out attachment point or we could add attachment point to the directory name. (I know you can go one more dir level to find the function names).

attachment info will go stale, so if we store such info in dir, we probably
need to clean it up after execution, but not .c files.
I like tag->.c annotation, since it's stable. Even when debugging is happening
a week later. Like if we have 'perf report' from any server we can login
to one of them and look for /var/../bpf_prog_xxx/
That .c code will be exactly what was running back then.
So I'd prefer to keep /var/../bpf_prog_xxx/ for program code related info only.
Like original .c, rewritten, annotated bytecode, llvm ir.

I think separate tool to go from attachment point to prog_tag is better,
since it's for debugging on the specific box while things are live.

bpf_obj_get_info() to retreive prog_tag from the kernel based on prog_fd (kernel 4.13+)
bpf_prog_compute_tag() to compute prog_tag from a set of bpf_insns (kernel independent)
bpf_prog_get_tag() to retrieve prog_tag from /proc/pid/fdinfo/fd (kernel 4.10+)

Signed-off-by: Alexei Starovoitov <[email protected]>
during debug of production systems it's difficult to trace back
the kernel reported 'bpf_prog_4985bb0bd6c69631' symbols to the source code
of the program, hence teach bcc to store the main function source
in the /var/tmp/bcc/bpf_prog_4985bb0bd6c69631/ directory.

This program tag is stable. Every time the script is called the tag
will be the same unless source code of the program changes.
During active development of bcc scripts the /var/tmp/bcc/ dir can
get a bunch of stale tags. The users have to trim that dir manually.

Python scripts can be modified to use this feature too, but probably
need to be gated by the flag. For c++ api I think it makes sense
to store the source code always, since the cost is minimal and
c++ api is used by long running services.

Example:
$ ./examples/cpp/LLCStat
$ ls -l /var/tmp/bcc/bpf_prog_4985bb0bd6c69631/
total 16
-rw-r--r--. 1 root root 226 Sep  1 17:30 on_cache_miss.c
-rw-r--r--. 1 root root 487 Sep  1 17:30 on_cache_miss.rewritten.c
-rw-r--r--. 1 root root 224 Sep  1 17:30 on_cache_ref.c
-rw-r--r--. 1 root root 484 Sep  1 17:30 on_cache_ref.rewritten.c

Note that there are two .c files there, since two different
bpf programs have exactly the same bytecode hence same prog_tag.

$ cat /var/tmp/bcc/bpf_prog_4985bb0bd6c69631/on_cache_miss.c
int on_cache_miss(struct bpf_perf_event_data *ctx) {
    struct event_t key = {};
    get_key(&key);

    u64 zero = 0, *val;
    val = miss_count.lookup_or_init(&key, &zero);
...

Signed-off-by: Alexei Starovoitov <[email protected]>
@4ast
Copy link
Member Author

4ast commented Sep 7, 2017

catping
cc @yonghong-song @drzaeus77

@brendangregg
Copy link
Member

Will one of these cats please score a point? I've been watching them for hours.

@brendangregg
Copy link
Member

I tested it and it works. We're going to need this for python as well.

@yonghong-song yonghong-song merged commit d56fff0 into master Sep 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants