funccount: Fix on-CPU hang when attaching to SyS_* #780
Conversation
`BPF.get_kprobe_functions` does not filter duplicates, and as a result may return the same function name more than once if it appears in /sys/kernel/debug/tracing/available_filter_functions more than once. Change the function's behavior to filter out duplicates before returning, so we don't end up attaching the same kprobe more than once.
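The dedup fix can be sketched in plain Python (a hypothetical helper for illustration, not the actual bcc code; `get_unique_functions` is an invented name):

```python
def get_unique_functions(names):
    # available_filter_functions can list the same function more than
    # once; filter repeats before attaching, keeping first-seen order,
    # so we never attach the same kprobe twice.
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique
```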
Avoiding the prepopulation of the location cache allows us to get rid of the `zero()` call at the end of each interval, which would hang the program at full CPU. Replaced the prepopulation with a `lookup_or_init` and the `zero()` call with a call to `clear()`.
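The new pattern can be sketched with a plain dict standing in for the BPF hash map (a hypothetical model; `lookup_or_init` here is a Python stand-in for the BPF-side helper, not bcc's actual API):

```python
def lookup_or_init(table, key, init=0):
    # Return the existing entry, creating a zero-initialized one on
    # first access -- no prepopulation of the location cache needed.
    if key not in table:
        table[key] = init
    return table[key]

counts = {}
for fn in ["SyS_read", "SyS_write", "SyS_read"]:
    counts[fn] = lookup_or_init(counts, fn) + 1

# At the end of an interval the tool now empties the map wholesale
# (clear) instead of writing a zero leaf back into every entry (zero).
counts.clear()
```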
@drzaeus77 I faintly recall that you said something about an issue with recursive map updates. Does this apply here? Can you please refresh my memory?
Thanks @goldshtn, I'm happy to close #777 if this is better. I'm trying to remember the original issues. #415? Here's that test case with your new funccount:
Which works. I can even add a "-i 1".
Does this work for you? (regarding avoiding duplicates):
that's in available_filter_functions twice. The problem with recursive maps I think was fixed in the kernel by https://lkml.org/lkml/2016/3/8/24 "bpf: prevent kprobe+bpf deadlocks". That patch is in the latest kernels, but isn't in, say, the current Ubuntu Xenial 4.4. I'm not sure if this tool still has the workarounds necessary (now that it's switching back to lookup_or_init()). I still haven't paged #415 back into my brain -- it feels like years ago! I did test this version on 4.9-rc1 for hours with different workloads, and it was fine.
@brendangregg Yes, I don't have that error (tested on FC23, 4.9). Did you remember to
I'm also fuzzy on this one, but I feel like this approach may also result in a hang depending on the order of lookup_or_init() racing with the clear(). I think it may still be possible for a lookup_or_init() to inject a new entry into a particular bucket of a map faster than the userspace can clear that same bucket. Can you try the original code but instead put a
@drzaeus77 But the code that does the prepopulation doesn't hang. It's the call to
@drzaeus77 Also, wait,
@drzaeus77 But I have another idea, which is even closer to what @4ast wanted in the first place. With this version of funccount, we know the number of locations in advance. So I'll simply declare a BPF array large enough, and we won't need any dynamic lookups or clears.
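The fixed-size-array idea can be sketched in plain Python (a hypothetical model, not bcc's actual API; `FixedCounts` and its methods are invented names):

```python
class FixedCounts:
    """Since the set of probe locations is known before attaching,
    each location gets a fixed slot; no map entries are ever created
    or deleted at runtime, so there is nothing to race against."""

    def __init__(self, locations):
        self.slot = {name: i for i, name in enumerate(locations)}
        self.counts = [0] * len(locations)  # preallocated, like a BPF array

    def hit(self, name):
        # In the real tool the increment happens BPF-side, indexed by
        # a per-location id baked in at attach time.
        self.counts[self.slot[name]] += 1

    def snapshot_and_reset(self):
        out = dict(zip(self.slot, self.counts))
        self.counts = [0] * len(self.counts)
        return out


c = FixedCounts(["SyS_read", "SyS_write"])
c.hit("SyS_read")
c.hit("SyS_read")
c.hit("SyS_write")
```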
Right, probably in table.py it needs to be:

```python
def zero(self):
    for k in list(self.keys()):
        self[k] = self.Leaf()
```

Whether the underlying table is map/array/other doesn't matter, since the kernel api exposed is maplike, and python wraps it with an iterator interface. Using

Honestly, this particular race seems to me like it should be fixed in the kernel, but I don't think that is trivial, so we should work around it. In a nutshell, the way zero is implemented requires that we deal with this racy behavior, otherwise we'll keep hitting it. That said, we can by all means fix up funccount to be more optimal if possible.
To complete the story, the bug stems from a particular kernel behavior. Suppose a BPF map contains many entries, two of which happen to hash to the same bucket; that is, `hash(A) == hash(B) == H1`. Calling zero() on the map causes an update to be called as follows:
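As a rough illustration, here is a toy pure-Python model of a chained hash map in which an update re-inserts the element at the head of its bucket chain (an assumed simplification of the kernel hash-map behavior, not the real implementation; all names are invented):

```python
class ToyChainedMap:
    # Toy chained hash map: an update removes the old element and
    # re-inserts the new one at the HEAD of its bucket chain.
    def __init__(self, nbuckets=4):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, key):
        # Deterministic toy hash so two keys can be forced into one bucket.
        return self.buckets[sum(map(ord, key)) % len(self.buckets)]

    def update(self, key, value):
        chain = self._chain(key)
        chain[:] = [(k, v) for k, v in chain if k != key]
        chain.insert(0, (key, value))  # updated element moves to the head

    def get_next_key(self, key):
        chain = self._chain(key)
        keys = [k for k, _ in chain]
        i = keys.index(key)
        if i + 1 == len(keys):
            raise KeyError("end of chain")
        return keys[i + 1]


# "ab" and "ba" have the same character sum, so they share a bucket.
m = ToyChainedMap()
m.update("ab", 1)
m.update("ba", 2)               # chain: [ba, ab]

# A zero() that interleaves get_next_key() with updates chases moved
# elements: zeroing an entry moves it in front of the others, so the
# walk arrives back at a key it already zeroed.
m.update("ba", 0)               # chain: [ba, ab]
nxt = m.get_next_key("ba")      # -> "ab"
m.update(nxt, 0)                # chain: [ab, ba]
revisit = m.get_next_key(nxt)   # -> "ba" again: the walk cycles
```

Under this model the zero-while-walking loop never terminates, matching the full-CPU hang described above; snapshotting all keys before zeroing breaks the cycle.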
Because we know the number of probes in advance before attaching them, we can simply preinitialize a fixed-size array instead of using a BPF map. This avoids potential deadlocks/hangs/race conditions with the Python program and internally in the kernel. See also iovisor#415, iovisor#665, iovisor#233 for more discussion.
To avoid a potential race with the key zeroing modifying the next hash key retrieved by the loop in `Table.zero()`, retrieve all the keys in user space first before starting the zeroing loop. See discussion on iovisor#780. Tested with `funccount 'SyS_*' -i 1` while running a heavy read/write test application (`dd`) in the background for several minutes with no visible issues.
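The key-snapshot approach from the commit message can be sketched as follows (a plain dict stands in for the bcc table; `zero_table` is an invented name):

```python
def zero_table(table, zero_leaf=0):
    # Collect every key in user space FIRST, then write the zero leaf,
    # so the zero-writes cannot perturb the key walk that produced them.
    for k in list(table.keys()):
        table[k] = zero_leaf
```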
Right. So I'm going to update this branch when I migrate funccount to use an array, and then fix
@goldshtn ah thanks, I'd missed a make install. Duplicate issue now fixed. I'll try to get a Xenial 4.4 build environment set up for testing (one that lacks that kernel patch).
Kernel hang. This is Ubuntu Xenial with a 4.4 kernel.
My version in #777 does not induce the hang.
Right. I'll be moving back to
sounds plausible.
Since this didn't hang on 4.9... @4ast, is this the patchset that fixes it in-kernel, or is this a different issue?
Maybe Canonical can backport for Xenial users.
LGTM
@brendangregg @drzaeus77 Pushed two additional commits: one changes funccount to use a fixed-size prepopulated map and clear it key-by-key; the other also changes
lgtm @drzaeus77
Ok, that test worked on 4.4. I also checked 4.9. (... really need to give funccount a "-d duration" to make testing not need a human Ctrl-C (and change the -d for debug to -D). I was hacking it in with a "timeout -s2 5" prefix.) However:
So a Ctrl-C had left uprobes still initialized. The `reset-ftrace -v` shows how it was (that's a `cat -nv`).
no, it's actually broken, not just still-initialized uprobes:
@brendangregg 4.4.0-31 is still broken w.r.t. uprobes; the memory leak is fixed at about 4.4.0-42. This is what made the 16.04 buildbot need a reboot once a week or so.
ok, I'll start digging, but if you know the patch description offhand that'd be useful; just need to always check it's merged in the kernels we're deploying
@drzaeus77 thanks; so just accounting, not an actual leak
It's pretty bad. The accounting mismatch results in pretty serious failures. In some cases I saw an oops when I tried to do a kexec after hitting the failure.
This is a possible fix to the funccount issue (instead of #777). First, it makes sure we don't attach to the same kprobe more than once by fixing a non-uniqueness bug in `BPF.get_kprobe_functions`. Second, it modifies funccount to use `lookup_or_init` and `Table.clear` instead of `lookup` and `Table.zero`.