Smoke tests for the tools #1032

goldshtn · 2017-03-07T13:34:55Z

This commit adds basic smoke tests for most tools in tools/ by
running the tool with either a short duration, or interrupting it
with a SIGINT after a short duration. The tests check the return
value from the tool to detect any Python exceptions or other
errors, but they do not read the standard error or standard output
and parse the tool's result.
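The check described above only needs the tool's exit status. Here is a minimal Python 3 sketch of that idea for tools that stop on their own because they take a duration or count argument. `tool_argv` and `check_exits_cleanly` are hypothetical names, not the helpers from this PR:

```python
import subprocess
import sys


def tool_argv(tool, *args):
    """Run a Python tool through the current interpreter, so the test does not
    depend on the tool's shebang line or its u+x bit."""
    return [sys.executable, tool] + list(args)


def check_exits_cleanly(argv, timeout=30):
    """Hypothetical smoke check for a tool that stops on its own: it passes
    only on exit status 0.

    An uncaught Python exception makes the interpreter exit with status 1,
    so a traceback during startup fails the check without parsing any output.
    """
    try:
        rc = subprocess.call(argv, stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # subprocess.call() has already killed the child
    return rc == 0
```

For example, `check_exits_cleanly(tool_argv("argdist.py", "-C", "p::SyS_open()", "-n", "1", "-i", "1"))` would mirror the argdist invocation from the review. This assumes it runs as root on a kernel where that probe exists.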

Some tools are not covered by these smoke tests for reasons
documented in the test itself:

* btrfsdist and btrfsslower need btrfs
* cachetop doesn't like to run without a terminal
* dbslower, dbstat, and mysqld_qslower need a database engine
* deadlock_detector allocates a huge amount of memory
* softirqs doesn't work on new kernels and needs fixing (#1031)
* ugc needs a USDT-enabled runtime with GC probes
* xfsdist and xfsslower don't work on the build bots because of missing probe functions
* zfsdist and zfsslower need zfs
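`unittest`'s skip decorators offer one way to keep each exclusion reason visible in test reports rather than only in a comment. The sketch below rests on two assumptions. First, `fs_registered` is a hypothetical helper. Second, a filesystem type being registered in /proc/filesystems is treated as a good-enough proxy for the btrfs functions being present for kprobes:

```python
import unittest


def fs_registered(fstype, path="/proc/filesystems"):
    """True if the running kernel has `fstype` registered (module loaded or
    built in). Each line of /proc/filesystems is '[nodev]<TAB><type>'."""
    try:
        with open(path) as f:
            return any(line.split()[-1] == fstype
                       for line in f if line.strip())
    except OSError:
        return False


class ToolSmokeTests(unittest.TestCase):
    # The reason string is reported by the test runner ("skipped '...'").
    @unittest.skip("softirqs.py can't find some of its attach targets on "
                   "recent kernels; tracked in bcc#1031")
    def test_softirqs(self):
        self.fail("unreachable while the skip decorator is present")

    @unittest.skipUnless(fs_registered("btrfs"), "needs btrfs")
    def test_btrfsdist(self):
        # Placeholder body: a real test would run btrfsdist.py here.
        self.assertTrue(fs_registered("btrfs"))
```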

This is a good place to start, but clearly for some tools,
especially those with a complex interface like trace and argdist,
we need more than just basic smoke tests.

Also included in this PR are minor fixes to some tools to enable
easy testing.

Resolves #981.

goldshtn · 2017-03-07T20:58:24Z

I'm off to bed now; if this builds, I'll rebase tomorrow to get rid of the horrible commit mess and then it will be ready for merging. In the meantime, @brendangregg @4ast @drzaeus77 can you take a look?

4ast

looks great! thank you for working on it!

4ast · 2017-03-08T02:07:11Z

tests/python/test_tools_smoke.py

+ def run_with_int(self, command, timeout=5, allow_early=False, kill=False):
+ full_command = TOOLS_DIR + command
+ signal = "KILL" if kill else "INT"
+ rc = subprocess.call("timeout -s %s -k %ds %ds %s > /dev/null" %


timeout on centos6 doesn't have -k flag :(

Hmm. Do we care about CentOS 6? If we do, I could nest the timeouts:

timeout -s KILL 7s timeout -s INT 5s ...

Or just write the logic manually for killing the process if the SIGINT didn't get rid of it. What do you think?

i think killing the process from the python side and grepping for the expected output, instead of redirecting to /dev/null, is more accurate, but I'm fine with the current code.
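Both the missing -k on CentOS 6 and the suggestion to check output instead of discarding it can be handled from Python. Here is a minimal sketch that uses only the standard `subprocess`, `signal`, and `tempfile` modules and needs Python 3.3+ for `wait(timeout=...)`. `run_with_sigint` is a hypothetical name:

```python
import signal
import subprocess
import tempfile


def run_with_sigint(argv, run_for=5.0, grace=5.0):
    """Run a tool, interrupt it like Ctrl-C would, and escalate if ignored.

    Returns (how, returncode, output), where `how` is "early" (the tool
    exited on its own), "int" (it exited after SIGINT) or "kill" (it had to
    be SIGKILLed). Hypothetical helper, not part of the PR.
    """
    # A temporary file instead of a pipe: a chatty tool cannot fill a pipe
    # buffer and stall while nobody is reading.
    with tempfile.TemporaryFile() as out:
        proc = subprocess.Popen(argv, stdout=out, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=run_for)  # returns early if the tool exits by itself
            how = "early"
        except subprocess.TimeoutExpired:
            proc.send_signal(signal.SIGINT)  # what pressing Ctrl-C sends
            try:
                proc.wait(timeout=grace)
                how = "int"
            except subprocess.TimeoutExpired:
                proc.kill()  # SIGKILL cannot be caught or ignored
                proc.wait()
                how = "kill"
        out.seek(0)
        output = out.read().decode("utf-8", "replace")
    return how, proc.returncode, output
```

A caller could then require `how == "int"` and search `output` for a line the tool prints once it is ready. Many bcc tools print a banner such as "Tracing... Hit Ctrl-C to end.", so waiting for that line instead of a fixed delay would be a natural extension.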

4ast · 2017-03-08T02:08:58Z

tests/python/test_tools_smoke.py

+
+ @skipUnless(kernel_version_ge(4,4), "requires kernel >= 4.4")
+ def test_bashreadline(self):
+ self.run_with_int("bashreadline.py")


I think as the next step we still want an internal --dry-run flag?

I don't know if it's necessary. What we're doing here is almost identical: launch the tool, give it time to initialize, and then exit. It would be a lot of work adding this to all the tools, and I'm not sure what we'd gain.
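For comparison, an internal dry-run mode could look roughly like the following argparse sketch. The flag name follows the suggestion above. `attach_probes` and `poll_forever` are hypothetical placeholders for a tool's setup and output loop:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Example tracing tool")
    parser.add_argument("interval", nargs="?", type=int, default=1,
                        help="output interval, in seconds")
    # Hypothetical internal flag for the test suite; help=SUPPRESS keeps it
    # out of --help so users never see it.
    parser.add_argument("--dry-run", action="store_true",
                        help=argparse.SUPPRESS)
    return parser


def attach_probes():
    """Hypothetical stand-in for BPF(text=...) plus the attach_*() calls.
    In a real tool, missing symbols or verifier errors would raise here."""


def poll_forever(interval):
    """Hypothetical stand-in for the output loop, which only ends on Ctrl-C.
    Not exercised in this sketch."""
    raise NotImplementedError


def main(argv=None):
    args = build_parser().parse_args(argv)
    attach_probes()
    if args.dry_run:
        # Initialization succeeded: exit now instead of relying on a harness
        # to guess how long setup takes and then send SIGINT.
        return 0
    return poll_forever(args.interval)
```

The trade-off raised in the reply still applies: each of the dozens of tools would need this change, while the signal-based harness needs none.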

4ast · 2017-03-08T02:09:48Z

tests/python/test_tools_smoke.py

+ full_command = TOOLS_DIR + command
+ signal = "KILL" if kill else "INT"
+ rc = subprocess.call("timeout -s %s -k %ds %ds %s > /dev/null" %
+ (signal, 5, timeout, full_command), shell=True)


shouldn't the KILL and INT timeouts be different?
should the INT timeout be short, like 1 second?

1 second is not enough because a lot of tools take a while to initialize, especially on a slow system. Sending a SIGINT while they are still initializing sort of defeats the purpose of the smoke test.

i'm still missing how both timeouts can be 5 seconds.

You mean in this specific command? The -k duration is counted from the moment the first signal is sent, so it is relative, not absolute.

ahh. that makes sense.
i think the assumption that every script should be ctrl-c-able in 5 seconds is fragile and --dry-run can make it precise and accurate, but I'm fine with current setup for now.
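The timing and the resulting exit statuses can be spelled out in a few lines. As we read the GNU coreutils documentation, `timeout` exits with 124 when the limit is reached and with 128+9 = 137 when the command dies from SIGKILL. It returns 124 even if the command then exits with some other status after SIGINT, unless --preserve-status is given. A sketch with hypothetical names `signal_times` and `smoke_passed`:

```python
# Exit statuses of GNU coreutils `timeout` (without --preserve-status):
TIMED_OUT = 124   # the deadline passed and the first signal was sent
KILLED = 128 + 9  # the command died from SIGKILL (either -s KILL or -k)


def signal_times(duration, kill_after):
    """Seconds after start at which `timeout -s INT -k KILL_AFTER DURATION cmd`
    sends each signal. -k is counted from the first signal, not from start."""
    return {"INT": duration, "KILL": duration + kill_after}


def smoke_passed(rc, allow_early=False, kill=False):
    """Hypothetical verdict on timeout's exit status for a smoke test."""
    if rc == 0:
        return allow_early  # the tool finished before the deadline
    if rc == TIMED_OUT:
        return True         # ran until the deadline, then got SIGINT
    if rc == KILLED:
        return kill         # acceptable only when SIGKILL was requested
    return False            # the tool failed before the deadline
```

For the command in this thread, `signal_times(5, 5)` gives SIGINT at 5 s and SIGKILL at 10 s. The nested form proposed for CentOS 6, `timeout -s KILL 7s timeout -s INT 5s ...`, corresponds to `signal_times(5, 2)`.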

4ast · 2017-03-08T02:10:52Z

tests/python/test_tools_smoke.py

+ def test_argdist(self):
+ self.run_with_duration("argdist.py -C 'p::SyS_open()' -n 1 -i 1")
+
+ @skipUnless(kernel_version_ge(4,4), "requires kernel >= 4.4")


the kernel version check won't work for backported kernels, but should be ok for iovisor buildbots.

Yeah, and it's needed there -- I've had to tune a lot of the tests for this. E.g. it turned out that mountsnoop uses bpf_get_current_task which is only available on 4.8, and so on.
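The PR's `kernel_version_ge` helper itself is not shown in this thread. A hypothetical implementation that compares the major and minor numbers from `uname -r` could look like the sketch below. As noted above, it cannot see features backported into distro kernels with older version numbers:

```python
import platform
import re


def parse_kernel_release(release):
    """'4.8.0-36-generic' -> (4, 8). Only major.minor matter for these checks."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def kernel_version_ge(major, minor, release=None):
    if release is None:
        release = platform.release()  # same string as `uname -r` on Linux
    # Tuple comparison orders (4, 10) after (4, 8), unlike string comparison.
    return parse_kernel_release(release) >= (major, minor)
```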

4ast · 2017-03-08T02:12:16Z

tests/python/test_tools_smoke.py

+ @skipUnless(kernel_version_ge(4,6), "requires kernel >= 4.6")
+ def test_deadlock_detector(self):
+ # TODO This tool requires a massive BPF stack traces table allocation,
+ # which might fail the run or even trigger the oomkiller to kill some


trigger oom just by running it? that's not expected. cc @kennyyu

Yup. It's got some seriously massive hashes and stack trace tables. https://github.com/iovisor/bcc/blob/master/tools/deadlock_detector.c

@4ast I made it huge originally in order to be able trace very large binaries with a large number of threads. The sizes in those maps should be made configurable.

Is there an easy way to pass arguments to the bpf C code from the user space python program, besides inlining the C program into the python program? Currently, the deadlock_detector bpf code exists as a separate C file.

The BPF() object has an optional cflags=[] argument, which you could pass -DMYDEFINE=xxx to.
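Here is a sketch of the cflags approach, assuming hypothetical macro names `MAX_THREADS` and `STACK_TRACES`; the real map names and sizes in deadlock_detector.c may differ. The flag-building part runs without bcc, while `load_detector` imports bcc lazily and needs root:

```python
DEFAULT_MAX_THREADS = 65536   # hypothetical defaults, not the tool's real sizes
DEFAULT_STACK_TRACES = 65536


def map_size_cflags(max_threads=DEFAULT_MAX_THREADS,
                    stack_traces=DEFAULT_STACK_TRACES):
    """Turn user-chosen map sizes into -D flags for the BPF compiler.

    On the C side, the .c file would fall back to defaults and use the
    macros in its map declarations, for example:

        #ifndef MAX_THREADS
        #define MAX_THREADS 65536
        #endif
        BPF_HASH(thread_to_parent, u32, u32, MAX_THREADS);
        BPF_STACK_TRACE(stack_traces, STACK_TRACES);
    """
    for name, value in (("max_threads", max_threads),
                        ("stack_traces", stack_traces)):
        if value <= 0:
            raise ValueError("%s must be positive" % name)
    return ["-DMAX_THREADS=%d" % max_threads,
            "-DSTACK_TRACES=%d" % stack_traces]


def load_detector(src_file, max_threads, stack_traces):
    # Imported lazily so the flag-building logic above can be used (and
    # tested) on machines without bcc.
    from bcc import BPF
    return BPF(src_file=src_file,
               cflags=map_size_cflags(max_threads, stack_traces))
```

Exposing the two sizes as command-line options would then let users trade tracing coverage for memory.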

4ast · 2017-03-08T02:15:03Z

tests/python/test_tools_smoke.py

+ def test_softirqs(self):
+ # TODO Temporary disabled as softirqs.py doesn't work on recent
+ # kernels (can't find some of its attach targets). Need to revisit
+ # it to use the softirq tracepoints. Tracked in bcc#1031.


looking at this comment, it feels like this .py file will see a lot of churn.
Every new script, and every temporary workaround for a broken script, will update this file.
can it be split into multiple files somehow, so that future PRs conflict less with each other?

I wouldn't want a separate file for each tool, there are dozens of them :)
Also once this stabilizes I'd expect changes to this file only when a new tool is added. You update README.md, you also add a smoke test for your tool, done. What do you think?

Let's start with one file, we can always break it up later if it becomes annoying.

4ast · 2017-03-08T02:16:10Z

tests/python/test_tools_smoke.py

+ self.run_with_duration("wakeuptime.py 1")
+
+ def test_xfsdist(self):
+ # Doesn't work on build bot because xfs functions not present in the


the VMs we have don't have xfs installed? that's odd. xfs is a default fs for most distros now. We should test it.

I think it was only the Ubuntu bot. And I don't know if it has xfs or not, but this is the traceback:

Traceback (most recent call last):
  File "../../tools/xfsdist.py", line 136, in <module>
    b.attach_kprobe(event="xfs_file_read_iter", fn_name="trace_entry")
  File "/home/iovisor/jenkins/workspace/bcc-pr/label/ubuntu1604/src/python/bcc/__init__.py", line 511, in attach_kprobe
    raise Exception("Failed to attach BPF to kprobe")

So xfs_file_read_iter doesn't exist there.

BTW it is quite likely that some of the *fsdist, *fsslower tools can be rewritten using tracepoints. @brendangregg

Here is from one of the ubuntu machines:

$ sudo grep xfs_file_read_iter /proc/kallsyms
ffffffffc043fe70 t xfs_file_read_iter [xfs]

I'd really need access to the bots to debug this, I'm afraid. I can't go on much more than the log output...

@drzaeus77 Can that be arranged?

Yes, please unicast me your ssh pubkey.

Done, thanks.

Uhm, it's not there :)

root@ubuntu1604-slave-681:/home/iovisor/bcc/tests/python# grep xfs_file_read_iter /proc/kallsyms
root@ubuntu1604-slave-681:/home/iovisor/bcc/tests/python#

@drzaeus77
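Because the failing attach comes down to whether a symbol is present in /proc/kallsyms, a test could check for it directly and skip instead of failing. This is only a sketch. `kernel_has_symbol` is a hypothetical helper, and a listed symbol is not a guarantee that it can be kprobed; it may, for example, be blacklisted for kprobes:

```python
def kallsyms_has(symbol, kallsyms_text):
    """True if `symbol` appears as a symbol name in /proc/kallsyms content.

    Each line is '<address> <type> <name>' with an optional '[module]' field,
    e.g. 'ffffffffc043fe70 t xfs_file_read_iter [xfs]'.
    """
    for line in kallsyms_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == symbol:
            return True
    return False


def kernel_has_symbol(symbol, path="/proc/kallsyms"):
    try:
        with open(path) as f:
            return kallsyms_has(symbol, f.read())
    except OSError:
        return False  # no /proc/kallsyms: treat the symbol as unavailable
```

A decorator such as `@unittest.skipUnless(kernel_has_symbol("xfs_file_read_iter"), "xfs functions not available")` would then skip the xfs tests on a bot like this one rather than fail them.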

goldshtn · 2017-03-08T09:07:10Z

@4ast Oh and also note that contrary to your slightly pessimistic estimation 😄 , there was only one tool that was completely broken (softirqs) and one other tool that probably needs fixing (deadlock_detector).

This is tongue-in-cheek of course and we should absolutely invest in more testing for the tools! This is at least a start.

brendangregg · 2017-03-08T22:07:59Z

Thanks!

Ignoring the "enum expression < 0 is always false" warnings (which should be fixed on Linux 4.11/12), I got:

======================================================================
FAIL: test_solisten (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_tools_smoke.py", line 246, in test_solisten
    self.run_with_int("solisten.py")
  File "./test_tools_smoke.py", line 50, in run_with_int
    or (rc == 137 and kill), "rc was %d" % rc)
AssertionError: rc was 1

======================================================================
FAIL: test_ucalls (__main__.SmokeTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_tools_smoke.py", line 312, in test_ucalls
    self.run_with_int("ucalls.py -S $(pgrep -n python)")
  File "./test_tools_smoke.py", line 50, in run_with_int
    or (rc == 137 and kill), "rc was %d" % rc)
AssertionError: rc was 137

----------------------------------------------------------------------

because:

# ./solisten.py 
Traceback (most recent call last):
  File "./solisten.py", line 22, in <module>
    import netaddr
ImportError: No module named netaddr

I'll file a ticket. I don't think it needs netaddr.
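If netaddr is only used to format addresses for printing (an assumption; solisten's source is not quoted here), the standard library covers it. The sketch below assumes two things: the BPF side stores an IPv4 address in network byte order in a u32 that user space reads on a little-endian host, and it stores an IPv6 address as 16 raw bytes:

```python
import socket
import struct


def ipv4_str(addr):
    """Format an IPv4 address held in a u32 exactly as it sat in kernel memory.

    Assumption: network-byte-order bytes read as a little-endian (x86) u32,
    so packing with "<I" restores the original byte sequence.
    """
    return socket.inet_ntop(socket.AF_INET, struct.pack("<I", addr))


def ipv6_str(raw16):
    """Format a 16-byte IPv6 address given in network byte order."""
    return socket.inet_ntop(socket.AF_INET6, bytes(raw16))
```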

And:

# grep ucalls test_tools_smoke.py 
    def test_ucalls(self):
        self.run_with_int("ucalls.py -S $(pgrep -n python)")

I don't have any python running...

goldshtn · 2017-03-09T01:59:53Z

OK, so perhaps disable the solisten test for now. As for ucalls, not sure how you can not have Python running -- the test itself is a Python process. Or does it have a different comm?
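Since the test process is itself a live Python process, one option is to pass its own PID instead of relying on `pgrep -n python`, which depends on the process's comm. Here is a sketch with a hypothetical `ucalls_command` helper, keeping the `-S` invocation used above:

```python
import os
import shlex


def ucalls_command(tools_dir, pid=None):
    """Build a ucalls.py command line that traces a known-live process.

    Defaults to the PID of the test process itself, which exists for as long
    as the test runs, whatever its comm is.
    """
    if pid is None:
        pid = os.getpid()
    script = shlex.quote(os.path.join(tools_dir, "ucalls.py"))
    return "%s -S %d" % (script, pid)
```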

goldshtn · 2017-03-10T19:35:30Z

Okay, so I'm going to disable the xfs tests because I don't see the required functions on the Ubuntu buildbot. I am also going to take the uflow test out because the Python version on Ubuntu doesn't have USDT probes. And finally I'm going to switch ucalls to use -S for syscall counting instead of tracing methods, for the same reason. Hopefully this will pass on all bots. Ideally though, we'd have a newer kernel on these buildbots so that we could run more tests... @drzaeus77

goldshtn · 2017-03-10T19:37:53Z

All right; pushed these changes. If this builds, please don't merge yet -- I'd like to rebase and clean up the last handful of commits :)

goldshtn · 2017-03-11T00:36:32Z

[buildbot, test this please]

4ast · 2017-03-11T16:42:16Z

can you squash all of the commits? I don't see much value keeping all the intermediate steps especially when most of them don't pass.

goldshtn · 2017-03-11T16:44:55Z

Yes, of course - once I get the whole thing to pass.

goldshtn · 2017-03-11T16:45:37Z

Uh-oh: @drzaeus77 the fc25 build bot can't find git again...

drzaeus77 · 2017-03-11T17:15:11Z

[buildbot, test this please]

This commit adds basic smoke tests for most tools in tools/ by running the tool with either a short duration, or interrupting it with a SIGINT after a short duration. The tests check the return value from the tool to detect any Python exceptions or other errors, but they do not read the standard error or standard output and parse the tool's result.

Some tools are not covered by these smoke tests for reasons documented in the test itself:

* btrfsdist and btrfsslower need btrfs
* cachetop doesn't like to run without a terminal
* dbslower, dbstat, and mysqld_qslower need a database engine
* deadlock_detector allocates a huge amount of memory
* softirqs doesn't work on new kernels and needs fixing (iovisor#1031)
* ugc needs a USDT-enabled runtime with GC probes
* zfsdist and zfsslower need zfs

This is a good place to start, but clearly for some tools, especially those with a complex interface like trace and argdist, we need more than just basic smoke tests.

goldshtn · 2017-03-11T19:42:48Z

All righty -- cleaned up the history, if the build passes 🤞 let's get this merged.

goldshtn added 3 commits February 13, 2017 18:46

offwaketime: Add u+x permission

e0bcd3f

mdflush: Add missing #include <linux/bio.h>

d11179d

argdist: Exit with nonzero return code on error

f7ab443

4ast reviewed Mar 8, 2017


goldshtn force-pushed the tools-tests branch from f659c6b to 90b773d March 10, 2017 15:53

goldshtn added 2 commits March 11, 2017 19:41

trace: Exit with nonzero return code on error

2febc29

goldshtn force-pushed the tools-tests branch from 386cafd to 5c41b39 March 11, 2017 19:42

drzaeus77 merged commit dd3867d into iovisor:master Mar 11, 2017

