Add new targeted error injection tool #1633

hMcLauchlan · 2018-03-17T01:49:17Z

After writing a bunch of BPF scripts to error inject specific call paths with bpf_override_return(), I thought it'd be nice if a lot of the boilerplate could be reused and consolidated. Here's a BPF tool that should handle a fairly wide range of use cases for error injection with bpf_override_return().

inject.py constructs and runs a BPF program which takes a call chain and optional predicates as input, as well as a specified error mode. While the script is executing, if the call chain + predicates are met, the appropriate error will be injected into the chosen error mode.

A major caveat is that due to the nature of this tool, it's really easy to accidentally brick whatever you're running it on. So it's fairly imperative to make sure your input is correct or suddenly you find every kmalloc() on your system failing (I did this a lot).

I've also made an attempt to ensure correctness even in recursive situations. I didn't find any actual kernel examples, but made a contrived module for testing purposes to convince myself it was functional. I could probably get that organized in a presentable format if necessary, but neglected to include it here.

I had to work around some verifier/rewriter issues; this resulted in some slightly unconventional code:

the verifier insisted there was an unbounded minimum value error in the generated code, so redundant bounds checking was added in code generation.
rewriter did not transform field1->field2[0] into bpf_probe_read #1617 and a somewhat related issue with dereferencing and accessing together *(x->y). To get around this, I introduced a STRCMP macro to do some minor rewriting of my own. This macro bypasses both of those issues when using strings in predicates, but is not very robust.

For use cases and examples, see inject_example.txt

For details related to implementation, refer to the big comment block in inject.py.

Please let me know what you'd like changed.

Cheers,

bpf_override_return is a very powerful mechanism for error injection, with the caveat that it requires whitelisting of the functions to be overriden. inject.py will take a call chain and optional set of predicates as input, and inject the appropriate error when both the call chain and all predicates are satisfied. Signed-off-by: Howard McLauchlan <[email protected]>

yonghong-song · 2018-03-19T06:12:16Z

[buildbot, ok to test]

yonghong-song · 2018-03-19T06:13:07Z

cc @brendangregg

brendangregg · 2018-03-19T20:38:25Z

G'Day,

Thanks, should be a useful tool, and great to see a tool for this 4.16 feature already! Some suggestions:

man page should at line 10 have a very clear WARNING line (grep other man pages for WARNING) that doesn't pull punches. Eg, "WARNING: This tool injects failures, and those failures may crash the kernel. You must know what you are doing to use this tool.".
_example.txt file should end with the full USAGE message for reference.
Can you start with a simpler example in _example.txt? Eg (this may not work):

# ./inject.py kmalloc 'sys_execve()'

I don't see why we need to define the arguments if we're not using them.
I think the predicate "(true)" should be optional -- that should be the default if none is provided.
The "<-" syntax isn't intuitive for me. Can it just be new lines? Or the word "from"? Here's a few examples.

Instead of:

# ./inject.py kmalloc -I 'linux/mm.h' -I 'linux/fs.h' -v '(true)
    mount_subtree(struct vfsmount *mnt, const char *name) (true)
    btrfs_mount(struct file_system_type *fs_type, int flags, const char *device_name, void *data) (true)
'

(note that I adjusted whitespace because it was hard to read on one line!) Imagine this instead:

# ./inject.py kmalloc -v '
    mount_subtree()
    btrfs_mount()
'

Or (if the "->" symbol was really necessary for parsing):

# ./inject.py kmalloc -v '
    from mount_subtree()
    from btrfs_mount()
'

Or even this:

# ./inject.py kmalloc -v '
    => mount_subtree()
    => btrfs_mount()
'

At least that looks like something I've seen before -- an ftrace stack trace! :)

How about failure percentage? Imagine a:

# ./inject kmalloc -P 0.1 sys_execve

To fail at 0.1%. This could be added later on as an extra feature. Or maybe it can already be done with a predicate?

One of the kmalloc examples says it caused a segfault, because of a BUG_ON. I think this description needs to be expanded a little, otherwise readers might think that this tool is injecting segfaults. Ie, explain that this tool is just injecting an error return value, but the code later on will segfault because it trips a BUG_ON, but the tool itself is just injecting the error return not the segfault. It's a good example anyway, because it lets the reader know that segfaults are possible.
There's some whitespace inconsistencies in the example file, eg, "(true)<-" and "(true) <-". It stands out because I'm trying to learn the new syntax, so I'm focusing on it. I think that can be changed as described earlier anyway.

hMcLauchlan · 2018-03-21T06:29:32Z

Thanks for the quick feedback, I just want to talk through some of these points before I push an update:

The predicate "(true)" is already optional, and I think I mention that in _example.txt. Is there a more explicit place I should put that?
"<-" syntax definitely isn't necessary. If newlines is preferred I can turn twnety~ lines of annoying parsing into one, so it's a win for everyone.
Not quite sure off the top of my head how to implement failure percentages. Do you have suggestions?
The whitespace inconsistencies were intentional to illustrate tolerance of imperfectly formatted input, in hindsight it does look a little jarring so will remove (and won't be relevant once syntax changes).

The rest I will fix and push soon.

Cheers,

brendangregg · 2018-03-21T23:11:06Z

The failure percentages can be added later, using the bpf random function. Kernel support was added so that tracing programs can use it, but I don't think we have an example tool yet. For a 0-99 random number, I think it works like this: bpf_get_prandom_u32() % 100

The first example is looking much better:

# ./inject.py kmalloc -v -I 'linux/fs.h' 'do_mount()'

But I'm still wondering why we need to -I anything, and what of linux/fs.h is used there?

Here's a guess: if we forced a #include of linux/mm.h by default (for the kmalloc failure mode), would that mean we need a lot fewer -I's? Eg:

    def _generate_program(self):
        # leave out auto includes for now

        self.program += '#include <linux/mm.h>\n'
        for include in (self.args.include or []):
            self.program += "#include <%s>\n" % include
[...]

Would this then work as the hello world example?

# ./inject.py kmalloc 'do_mount()'

brendangregg · 2018-03-22T18:48:15Z

Looks good. Does this work? (multiline)

# ./inject.py kmalloc -v 'btrfs_alloc_device()
    => btrfs_close_devices()'

hMcLauchlan · 2018-03-22T22:21:07Z

Yep, works fine.

brendangregg · 2018-03-23T17:14:20Z

thanks!

hMcLauchlan requested review from 4ast, brendangregg, drzaeus77, goldshtn and yonghong-song as code owners March 17, 2018 01:49

Howard McLauchlan added 2 commits March 21, 2018 15:29

Change syntax

2688234

Update man page with warning, _example.txt with updated syntax

ac7c154

automatically include mm.h, remove redundant includes from _example.txt

601d75d

brendangregg merged commit ddd5dd5 into iovisor:master Mar 23, 2018

hMcLauchlan mentioned this pull request Apr 10, 2018

Add features/docs for inject (4.17) #1676

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new targeted error injection tool #1633

Add new targeted error injection tool #1633

hMcLauchlan commented Mar 17, 2018

yonghong-song commented Mar 19, 2018

yonghong-song commented Mar 19, 2018

brendangregg commented Mar 19, 2018

hMcLauchlan commented Mar 21, 2018 •

edited

Loading

brendangregg commented Mar 21, 2018 •

edited

Loading

brendangregg commented Mar 22, 2018

hMcLauchlan commented Mar 22, 2018

brendangregg commented Mar 23, 2018

Add new targeted error injection tool #1633

Add new targeted error injection tool #1633

Conversation

hMcLauchlan commented Mar 17, 2018

yonghong-song commented Mar 19, 2018

yonghong-song commented Mar 19, 2018

brendangregg commented Mar 19, 2018

hMcLauchlan commented Mar 21, 2018 • edited Loading

brendangregg commented Mar 21, 2018 • edited Loading

brendangregg commented Mar 22, 2018

hMcLauchlan commented Mar 22, 2018

brendangregg commented Mar 23, 2018

hMcLauchlan commented Mar 21, 2018 •

edited

Loading

brendangregg commented Mar 21, 2018 •

edited

Loading