Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new targeted error injection tool #1633

Merged
merged 4 commits into from
Mar 23, 2018

Conversation

hMcLauchlan
Copy link
Contributor

After writing a bunch of BPF scripts to error inject specific call paths with bpf_override_return(), I thought it'd be nice if a lot of the boilerplate could be reused and consolidated. Here's a BPF tool that should handle a fairly wide range of use cases for error injection with bpf_override_return().

inject.py constructs and runs a BPF program which takes a call chain and optional predicates as input, as well as a specified error mode. While the script is executing, if the call chain + predicates are met, the appropriate error will be injected into the chosen error mode.

A major caveat is that due to the nature of this tool, it's really easy to accidentally brick whatever you're running it on. So it's fairly imperative to make sure your input is correct or suddenly you find every kmalloc() on your system failing (I did this a lot).

I've also made an attempt to ensure correctness even in recursive situations. I didn't find any actual kernel examples, but made a contrived module for testing purposes to convince myself it was functional. I could probably get that organized in a presentable format if necessary, but neglected to include it here.

I had to work around some verifier/rewriter issues; this resulted in some slightly unconventional code:

  • the verifier insisted there was an unbounded minimum value error in the generated code, so redundant bounds checking was added in code generation.
  • rewriter did not transform field1->field2[0] into bpf_probe_read #1617 and a somewhat related issue with dereferencing and accessing together *(x->y). To get around this, I introduced a STRCMP macro to do some minor rewriting of my own. This macro bypasses both of those issues when using strings in predicates, but is not very robust.

For use cases and examples, see inject_example.txt

For details related to implementation, refer to the big comment block in inject.py.

Please let me know what you'd like changed.

Cheers,

bpf_override_return is a very powerful mechanism for error injection,
with the caveat that it requires whitelisting of the functions to be
overriden.

inject.py will take a call chain and optional set of predicates as
input, and inject the appropriate error when both the call chain and all
predicates are satisfied.

Signed-off-by: Howard McLauchlan <[email protected]>
@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

@yonghong-song
Copy link
Collaborator

cc @brendangregg

@brendangregg
Copy link
Member

G'Day,

Thanks, should be a useful tool, and great to see a tool for this 4.16 feature already! Some suggestions:

  • man page should at line 10 have a very clear WARNING line (grep other man pages for WARNING) that doesn't pull punches. Eg, "WARNING: This tool injects failures, and those failures may crash the kernel. You must know what you are doing to use this tool.".

  • _example.txt file should end with the full USAGE message for reference.

  • Can you start with a simpler example in _example.txt? Eg (this may not work):

# ./inject.py kmalloc 'sys_execve()'
  • I don't see why we need to define the arguments if we're not using them.

  • I think the predicate "(true)" should be optional -- that should be the default if none is provided.

  • The "<-" syntax isn't intuitive for me. Can it just be new lines? Or the word "from"? Here's a few examples.

Instead of:

# ./inject.py kmalloc -I 'linux/mm.h' -I 'linux/fs.h' -v '(true)
    mount_subtree(struct vfsmount *mnt, const char *name) (true)
    btrfs_mount(struct file_system_type *fs_type, int flags, const char *device_name, void *data) (true)
'

(note that I adjusted whitespace because it was hard to read on one line!) Imagine this instead:

# ./inject.py kmalloc -v '
    mount_subtree()
    btrfs_mount()
'

Or (if the "->" symbol was really necessary for parsing):

# ./inject.py kmalloc -v '
    from mount_subtree()
    from btrfs_mount()
'

Or even this:

# ./inject.py kmalloc -v '
    => mount_subtree()
    => btrfs_mount()
'

At least that looks like something I've seen before -- an ftrace stack trace! :)

  • How about failure percentage? Imagine a:
# ./inject kmalloc -P 0.1 sys_execve

To fail at 0.1%. This could be added later on as an extra feature. Or maybe it can already be done with a predicate?

  • One of the kmalloc examples says it caused a segfault, because of a BUG_ON. I think this description needs to be expanded a little, otherwise readers might think that this tool is injecting segfaults. Ie, explain that this tool is just injecting an error return value, but the code later on will segfault because it trips a BUG_ON, but the tool itself is just injecting the error return not the segfault. It's a good example anyway, because it lets the reader know that segfaults are possible.

  • There's some whitespace inconsistencies in the example file, eg, "(true)<-" and "(true) <-". It stands out because I'm trying to learn the new syntax, so I'm focusing on it. I think that can be changed as described earlier anyway.

@hMcLauchlan
Copy link
Contributor Author

hMcLauchlan commented Mar 21, 2018

Thanks for the quick feedback, I just want to talk through some of these points before I push an update:

  • The predicate "(true)" is already optional, and I think I mention that in _example.txt. Is there a more explicit place I should put that?
  • "<-" syntax definitely isn't necessary. If newlines is preferred I can turn twnety~ lines of annoying parsing into one, so it's a win for everyone.
  • Not quite sure off the top of my head how to implement failure percentages. Do you have suggestions?
  • The whitespace inconsistencies were intentional to illustrate tolerance of imperfectly formatted input, in hindsight it does look a little jarring so will remove (and won't be relevant once syntax changes).

The rest I will fix and push soon.

Cheers,

@brendangregg
Copy link
Member

brendangregg commented Mar 21, 2018

The failure percentages can be added later, using the bpf random function. Kernel support was added so that tracing programs can use it, but I don't think we have an example tool yet. For a 0-99 random number, I think it works like this: bpf_get_prandom_u32() % 100

The first example is looking much better:

# ./inject.py kmalloc -v -I 'linux/fs.h' 'do_mount()'

But I'm still wondering why we need to -I anything, and what of linux/fs.h is used there?

Here's a guess: if we forced a #include of linux/mm.h by default (for the kmalloc failure mode), would that mean we need a lot fewer -I's? Eg:

    def _generate_program(self):
        # leave out auto includes for now

        self.program += '#include <linux/mm.h>\n'
        for include in (self.args.include or []):
            self.program += "#include <%s>\n" % include
[...]

Would this then work as the hello world example?

# ./inject.py kmalloc 'do_mount()'

@brendangregg
Copy link
Member

Looks good. Does this work? (multiline)

# ./inject.py kmalloc -v 'btrfs_alloc_device()
    => btrfs_close_devices()'

@hMcLauchlan
Copy link
Contributor Author

Yep, works fine.

@brendangregg
Copy link
Member

thanks!

@brendangregg brendangregg merged commit ddd5dd5 into iovisor:master Mar 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants