Merge tag 'probes-v6.9' of git:https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:
 "x86 kprobes:

   - Use boolean return values for some functions instead of 0 and 1

   - Prohibit probing on INT/UD. This prevents users from putting a
     kprobe on INTn/INT1/INT3/INTO and UD0/UD1/UD2, because these are
     used for special purposes in the kernel

   - Boost Grp instructions. Because a few percent of kernel
     instructions are Grp 2/3/4/5 and those are safe to execute without
     an ip register fixup, allow them to be boosted (direct execution on
     the trampoline buffer, followed by a JMP)

  tracing:

   - Add function argument access from return events (kretprobe and
     fprobe). This allows users to compare how a data structure field is
     changed after executing a function. With BTF, the return event also
     accepts function argument access by name.

   - Fix a wrong comment (using "Kretprobe" in fprobe)

   - Clean up the big probe argument parser function by splitting it
     into three parts: the type parser, the post-processing function,
     and the main parser

   - Clean up trace_probe to set the nr_args field at initialization
     instead of counting it up while parsing

   - Remove a redundant #else block from the tracefs README source code

   - Update selftests to check entry argument access from return probes

   - Documentation update about entry argument access from return
     probes"
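
For illustration (an editor's sketch, not part of the patch), the new entry-argument
access at function exit can be exercised via tracefs as follows, assuming a kernel
with CONFIG_HAVE_FUNCTION_ARG_ACCESS_API and vfs_open(path, file) as used in the
updated documentation below:

    # cd /sys/kernel/tracing
    # echo 'f vfs_open%return file=$arg2 ret=$retval' >> dynamic_events
    # echo 1 > events/fprobes/enable
    # cat trace

Here $arg2 is the second function argument, captured at entry and reported by the
exit event alongside the return value in $retval.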

* tag 'probes-v6.9' of git:https://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Add entry argument access at function exit
  selftests/ftrace: Add test cases for entry args at function exit
  tracing/probes: Support $argN in return probe (kprobe and fprobe)
  tracing: Remove redundant #else block for BTF args from README
  tracing/probes: cleanup: Set trace_probe::nr_args at trace_probe_init
  tracing/probes: Cleanup probe argument parser
  tracing/fprobe-event: cleanup: Fix a wrong comment in fprobe event
  x86/kprobes: Boost more instructions from grp2/3/4/5
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: Refactor can_{probe,boost} return type to bool
torvalds committed Mar 14, 2024
2 parents c0a614e + e8c32f2 commit 0173275
Showing 16 changed files with 584 additions and 199 deletions.
31 changes: 31 additions & 0 deletions Documentation/trace/fprobetrace.rst
@@ -70,6 +70,14 @@ Synopsis of fprobe-events

For the details of TYPE, see :ref:`kprobetrace documentation <kprobetrace_types>`.

Function arguments at exit
--------------------------
Function arguments can be accessed at the exit probe using the $arg<N> fetcharg.
This is useful to record the function parameters and return value at once, and to
trace how structure fields change (for debugging whether a function correctly
updates the given data structure or not).
See the :ref:`sample<fprobetrace_exit_args_sample>` below for how it works.

BTF arguments
-------------
BTF (BPF Type Format) argument allows user to trace function and tracepoint
@@ -218,3 +226,26 @@ traceprobe event, you can trace that field as below.
<idle>-0 [000] d..3. 5606.690317: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000
kworker/0:1-14 [000] d..3. 5606.690339: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="swapper/0" usage=2 start_time=0
<idle>-0 [000] d..3. 5606.692368: sched_switch: (__probestub_sched_switch+0x4/0x10) comm="kworker/0:1" usage=1 start_time=137000000

.. _fprobetrace_exit_args_sample:

The return probe allows us to access the results of functions that return only an
error code while passing their actual results back via a function parameter, such
as a structure-initialization function.

For example, vfs_open() will link the file structure to the inode and update its
mode. You can trace those changes with a return probe.
::

# echo 'f vfs_open mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
# echo 'f vfs_open%%return mode=file->f_mode:x32 inode=file->f_inode:x64' >> dynamic_events
# echo 1 > events/fprobes/enable
# cat trace
sh-131 [006] ...1. 1945.714346: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x2 inode=0x0
sh-131 [006] ...1. 1945.714358: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4d801e inode=0xffff888008470168
cat-143 [007] ...1. 1945.717949: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
cat-143 [007] ...1. 1945.717956: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0x4a801d inode=0xffff888005f78d28
cat-143 [007] ...1. 1945.720616: vfs_open__entry: (vfs_open+0x4/0x40) mode=0x1 inode=0x0
cat-143 [007] ...1. 1945.728263: vfs_open__exit: (do_open+0x274/0x3d0 <- vfs_open) mode=0xa800d inode=0xffff888004ada8d8

You can see that `file::f_mode` and `file::f_inode` are updated in `vfs_open()`.
9 changes: 9 additions & 0 deletions Documentation/trace/kprobetrace.rst
@@ -70,6 +70,15 @@ Synopsis of kprobe_events
(\*3) this is useful for fetching a field of data structures.
(\*4) "u" means user-space dereference. See :ref:`user_mem_access`.

Function arguments at kretprobe
-------------------------------
Function arguments can be accessed at the kretprobe using the $arg<N> fetcharg.
This is useful to record the function parameters and return value at once, and to
trace how structure fields change (for debugging whether a function correctly
updates the given data structure or not).
See the :ref:`sample<fprobetrace_exit_args_sample>` in the fprobe event
documentation for how it works.
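
As an illustration (an editor's sketch, not part of the patch), the equivalent
kretprobe form, assuming vfs_open(path, file) as in the fprobe sample, would be::

 # echo 'r:vfs_open_exit vfs_open file=$arg2 ret=$retval' >> kprobe_events
 # echo 1 > events/kprobes/vfs_open_exit/enable

Here $arg2 is the second function argument as captured at entry, and $retval is
the return value at exit.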

.. _kprobetrace_types:

Types
2 changes: 1 addition & 1 deletion arch/x86/kernel/kprobes/common.h
@@ -78,7 +78,7 @@
#endif

/* Ensure if the instruction can be boostable */
extern int can_boost(struct insn *insn, void *orig_addr);
extern bool can_boost(struct insn *insn, void *orig_addr);
/* Recover instruction if given address is probed */
extern unsigned long recover_probed_instruction(kprobe_opcode_t *buf,
unsigned long addr);
98 changes: 68 additions & 30 deletions arch/x86/kernel/kprobes/core.c
@@ -137,30 +137,30 @@ NOKPROBE_SYMBOL(synthesize_relcall);
* Returns non-zero if INSN is boostable.
* RIP relative instructions are adjusted at copying time in 64 bits mode
*/
int can_boost(struct insn *insn, void *addr)
bool can_boost(struct insn *insn, void *addr)
{
kprobe_opcode_t opcode;
insn_byte_t prefix;
int i;

if (search_exception_tables((unsigned long)addr))
return 0; /* Page fault may occur on this address. */
return false; /* Page fault may occur on this address. */

/* 2nd-byte opcode */
if (insn->opcode.nbytes == 2)
return test_bit(insn->opcode.bytes[1],
(unsigned long *)twobyte_is_boostable);

if (insn->opcode.nbytes != 1)
return 0;
return false;

for_each_insn_prefix(insn, i, prefix) {
insn_attr_t attr;

attr = inat_get_opcode_attribute(prefix);
/* Can't boost Address-size override prefix and CS override prefix */
if (prefix == 0x2e || inat_is_address_size_prefix(attr))
return 0;
return false;
}

opcode = insn->opcode.bytes[0];
@@ -169,24 +169,35 @@ int can_boost(struct insn *insn, void *addr)
case 0x62: /* bound */
case 0x70 ... 0x7f: /* Conditional jumps */
case 0x9a: /* Call far */
case 0xc0 ... 0xc1: /* Grp2 */
case 0xcc ... 0xce: /* software exceptions */
case 0xd0 ... 0xd3: /* Grp2 */
case 0xd6: /* (UD) */
case 0xd8 ... 0xdf: /* ESC */
case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
case 0xe8 ... 0xe9: /* near Call, JMP */
case 0xeb: /* Short JMP */
case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
/* ... are not boostable */
return false;
case 0xc0 ... 0xc1: /* Grp2 */
case 0xd0 ... 0xd3: /* Grp2 */
/*
* AMD uses nnn == 110 as SHL/SAL, but Intel makes it reserved.
*/
return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b110;
case 0xf6 ... 0xf7: /* Grp3 */
/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b001;
case 0xfe: /* Grp4 */
/* ... are not boostable */
return 0;
/* Only INC and DEC are boostable */
return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001;
case 0xff: /* Grp5 */
/* Only indirect jmp is boostable */
return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
/* Only INC, DEC, and indirect JMP are boostable */
return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001 ||
X86_MODRM_REG(insn->modrm.bytes[0]) == 0b100;
default:
return 1;
return true;
}
}

@@ -252,21 +263,40 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
return __recover_probed_insn(buf, addr);
}

/* Check if paddr is at an instruction boundary */
static int can_probe(unsigned long paddr)
/* Check if insn is INT or UD */
static inline bool is_exception_insn(struct insn *insn)
{
/* UD uses 0f escape */
if (insn->opcode.bytes[0] == 0x0f) {
/* UD0 / UD1 / UD2 */
return insn->opcode.bytes[1] == 0xff ||
insn->opcode.bytes[1] == 0xb9 ||
insn->opcode.bytes[1] == 0x0b;
}

/* INT3 / INT n / INTO / INT1 */
return insn->opcode.bytes[0] == 0xcc ||
insn->opcode.bytes[0] == 0xcd ||
insn->opcode.bytes[0] == 0xce ||
insn->opcode.bytes[0] == 0xf1;
}

/*
* Check if paddr is at an instruction boundary and that instruction can
* be probed
*/
static bool can_probe(unsigned long paddr)
{
unsigned long addr, __addr, offset = 0;
struct insn insn;
kprobe_opcode_t buf[MAX_INSN_SIZE];

if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
return 0;
return false;

/* Decode instructions */
addr = paddr - offset;
while (addr < paddr) {
int ret;

/*
* Check if the instruction has been modified by another
* kprobe, in which case we replace the breakpoint by the
@@ -277,11 +307,10 @@ static int can_probe(unsigned long paddr)
*/
__addr = recover_probed_instruction(buf, addr);
if (!__addr)
return 0;
return false;

ret = insn_decode_kernel(&insn, (void *)__addr);
if (ret < 0)
return 0;
if (insn_decode_kernel(&insn, (void *)__addr) < 0)
return false;

#ifdef CONFIG_KGDB
/*
@@ -290,10 +319,26 @@ static int can_probe(unsigned long paddr)
*/
if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
kgdb_has_hit_break(addr))
return 0;
return false;
#endif
addr += insn.length;
}

/* Check if paddr is at an instruction boundary */
if (addr != paddr)
return false;

__addr = recover_probed_instruction(buf, addr);
if (!__addr)
return false;

if (insn_decode_kernel(&insn, (void *)__addr) < 0)
return false;

/* INT and UD are special and should not be kprobed */
if (is_exception_insn(&insn))
return false;

if (IS_ENABLED(CONFIG_CFI_CLANG)) {
/*
* The compiler generates the following instruction sequence
@@ -308,13 +353,6 @@ static int can_probe(unsigned long paddr)
* Also, these movl and addl are used for showing expected
* type. So those must not be touched.
*/
__addr = recover_probed_instruction(buf, addr);
if (!__addr)
return 0;

if (insn_decode_kernel(&insn, (void *)__addr) < 0)
return 0;

if (insn.opcode.value == 0xBA)
offset = 12;
else if (insn.opcode.value == 0x3)
@@ -324,11 +362,11 @@

/* This movl/addl is used for decoding CFI. */
if (is_cfi_trap(addr + offset))
return 0;
return false;
}

out:
return (addr == paddr);
return true;
}

/* If x86 supports IBT (ENDBR) it must be skipped. */
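
For reference, here is a minimal user-space sketch (an editor's illustration, not
part of the patch) of the ModRM reg-field test that leaves only INC, DEC, and the
indirect near JMP boostable in Grp5; the mask-and-shift mirrors the kernel's
X86_MODRM_REG() helper used in can_boost() above:

	/* grp5_boost.c - build with: cc -Wall -o grp5_boost grp5_boost.c */
	#include <stdio.h>

	/* Extract the ModRM "reg" field (bits 5:3), like X86_MODRM_REG(). */
	#define MODRM_REG(modrm)	(((modrm) >> 3) & 0x7)

	static int grp5_is_boostable(unsigned char modrm)
	{
		unsigned char reg = MODRM_REG(modrm);

		/* 0b000 = INC, 0b001 = DEC, 0b100 = indirect near JMP */
		return reg == 0 || reg == 1 || reg == 4;
	}

	int main(void)
	{
		/* ff e0 is "jmp *%rax" (reg = 0b100): boostable */
		printf("jmp  *%%rax: %d\n", grp5_is_boostable(0xe0));
		/* ff d0 is "call *%rax" (reg = 0b010): not boostable */
		printf("call *%%rax: %d\n", grp5_is_boostable(0xd0));
		return 0;
	}
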
5 changes: 2 additions & 3 deletions kernel/trace/trace.c
@@ -5747,16 +5747,15 @@ static const char readme_msg[] =
"\t args: <name>=fetcharg[:type]\n"
"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
"\t <argname>[->field[->field|.field...]],\n"
#else
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
#endif
#else
"\t $stack<index>, $stack, $retval, $comm,\n"
#endif
"\t +|-[u]<offset>(<fetcharg>), \\imm-value, \\\"imm-string\"\n"
"\t kernel return probes support: $retval, $arg<N>, $comm\n"
"\t type: s8/16/32/64, u8/16/32/64, x8/16/32/64, char, string, symbol,\n"
"\t b<bit-width>@<bit-offset>/<container-size>, ustring,\n"
"\t symstr, <type>\\[<array-size>\\]\n"
8 changes: 4 additions & 4 deletions kernel/trace/trace_eprobe.c
@@ -220,7 +220,7 @@ static struct trace_eprobe *alloc_event_probe(const char *group,
if (!ep->event_system)
goto error;

ret = trace_probe_init(&ep->tp, this_event, group, false);
ret = trace_probe_init(&ep->tp, this_event, group, false, nargs);
if (ret < 0)
goto error;

@@ -390,8 +390,8 @@ static int get_eprobe_size(struct trace_probe *tp, void *rec)

/* Note that we don't verify it, since the code does not come from user space */
static int
process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
void *base)
process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
void *dest, void *base)
{
unsigned long val;
int ret;
@@ -438,7 +438,7 @@ __eprobe_trace_func(struct eprobe_data *edata, void *rec)
return;

entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
store_trace_args(&entry[1], &edata->ep->tp, rec, sizeof(*entry), dsize);
store_trace_args(&entry[1], &edata->ep->tp, rec, NULL, sizeof(*entry), dsize);

trace_event_buffer_commit(&fbuffer);
}
