libbpf-tools: add klockstat #3688

Merged
merged 1 commit into from
Jan 6, 2022

@brho
Contributor

brho commented Nov 3, 2021

This is a port of BCC's klockstat. Differences from BCC:

  • can specify a lock by ksym name, using -L
  • tracks whichever task had the max time for acquire and hold, outputted
    when -s > 1 (otherwise it's cluttered).
  • does not reset stats each interval by default. Can request with -R.

Usage: klockstat [-hRT] [-p PID] [-t TID] [-c FUNC] [-L LOCK] [-n NR_LOCKS]
                 [-s NR_STACKS] [-S SORT] [-d DURATION] [-i INTERVAL]

  -p, --pid=PID              Filter by process ID
  -t, --tid=TID              Filter by thread ID
  -c, --caller=FUNC          Filter by caller string prefix
  -L, --lock=LOCK            Filter by specific ksym lock name
  -n, --locks=NR_LOCKS       Number of locks to print
  -s, --stacks=NR_STACKS     Number of stack entries to print per lock
  -S, --sort=SORT            Sort by field:
                               acq_[max|total|count]
                               hld_[max|total|count]
  -d, --duration=SECONDS     Duration to trace
  -i, --interval=SECONDS     Print interval
  -R, --reset                Reset stats each interval
  -T, --timestamp            Print timestamp

  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Examples:
  klockstat                     # trace system wide until ctrl-c
  klockstat -d 5                # trace for 5 seconds
  klockstat -i 5                # print stats every 5 seconds
  klockstat -p 181              # trace process 181 only
  klockstat -t 181              # trace thread 181 only
  klockstat -c pipe_            # print only for lock callers with 'pipe_'
                                # prefix
  klockstat -L cgroup_mutex     # trace the cgroup_mutex lock only
  klockstat -S acq_count        # sort lock acquired results by acquire count
  klockstat -S hld_total        # sort lock held results by total held time
  klockstat -S acq_count,hld_total  # combination of above
  klockstat -n 3                # display top 3 locks
  klockstat -s 6                # display 6 stack entries per lock


Signed-off-by: Barret Rhoden [email protected]
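
For readers unfamiliar with the -S syntax, here is a minimal sketch of how a spec such as acq_count,hld_total could be split into one sort key for the acquire table and one for the hold table. This is not the code from the PR: the names sort_field, sort_opts, and parse_sort_spec are hypothetical, and the only thing taken from the usage text is the acq_/hld_ prefix plus the max|total|count suffixes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from klockstat.c. */
enum sort_field { SORT_MAX, SORT_TOTAL, SORT_COUNT };

struct sort_opts {
	enum sort_field acq;	/* key for the "acquire" table */
	enum sort_field hld;	/* key for the "hold" table */
};

/* Map "max", "total" or "count" (length-delimited) to a sort_field. */
static int parse_field(const char *s, size_t len, enum sort_field *out)
{
	if (len == 3 && !strncmp(s, "max", 3))
		*out = SORT_MAX;
	else if (len == 5 && !strncmp(s, "total", 5))
		*out = SORT_TOTAL;
	else if (len == 5 && !strncmp(s, "count", 5))
		*out = SORT_COUNT;
	else
		return -1;
	return 0;
}

/*
 * Parse a comma-separated spec such as "acq_count,hld_total".
 * Fields not mentioned in the spec keep whatever value the caller set.
 * Returns 0 on success, -1 on an unrecognized token.
 */
int parse_sort_spec(const char *spec, struct sort_opts *opts)
{
	while (*spec) {
		size_t len = strcspn(spec, ",");
		enum sort_field *dst;

		if (len < 4)
			return -1;
		if (!strncmp(spec, "acq_", 4))
			dst = &opts->acq;
		else if (!strncmp(spec, "hld_", 4))
			dst = &opts->hld;
		else
			return -1;
		if (parse_field(spec + 4, len - 4, dst))
			return -1;
		spec += len;
		if (*spec == ',')
			spec++;
	}
	return 0;
}
```

The caller is assumed to pre-fill sort_opts with its defaults, so a spec naming only one table leaves the other table's key untouched.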

@davemarchevsky
Collaborator

[buildbot, test this please]

@davemarchevsky
Collaborator

[buildbot, ok to test]

@brho
Contributor Author

brho commented Nov 18, 2021

just checking if there's anything else needed on this PR.

@chenhengqi
Collaborator

> just checking if there's anything else needed on this PR.

Sorry, I missed the GitHub notification. Will take a look at this again this week.

@brho
Contributor Author

brho commented Dec 9, 2021

hi -

two updates on this in my recent change (9 Dec):

  1. instead of tracking 'lock depth' as a lookup (which was interpreted as "task X's Nth lock acquired"), do lookups by task_id + lock_ptr, which is "Task X grabbing lock Y".

the biggest benefit is that we don't have "mismatched" lock/unlock pairs. so if a task grabs mutex A, then mutex B, it can unlock A before unlocking B. if you tracked by "depth", you'd just assume the first unlocked was the last to be locked.

an ancillary benefit is we don't have to worry about having unlock get paired with "ancient" locks, which means we don't need that "enabled" flag, or the bulk of comments justifying it.

finally, it cleans up the code a bit: one less MAP to maintain, since we no longer care about a task's "depth".
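
To illustrate the first update, here is a minimal userspace sketch of keying held locks by (task_id, lock_ptr), so that an unlock finds its own acquire no matter what order locks are released in. It is not the BPF code from the PR: a fixed array stands in for the BPF map, and all names (held_entry, on_lock_acquired, on_lock_released) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: "task X grabbing lock Y" plus its acquire time. */
struct held_entry {
	uint32_t task_id;
	uint64_t lock_ptr;
	uint64_t acquired_ns;
	int in_use;
};

/* A small fixed table standing in for a BPF hash map. */
#define MAX_HELD 64
static struct held_entry held[MAX_HELD];

/* Record that task_id now holds lock_ptr. Returns 0, or -1 if the table is full. */
int on_lock_acquired(uint32_t task_id, uint64_t lock_ptr, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_HELD; i++) {
		if (!held[i].in_use) {
			held[i].task_id = task_id;
			held[i].lock_ptr = lock_ptr;
			held[i].acquired_ns = now_ns;
			held[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/*
 * Look up the exact (task, lock) pair and return the hold time in ns.
 * Returns -1 when there is no matching acquire, e.g. the lock was taken
 * before tracing started; such unlocks are simply ignored.
 */
int64_t on_lock_released(uint32_t task_id, uint64_t lock_ptr, uint64_t now_ns)
{
	for (size_t i = 0; i < MAX_HELD; i++) {
		if (held[i].in_use && held[i].task_id == task_id &&
		    held[i].lock_ptr == lock_ptr) {
			held[i].in_use = 0;
			return (int64_t)(now_ns - held[i].acquired_ns);
		}
	}
	return -1;
}
```

With this keying, a task that takes A then B and releases A first still gets A's hold time charged to A. A depth-based lookup would have paired that first unlock with B.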

  2. use atomics or other shared memory sync when accessing a lockstat structure. these are keyed by a callstack. although callstacks identify a line of code, e.g. vfs_read+0x12, they do not uniquely identify a lock. so you can have multiple threads working on the same struct in account(). i put the WRITE_ONCE/READ_ONCE in bits.bpf.h, but that could be moved elsewhere.
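
As a rough userspace analogue of the second update, the sketch below updates a shared per-callstack stats struct with C11 atomics, so concurrent callers cannot lose a count or overwrite a larger max. It is not the PR's BPF code, which relies on BPF atomics and the WRITE_ONCE/READ_ONCE helpers mentioned above. The struct layout and the name account_sketch are assumptions made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stats record shared by every thread hitting one callstack. */
struct lock_stat_sketch {
	_Atomic uint64_t count;
	_Atomic uint64_t total_ns;
	_Atomic uint64_t max_ns;
};

/* Fold one measured duration into the shared record. */
void account_sketch(struct lock_stat_sketch *s, uint64_t delta_ns)
{
	uint64_t cur;

	/* Plain "s->count++" could lose updates under contention. */
	atomic_fetch_add(&s->count, 1);
	atomic_fetch_add(&s->total_ns, delta_ns);

	/*
	 * Raise max_ns only if our sample is larger. A failed
	 * compare-exchange reloads cur, so the loop re-checks against
	 * whatever value another thread just stored.
	 */
	cur = atomic_load(&s->max_ns);
	while (delta_ns > cur &&
	       !atomic_compare_exchange_weak(&s->max_ns, &cur, delta_ns))
		;
}
```

The three fields are updated independently, so a reader may briefly see a count that is one ahead of the total. That is usually acceptable for statistics, but it is a trade-off to be aware of.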

thanks!

@brho
Contributor Author

brho commented Dec 10, 2021

pushed fixes. thanks!

@brho
Contributor Author

brho commented Jan 4, 2022

just checking if this needs anything else. thanks.

@davemarchevsky davemarchevsky self-assigned this Jan 6, 2022
@davemarchevsky davemarchevsky merged commit 98b47c7 into iovisor:master Jan 6, 2022
davemarchevsky added a commit to davemarchevsky/bcc that referenced this pull request Jan 7, 2022
The PR adding the libbpf-tools port of klockstat was sitting in a
mergeable state for some time. Meanwhile, libbpf stopped exposing
rlimit_memlock bumping API and now does the rlimit bump automatically if
necessary. So remove the bump_rlimit_memlock call and set libbpf strict
mode for this tool.

Also, add a comment (from @brho's PR summary in iovisor#3688) detailing the
differences in default behavior between the libbpf-tools and bcc-python
versions.

Signed-off-by: Dave Marchevsky <[email protected]>
@davemarchevsky
Collaborator

Sorry for the merge delay here.

I would normally push back against the change to default behavior here, but I can see why it makes sense and am not an active user of klockstat so don't think it's a good idea to hold this up further.

Did a quick followup PR (#3796) adding "Differences from BCC" comment.
