
add average function latency #3061

Merged (2 commits) on Sep 28, 2020
Conversation

NPC51129 (Contributor)

The funclatency tool does not report the average latency of the traced function.
This PR adds the average latency, with output like this:

# ./funclatency.py -u do_sys_open
Tracing 1 functions for "do_sys_open"... Hit Ctrl-C to end.
^C     usecs               : count     distribution
         0 -> 1          : 91       |*************************************   |
         2 -> 3          : 32       |*************                           |
         4 -> 7          : 98       |****************************************|
         8 -> 15         : 3        |*                                       |
        16 -> 31         : 1        |                                        |
avg = 3 usecs, total: 773 usecs, count: 225
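For readers skimming the patch, the new line reduces to two extra counters maintained by the BPF program (total elapsed time and call count) plus one division in user space. A minimal sketch of that user-space arithmetic (the `summarize` helper is hypothetical, not from the patch), assuming integer division as the sample output suggests (773/225 rounds down to 3):

```python
def summarize(total_usecs, count):
    """Format the summary line appended below the histogram, using integer division."""
    avg = total_usecs // count if count else 0
    return "avg = %d usecs, total: %d usecs, count: %d" % (avg, total_usecs, count)

# Values from the sample run above:
print(summarize(773, 225))  # avg = 3 usecs, total: 773 usecs, count: 225
```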

@yonghong-song (Collaborator)

[buildbot, test this please]

@brendangregg (Member)

Can you please run a stress test and estimate the overhead of this addition?

If the overhead is at all measurable, I'd rather see this functionality moved to a "-v" for verbose option. If it's negligible, then perhaps we can keep it as the default.

FWIW, a lot of these tools are designed to print only basic stats by default, to minimize both overhead and visual clutter.

@netedwardwu (Contributor)

The total value is really useful for real-world problems.
I added this total value myself and have used it a lot in my performance tuning.
Sometimes a log2 histogram is not enough for very fine-grained profiling (a 2~5% performance drop).
Without the total value, we may conclude that the problem is unrelated to the function we are profiling, and waste a lot of time chasing the wrong leads.

With the total value, we can attribute a 2~5% performance drop to a very small difference.

@brendangregg (Member)

I understand its value, but not the cost. In this and other tools I deliberately leave out extra stuff in the default output so that end users can run the tool with minimum overhead in production, and then choose to have extended output with more cost via command line options. funccount(8) in particular can be aimed at very frequent functions, executed millions of times per second.

But instead of a "-v" verbose mode, you're adding it to the default output, so we have no choice and always pay the cost. That might be OK if the cost were negligible, but you're going to have to do performance analysis of the performance tool to show what it is.

@brendangregg (Member)

BTW, personally I'd try a few things, like dd from /dev/zero to /dev/null with a block size of 1: without the tool running, with the old version running (instrumenting, say, vfs_read()), and with the new version running. dd prints throughput metrics, which aids comparison.

@NPC51129 (Contributor, Author)

I tested the tool with dd from /dev/zero to /dev/null and bs=1, as suggested by @brendangregg:
a) without the tool: 895 kB/s
b) with the old version: 582 kB/s
c) with the new version: 582 kB/s
So the overhead is negligible.
My environment:
kernel: 4.19.91-19.1.al7.x86_64
CPU: Intel(R) Xeon(R) Platinum 8269CY @ 2.50GHz

@NPC51129 (Contributor, Author)

Hi, I did a few more tests on this. Since the new feature introduces lock_xadd into the BPF code, the tool's performance might be impacted.
Test 1
a) run the old version of the tool on multiple cores, with dd if=/dev/zero of=/dev/null bs=1 for 10 seconds;
b) run the new version of the tool on multiple cores, with dd if=/dev/zero of=/dev/null bs=1 for 10 seconds.
The results:

| No. of threads | dd tp a) [kB/s] | dd tp b) [kB/s] |
|---:|---:|---:|
| 1 | 597 | 593 |
| 2 | 503 | 503 |
| 4 | 381 | 375 |
| 8 | 262 | 256 |
| 16 | 177 | 175 |
| 32 | 99.8 | 99.3 |
| 64 | 44.8 | 44.6 |

In this test, the overhead of the new version versus the old is negligible. That said, I don't think we would run multiple instances of funclatency in practice.
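To put a number on "negligible", the relative slowdown can be computed row by row from the two throughput columns of Test 1 (a quick sketch for illustration, not part of the patch; the `overhead_pct` helper is hypothetical):

```python
def overhead_pct(old_kbps, new_kbps):
    """Relative throughput drop of the new version vs. the old, in percent."""
    return (old_kbps - new_kbps) / old_kbps * 100.0

# (threads, old version kB/s, new version kB/s) from the Test 1 table above:
rows = [(1, 597, 593), (2, 503, 503), (4, 381, 375), (8, 262, 256),
        (16, 177, 175), (32, 99.8, 99.3), (64, 44.8, 44.6)]
for threads, old, new in rows:
    print("%2d threads: %.1f%% slower" % (threads, overhead_pct(old, new)))
```

Across these rows the worst-case drop is about 2.3% (at 8 threads), which supports the "negligible" reading.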

Test 2
a) run dd if=/dev/zero of=/dev/null bs=1 on multiple cores;
b) the same, with the old version of funclatency running;
c) the same, with the new version of funclatency running.
The throughput reported by dd:

| No. of threads | dd tp a) [kB/s] | dd tp b) [kB/s] | dd tp c) [kB/s] |
|---:|---:|---:|---:|
| 1 | 898 | 613 | 610 |
| 2 | 598 | 407 | 407 |
| 4 | 595 | 357 | 354 |
| 8 | 592 | 201 | 195 |
| 16 | 586 | 48.5 | 57.6 |
| 32 | 585 | 13.7 | 15.7 |
| 64 | 573 | 3.2 | 3.1 |

In this test the overhead of new version vs. old version is negligible as well.
Overall, adding average function latency to this tool introduces negligible overhead.

@brendangregg (Member)

Great testing, thanks. So it looks like the overhead is negligible (I was expecting more, given how much overhead the tool costs in the first place, which you can see by comparing the Test 2 (a) and (b) columns).

It is a touch annoying to change the default output of tools (as it breaks documentation; adding options doesn't typically break documentation) and to clutter it a little, but the benefits probably outweigh that, so I'm happy to merge.

@brendangregg (Member)

Can you please update the _example.txt file examples? (see what I mean about breaking documentation!)

Perhaps the quickest way (rather than redoing all the examples) is to take each output and then add a fake summary line based on your best guess from the distribution data. Not ideal, but maybe good enough for the example.

E.g.:

# ./funclatency do_sys_open
Tracing do_sys_open... Hit Ctrl-C to end.
^C
     nsecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 0        |                                      |
    1024 -> 2047     : 0        |                                      |
    2048 -> 4095     : 124      |****************                      |
    4096 -> 8191     : 291      |**************************************|
    8192 -> 16383    : 36       |****                                  |
   16384 -> 32767    : 16       |**                                    |
   32768 -> 65535    : 8        |*                                     |
   65536 -> 131071   : 0        |                                      |
  131072 -> 262143   : 0        |                                      |
  262144 -> 524287   : 0        |                                      |
  524288 -> 1048575  : 0        |                                      |
 1048576 -> 2097151  : 0        |                                      |
 2097152 -> 4194303  : 1        |                                      |
Detaching...

So we need to add the "avg = xxx usecs, total: xxx usecs, count: xxx" line. Count is the sum of the count column. For the total, I'd take the midpoint of each bucket (e.g. 4096 -> 8191 == 6144), multiply it by that bucket's count, and sum them all. The avg is then the total divided by the known count.

It'll take a few minutes to do each example. Some awk can help.
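The midpoint estimation described above can be scripted rather than done by hand; here is a hypothetical helper (the name `estimate_summary` and the exact bucket bounds are illustrative), fed with the non-zero rows of the example histogram:

```python
def estimate_summary(buckets):
    """buckets: list of (low, high, count) rows from a log2 histogram.
    Estimate total and avg using each bucket's midpoint, as suggested above."""
    count = sum(c for _, _, c in buckets)
    # Midpoint of e.g. 4096 -> 8191 is (4096 + 8191 + 1) // 2 == 6144
    total = sum((lo + hi + 1) // 2 * c for lo, hi, c in buckets)
    avg = total // count if count else 0
    return avg, total, count

# Non-zero rows from the example histogram above (nsecs):
buckets = [(2048, 4095, 124), (4096, 8191, 291), (8192, 16383, 36),
           (16384, 32767, 16), (32768, 65535, 8), (2097152, 4194303, 1)]
avg, total, count = estimate_summary(buckets)
print("avg = %d nsecs, total: %d nsecs, count: %d" % (avg, total, count))
# avg = 13746 nsecs, total: 6543360 nsecs, count: 476
```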

@yonghong-song (Collaborator)

[buildbot, test this please]

@yonghong-song yonghong-song merged commit af90bd4 into iovisor:master Sep 28, 2020
@NPC51129 NPC51129 deleted the funclatency-avg branch October 17, 2020 07:40
ismhong pushed a commit to ismhong/bcc that referenced this pull request Dec 6, 2021
* add average function latency
* update example.txt

Change-Id: I04e423e2b888e5770e639d856572681dc94e8862
ismhong pushed a commit to ismhong/bcc that referenced this pull request Dec 6, 2021
* add average function latency
* update example.txt

Change-Id: I960e77ac3189600cebaa86f8ef4b94f992c76ee8
CrackerCat pushed a commit to CrackerCat/bcc that referenced this pull request Jul 31, 2024
* add average function latency
* update example.txt
4 participants