
add average function latency #3061

Merged (2 commits) on Sep 28, 2020
Conversation

NPC51129 (Contributor)

The funclatency tool does not report the average latency of the traced function.
This PR adds the average latency, with output like this:

# ./funclatency.py -u do_sys_open
Tracing 1 functions for "do_sys_open"... Hit Ctrl-C to end.
^C     usecs               : count     distribution
         0 -> 1          : 91       |*************************************   |
         2 -> 3          : 32       |*************                           |
         4 -> 7          : 98       |****************************************|
         8 -> 15         : 3        |*                                       |
        16 -> 31         : 1        |                                        |
avg = 3 usecs, total: 773 usecs, count: 225
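For readers skimming the patch, the new line reduces to two extra counters maintained by the BPF program (total elapsed time and call count) plus one division in user space. A minimal sketch of that user-space arithmetic (the `summarize` helper is hypothetical, not from the patch), assuming integer division as the sample output suggests (773/225 rounds down to 3):

```python
def summarize(total_usecs, count):
    """Format the summary line appended below the histogram, using integer division."""
    avg = total_usecs // count if count else 0
    return "avg = %d usecs, total: %d usecs, count: %d" % (avg, total_usecs, count)

# Values from the sample run above:
print(summarize(773, 225))  # avg = 3 usecs, total: 773 usecs, count: 225
```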

@yonghong-song (Collaborator)

[buildbot, test this please]

@brendangregg (Member)

Can you please run a stress test and estimate the overhead of this addition?

If the overhead is at all measurable, I'd rather see this functionality moved to a "-v" for verbose option. If it's negligible, then perhaps we can keep it as the default.

FWIW, a lot of these tools are designed to print only basic stats by default, to minimize both overhead and visual clutter.

@netedwardwu (Contributor)

The total value is really useful for real-world problems.
I added this total value myself and have used it a lot in my performance tuning.
Sometimes a log2 histogram is not enough for very fine-grained profiling (a 2~5% performance drop).
Without the total value, we may conclude that the problem is unrelated to the function we are profiling, and waste a lot of time chasing the wrong leads.

With the total value, we can attribute a 2~5% performance drop to a very small difference.

@brendangregg (Member)

I understand its value, but not the cost. In this and other tools I deliberately leave out extra stuff in the default output so that end users can run the tool with minimum overhead in production, and then choose to have extended output with more cost via command line options. funccount(8) in particular can be aimed at very frequent functions, executed millions of times per second.

But instead of a "-v" verbose mode, you're adding it to the default output, so we have no choice and always pay the cost. That might be OK if the cost were negligible, but you're going to have to do performance analysis of the performance tool to show what it is.

@brendangregg (Member)

BTW, personally I'd try a few things, like dd from /dev/zero to /dev/null with a block size of 1: without the tool running, with the old version running (instrumenting, say, vfs_read()), and with the new version running. dd prints throughput metrics, which aids comparison.

@NPC51129 (Contributor, Author)

I tested the tool with dd from /dev/zero to /dev/null and bs=1, as suggested by @brendangregg:
a) without the tool: 895 kB/s
b) with the old version: 582 kB/s
c) with the new version: 582 kB/s
So the overhead is negligible.
My environment:
kernel: 4.19.91-19.1.al7.x86_64
CPU: Intel(R) Xeon(R) Platinum 8269CY @ 2.50GHz

@NPC51129 (Contributor, Author)

Hi, I did a few more tests on this. Since the new feature introduces lock_xadd into the BPF code, the tool's performance might be impacted.
Test 1
a) run the old version of the tool on multiple cores, with dd if=/dev/zero of=/dev/null bs=1 for 10 seconds;
b) run the new version of the tool on multiple cores, with dd if=/dev/zero of=/dev/null bs=1 for 10 seconds.
The results:

| No. of threads | dd tp a) [kB/s] | dd tp b) [kB/s] |
|---:|---:|---:|
| 1 | 597 | 593 |
| 2 | 503 | 503 |
| 4 | 381 | 375 |
| 8 | 262 | 256 |
| 16 | 177 | 175 |
| 32 | 99.8 | 99.3 |
| 64 | 44.8 | 44.6 |

In this test, the overhead of the new version versus the old is negligible. That said, I don't think we would run multiple instances of funclatency in practice.
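To put a number on "negligible", the relative slowdown can be computed row by row from the two throughput columns of Test 1 (a quick sketch for illustration, not part of the patch; the `overhead_pct` helper is hypothetical):

```python
def overhead_pct(old_kbps, new_kbps):
    """Relative throughput drop of the new version vs. the old, in percent."""
    return (old_kbps - new_kbps) / old_kbps * 100.0

# (threads, old version kB/s, new version kB/s) from the Test 1 table above:
rows = [(1, 597, 593), (2, 503, 503), (4, 381, 375), (8, 262, 256),
        (16, 177, 175), (32, 99.8, 99.3), (64, 44.8, 44.6)]
for threads, old, new in rows:
    print("%2d threads: %.1f%% slower" % (threads, overhead_pct(old, new)))
```

Across these rows the worst-case drop is about 2.3% (at 8 threads), which supports the "negligible" reading.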

Test 2
a) run dd if=/dev/zero of=/dev/null bs=1 on multiple cores;
b) the same, with the old version of funclatency running;
c) the same, with the new version of funclatency running.
The throughput reported by dd:

| No. of threads | dd tp a) [kB/s] | dd tp b) [kB/s] | dd tp c) [kB/s] |
|---:|---:|---:|---:|
| 1 | 898 | 613 | 610 |
| 2 | 598 | 407 | 407 |
| 4 | 595 | 357 | 354 |
| 8 | 592 | 201 | 195 |
| 16 | 586 | 48.5 | 57.6 |
| 32 | 585 | 13.7 | 15.7 |
| 64 | 573 | 3.2 | 3.1 |

In this test the overhead of new version vs. old version is negligible as well.
Overall, adding average function latency to this tool introduces negligible overhead.

@brendangregg (Member)

Great testing, thanks. So it looks like the overhead is negligible (I was expecting more, given how much overhead the tool costs in the first place, which you can see by comparing the Test 2 (a) and (b) columns).

It is a touch annoying to change the default output of tools (as it breaks documentation; adding options doesn't typically break documentation) and to clutter it a little, but the benefits probably outweigh that, so I'm happy to merge.

@brendangregg (Member)

Can you please update the _example.txt file examples? (see what I mean about breaking documentation!)

Perhaps the quickest way (rather than redoing all the examples) is to take each output and then add a fake summary line based on your best guess from the distribution data. Not ideal, but maybe good enough for the example.

E.g.:

# ./funclatency do_sys_open
Tracing do_sys_open... Hit Ctrl-C to end.
^C
     nsecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 0        |                                      |
    1024 -> 2047     : 0        |                                      |
    2048 -> 4095     : 124      |****************                      |
    4096 -> 8191     : 291      |**************************************|
    8192 -> 16383    : 36       |****                                  |
   16384 -> 32767    : 16       |**                                    |
   32768 -> 65535    : 8        |*                                     |
   65536 -> 131071   : 0        |                                      |
  131072 -> 262143   : 0        |                                      |
  262144 -> 524287   : 0        |                                      |
  524288 -> 1048575  : 0        |                                      |
 1048576 -> 2097151  : 0        |                                      |
 2097152 -> 4194303  : 1        |                                      |
Detaching...

So we need to add the "avg = xxx usecs, total: xxx usecs, count: xxx" line. Count is the sum of the count column. For the total, I'd take the midpoint of each bucket (e.g. 4096 -> 8191 == 6144), multiply it by that bucket's count, and sum them all. The avg is then the total divided by the known count.

It'll take a few minutes to do each example. Some awk can help.
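The midpoint estimation described above can be scripted rather than done by hand; here is a hypothetical helper (the name `estimate_summary` and the exact bucket bounds are illustrative), fed with the non-zero rows of the example histogram:

```python
def estimate_summary(buckets):
    """buckets: list of (low, high, count) rows from a log2 histogram.
    Estimate total and avg using each bucket's midpoint, as suggested above."""
    count = sum(c for _, _, c in buckets)
    # Midpoint of e.g. 4096 -> 8191 is (4096 + 8191 + 1) // 2 == 6144
    total = sum((lo + hi + 1) // 2 * c for lo, hi, c in buckets)
    avg = total // count if count else 0
    return avg, total, count

# Non-zero rows from the example histogram above (nsecs):
buckets = [(2048, 4095, 124), (4096, 8191, 291), (8192, 16383, 36),
           (16384, 32767, 16), (32768, 65535, 8), (2097152, 4194303, 1)]
avg, total, count = estimate_summary(buckets)
print("avg = %d nsecs, total: %d nsecs, count: %d" % (avg, total, count))
# avg = 13746 nsecs, total: 6543360 nsecs, count: 476
```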

@yonghong-song (Collaborator)

[buildbot, test this please]

@yonghong-song yonghong-song merged commit af90bd4 into iovisor:master Sep 28, 2020
@NPC51129 NPC51129 deleted the funclatency-avg branch October 17, 2020 07:40
ismhong pushed a commit to ismhong/bcc that referenced this pull request Dec 6, 2021
* add average function latency
* update example.txt

Change-Id: I04e423e2b888e5770e639d856572681dc94e8862
ismhong pushed a commit to ismhong/bcc that referenced this pull request Dec 6, 2021
* add average function latency
* update example.txt

Change-Id: I960e77ac3189600cebaa86f8ef4b94f992c76ee8
CrackerCat pushed a commit to CrackerCat/bcc that referenced this pull request Jul 31, 2024
* add average function latency
* update example.txt
4 participants