Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tool: compactsnoop #2582

Merged
merged 1 commit into from
Dec 12, 2019
Merged

add tool: compactsnoop #2582

merged 1 commit into from
Dec 12, 2019

Conversation

ethercflow
Copy link
Contributor

@ethercflow ethercflow commented Nov 1, 2019

compactsnoop traces the compact zone system-wide, and print various details.
Example output:

  # ./compactsnoop
 COMM           PID    NODE ZONE         ORDER MODE       LAT(ms)           STATUS
 zsh            23685  0    ZONE_DMA     -1    SYNC      0.025444         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      3.925823         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.975603         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC      81.57143         complete
 zsh            23685  0    ZONE_DMA     -1    SYNC       0.02774         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      4.631926         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.975706         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.647534         complete
 zsh            23685  0    ZONE_DMA     -1    SYNC      0.020838         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      3.367714         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC     115.18123         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC     81.766559         complete
 zsh            23685  0    ZONE_DMA     -1    SYNC      0.025587         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      4.346871         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC    114.570453         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.820001         complete
 zsh            23685  0    ZONE_DMA     -1    SYNC      0.026747         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      4.611412         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.993654         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.928759         complete
 zsh            23685  0    ZONE_DMA     -1    SYNC       0.02789         complete
 zsh            23685  0    ZONE_DMA32   -1    SYNC      3.889023         complete
 zsh            23685  0    ZONE_NORMAL  -1    SYNC    113.776424         complete
 zsh            23685  1    ZONE_NORMAL  -1    SYNC     80.727358         complete
 ^C

While tracing, the processes alloc pages due to memory fragmentation is too
serious to meet contiguous memory requirements in the system, compact zone
events happened, which will increase the waiting delay of the processes.

compactsnoop can be useful for discovering when compact_stall(/proc/vmstat)
continues to increase, whether it is caused by some critical processes or not.

When enable extern fields, it also shows node's zone's order's frag index and zone's watermark:

The -e option prints out extra columns

  # ./compactsnoop -e
 COMM           PID    NODE ZONE         ORDER MODE    FRAGIDX  MIN      LOW      HIGH     FREE        LAT(ms)           STATUS
 summ           28276  1    ZONE_NORMAL  3     ASYNC   0.728    11284    14105    16926    14193       3.58428          partial
 summ           28276  0    ZONE_NORMAL  2     ASYNC   -1.000   11043    13803    16564    14479        0.0263         complete
 summ           28276  1    ZONE_NORMAL  2     ASYNC   -1.000   11284    14105    16926    14785      0.019004         complete
 summ           28276  0    ZONE_NORMAL  2     ASYNC   -1.000   11043    13803    16564    15199      0.006467          partial
 summ           28276  1    ZONE_NORMAL  2     ASYNC   -1.000   11284    14105    16926    17360      0.030806         complete
 summ           28276  0    ZONE_NORMAL  2     ASYNC   -1.000   11043    13803    16564    15443      0.024644         complete
 summ           28276  1    ZONE_NORMAL  2     ASYNC   -1.000   11284    14105    16926    15634      0.018882         complete
 summ           28276  1    ZONE_NORMAL  3     ASYNC   0.832    11284    14105    16926    15301      0.006826          partial
 summ           28276  0    ZONE_NORMAL  2     ASYNC   -1.000   11043    13803    16564    14774      0.005991          partial
 summ           28276  1    ZONE_NORMAL  3     ASYNC   0.733    11284    14105    16926    19888      0.012343          partial
 ^C

The -K option prints out kernel stack

# ./compactsnoop -K -e

summ           28276  0    ZONE_NORMAL  3     ASYNC   0.528    11043    13803    16564    22654     13.258874          partial
              kretprobe_trampoline+0x0
              try_to_compact_pages+0x121
              __alloc_pages_direct_compact+0xac
              __alloc_pages_slowpath+0x3e9
              __alloc_pages_nodemask+0x404
              alloc_pages_current+0x98
              new_slab+0x2c5
              ___slab_alloc+0x3ac
              __slab_alloc+0x40
              kmem_cache_alloc_node+0x8b
              copy_process+0x18e
              do_fork+0x91
              sys_clone+0x16
              stub_clone+0x44

summ           28276  1    ZONE_NORMAL  3     ASYNC   -1.000   11284    14105    16926    22074      0.008713          partial
              kretprobe_trampoline+0x0
              try_to_compact_pages+0x121
              __alloc_pages_direct_compact+0xac
              __alloc_pages_slowpath+0x3e9
              __alloc_pages_nodemask+0x404
              alloc_pages_current+0x98
              new_slab+0x2c5
              ___slab_alloc+0x3ac
              __slab_alloc+0x40
              kmem_cache_alloc_node+0x8b
              copy_process+0x18e
              do_fork+0x91
              sys_clone+0x16
              stub_clone+0x44

@ethercflow
Copy link
Contributor Author

@yonghong-song @brendangregg Sorry to bother you, I met compact memory issue which caused high sys cpu and latency in two of our customs, so I develop this tool to diagnose this problem (even though their kernel doesn't support bpf, I diagnosed with sar -B + /proc/vmstat's compact_* + on-cpu frame graph) and hope this tool can help others. So would you please take a look at this when you have time? And I don't know how to alleviate memory fragmentation problems when it already caused lots of compation. Would you please give me some suggestions? Thanks a lot.

@yonghong-song
Copy link
Collaborator

Hi, @ethercflow will take a detailed look in the next couple of days.

@yonghong-song
Copy link
Collaborator

[buildbot, test this please]

@yonghong-song
Copy link
Collaborator

[buildbot, ok to test]

@yonghong-song
Copy link
Collaborator

@ethercflow For your below note

Since current libbpf doesn't support bpf_get_stackid_raw_tp yet, so
when enable -K to show kernel stack, this tool will use kprobe instead of
raw_tracepoint;

bpf_get_stackid_raw_tp is a kernel internal function. In bpf programs, the
same helper get_stackid can be used to get stack id and bcc does
support it. There is no difference to get stack id compared to other
tracing programs.

Could you try to use raw tracepoint implementation only? Looks like
it should provide enough and even robust functionality?

@ethercflow
Copy link
Contributor Author

@ethercflow For your below note

Since current libbpf doesn't support bpf_get_stackid_raw_tp yet, so
when enable -K to show kernel stack, this tool will use kprobe instead of
raw_tracepoint;

bpf_get_stackid_raw_tp is a kernel internal function. In bpf programs, the
same helper get_stackid can be used to get stack id and bcc does
support it. There is no difference to get stack id compared to other
tracing programs.

Thank you @yonghong-song, I'll use it with raw_tp.

Could you try to use raw tracepoint implementation only? Looks like
it should provide enough and even robust functionality?

Sure, I'll remove kprobe version.

@brendangregg
Copy link
Member

Looks like a good tool. Can you please make the default output 80 chars wide (fits better in books, articles, tickets, etc)? It's currently 81, so you need to trim a column by 1 character; I'd trim the LAT(ms) column, and reduce the number of decimal places to 2 or 3 (it dosen't need more).

Does the example output make sense? Was zsh really taking 113 millisecond hits due to zone compaction? Just sanity checking.

@ethercflow
Copy link
Contributor Author

ethercflow commented Nov 12, 2019

Looks like a good tool. Can you please make the default output 80 chars wide (fits better in books, articles, tickets, etc)? It's currently 81, so you need to trim a column by 1 character; I'd trim the LAT(ms) column, and reduce the number of decimal places to 2 or 3 (it dosen't need more).

Thank you @brendangregg , I'll remove kprobe version and keep output 80 chars wide (sorry for this output wide thing).

Does the example output make sense? Was zsh really taking 113 millisecond hits due to zone compaction? Just sanity checking.

Yes, I tested this on ubuntu 18.04 with its latest 5.0 kernel. (5.0.0-34-generic)
I manual trigger by echo 1 > /proc/sys/vm/compact_memory.

I verified with funclatency.py get:

 ➜  tools git:(master) ✗ ./fu./funclatency.py -m compact_zone
 Tracing 1 functions for "compact_zone"... Hit Ctrl-C to end.
 ^C
      msecs               : count     distribution
          0 -> 1          : 8        |********************                    |
          2 -> 3          : 5        |************                            |
          4 -> 7          : 3        |*******                                 |
          8 -> 15         : 0        |                                        |
         16 -> 31         : 0        |                                        |
         32 -> 63         : 0        |                                        |
         64 -> 127        : 16       |****************************************|
 Detaching...

It seems same as compactsnoop tool output, and unbelievable, most of the delay falls on 64-127ms.

Below is fragment info

➜  ~ cat /proc/buddyinfo
Node 0, zone      DMA      0      1      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32      6      1      3      6      4      5      3      4      3      1    416
Node 0, zone   Normal   1348    491    359    179     42     13      2      1      0      1  20618
Node 1, zone   Normal   1582   1187    469    131    137     51     21     16     18     21  19229
➜  ~ cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      1      0      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      2      0      1      1      2      2      2      3      1      0      0
Node    0, zone    DMA32, type      Movable      4      1      2      5      2      3      1      1      2      1    416
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable    121    328     85     34      1      1      2      0      0      0      1
Node    0, zone   Normal, type      Movable   1340    793    399     89     30     11      0      1      1      0  20614
Node    0, zone   Normal, type  Reclaimable      1      0      1      0      6      3      0      1      0      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 0, zone      DMA            1            7            0            0            0            0
Node 0, zone    DMA32            6         1010            0            0            0            0
Node 0, zone   Normal          364        47398          366            0            0            0
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    1, zone   Normal, type    Unmovable    120    272    171     42     64     31     12      2      3      2      0
Node    1, zone   Normal, type      Movable   1416    978    359     78     36     14     10      7      8     13  19235
Node    1, zone   Normal, type  Reclaimable      1      1      3      0      1      0      0      1      1      0      0
Node    1, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    1, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic          CMA      Isolate
Node 1, zone   Normal          236        48496          420            0            0            0

@brendangregg
Copy link
Member

Ok, thanks; I'm happy to include the tool (once it is updated).

@ethercflow
Copy link
Contributor Author

@yonghong-song @brendangregg update, PTAL.

tools/compactsnoop.py Outdated Show resolved Hide resolved
tools/compactsnoop.py Outdated Show resolved Hide resolved
man/man8/compactsnoop.8 Show resolved Hide resolved
tools/compactsnoop.py Outdated Show resolved Hide resolved
tools/compactsnoop.py Show resolved Hide resolved
tools/compactsnoop.py Show resolved Hide resolved
tools/old/compactsnoop.py Outdated Show resolved Hide resolved
@yonghong-song
Copy link
Collaborator

maybe you also want to add an entry in snapcraft/snapcraft.yaml file.

@yonghong-song
Copy link
Collaborator

The change looks good although probably due to that fact there is no much compaction going on my host, I only got stack like this,

^C^C-bash-4.4$ sudo ./compactsnoop.py -e -K
COMM           PID    NODE ZONE         ORDER MODE    FRAGIDX  MIN      LOW      HIGH     FREE       LAT(ms)           STATUS
bash           2304041 0    ZONE_DMA     0     SYNC    -1.000   0        0        0        0            0.023         complete
        bpf_prog_02b366a5a4e30e65_raw_tracepoint__mm_compaction_end+0x126
        bpf_prog_02b366a5a4e30e65_raw_tracepoint__mm_compaction_end+0x126
        bpf_trace_run6+0x4a
        compact_zone+0xa9d
        compact_node+0xa9
        sysctl_compaction_handler+0x40
        proc_sys_call_handler+0x19b
        vfs_write+0xa5
        ksys_write+0x59
        do_syscall_64+0x42
        entry_SYSCALL_64_after_hwframe+0x44
...

@brendangregg any further comments on the tool?

@ethercflow
Copy link
Contributor Author

Hi, @brendangregg any further comments on the tool?

@yonghong-song
Copy link
Collaborator

@brendangregg Earlier you mentioned this is a good tool. Could you check again whether it has addressed all the issues and is good enough to be included in bcc/tools directory?

@brendangregg
Copy link
Member

Looks good, thanks!

@brendangregg brendangregg merged commit 6e9b450 into iovisor:master Dec 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants