
cpudist stops working when there are too many forks #2567

Closed
TristanCacqueray opened this issue Oct 23, 2019 · 3 comments · Fixed by #2568

Comments

@TristanCacqueray
Contributor

Running cpudist -P 1 when a service forks a lot (such as the update-mandb cron) produces incorrect results: only a handful of processes' distributions are taken into account after the forks die.

This can be reproduced by running this command: for i in $(seq 10000); do echo fork | dd of=/dev/null & done. After the command completes, cpudist prints only a few distributions and misses most of the load happening afterwards.

This seems to be related to the default BPF_HASH size, and it can be fixed by the proposed PR below, but perhaps there is a better way to avoid the issue?
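
For context, bcc's BPF_HASH macro uses a default capacity (10240 entries) when no size argument is given; a fourth argument sets the maximum number of entries. The snippet below is only a minimal sketch of that kind of change, assuming a map named start as in the scheduler tools; the exact cpudist.py source may differ:

from bcc import BPF

bpf_text = """
// Default capacity (about 10240 entries in bcc):
// BPF_HASH(start, u32, u64);
// Explicit, larger capacity so a heavily forking workload does not exhaust the map:
BPF_HASH(start, u32, u64, 65536);
"""

b = BPF(text=bpf_text)  # compiles the embedded C program (requires bcc and root)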

TristanCacqueray added a commit to TristanCacqueray/bcc that referenced this issue Oct 23, 2019
This change fixes the cpudist tool to avoid issues when too many tasks
are running.

Fixes iovisor#2567 -- cpudist stops working when there are too many forks
@yonghong-song
Collaborator

Do you mean cpudist -P or cpudist -p 1? Do you have more details, with numbers, to show what the problem is? Adjusting the BPF_HASH size is a common mechanism if you have a large volume of data. But maybe you can describe your problem with examples first?

@TristanCacqueray
Contributor Author

I meant cpudist -P 1, that is, per-PID with a 1-second print interval. I don't know exactly when the issue starts to happen, but here are some numbers using this script:

#!/bin/sh -e
# Start cpudist with per-PID output, printing every second.
PYTHONUNBUFFERED=1 python3 ./cpudist.py -P 1 > log &
sleep 1
# Spawn 10000 short-lived forked processes.
for i in $(seq 10000); do echo fork | dd of=/dev/null > /dev/null 2>&1 & done
sleep 5
echo "Log count: $(wc -l log)"
kill -9 $(jobs -p)

With PR #2568 applied, the log file contains around 100k lines. Without the PR, the log file contains only about 5k lines, so it seems like there is something wrong with the default hash size and perhaps how it handles collisions?

@yonghong-song
Collaborator

The program uses map update with replacement enabled. If there is a collision in the hash table, the new entry just overwrites the old one.

In this case, for your use case, I guess it makes sense to add a command-line option like --hash-storage-size so the user can specify the initial storage size for the hash table. There is no need to increase BPF_HISTOGRAM, which is at log2 scale; the default value is typically good enough.
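
A minimal sketch of that idea, assuming a hypothetical --hash-storage-size flag and a STORAGE_SIZE placeholder that is substituted into the tool's embedded C text before compilation (cpudist.py's actual option name and structure may differ):

import argparse
from bcc import BPF

parser = argparse.ArgumentParser()
parser.add_argument("--hash-storage-size", type=int, default=10240,
    help="maximum entries in the per-task start-time hash")
args = parser.parse_args()

bpf_text = """
// STORAGE_SIZE is replaced with the user-supplied value below
BPF_HASH(start, u32, u64, STORAGE_SIZE);
"""
bpf_text = bpf_text.replace("STORAGE_SIZE", str(args.hash_storage_size))

b = BPF(text=bpf_text)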

CrackerCat pushed a commit to CrackerCat/bcc that referenced this issue Jul 31, 2024
This change fixes the cpudist tool to avoid issues when too many tasks
are running.

Fixes iovisor#2567 -- cpudist stops working when there are too many forks