
cpudist stops working when there are too many forks #2567

Closed
TristanCacqueray opened this issue Oct 23, 2019 · 3 comments · Fixed by #2568

Comments

@TristanCacqueray
Contributor

Running cpudist -P 1 when a service forks a lot (such as the update-mandb cron) produces incorrect results: only a handful of processes' distributions are taken into account after the forks die.

This can be reproduced by running this command: for i in $(seq 10000); do echo fork | dd of=/dev/null & done. After the command completes, cpudist prints only a few distributions and misses most of the load happening afterwards.

This seems to be related to the default BPF_HASH size, and it can be fixed by the proposed PR below, but perhaps there is a better way to avoid the issue?
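
For context, bcc's BPF_HASH macro uses a default capacity (10240 entries) when no size argument is given; a fourth argument sets the maximum number of entries. The snippet below is only a minimal sketch of that kind of change, assuming a map named start as in the scheduler tools; the exact cpudist.py source may differ:

from bcc import BPF

bpf_text = """
// Default capacity (about 10240 entries in bcc):
// BPF_HASH(start, u32, u64);
// Explicit, larger capacity so a heavily forking workload does not exhaust the map:
BPF_HASH(start, u32, u64, 65536);
"""

b = BPF(text=bpf_text)  # compiles the embedded C program (requires bcc and root)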

TristanCacqueray added a commit to TristanCacqueray/bcc that referenced this issue Oct 23, 2019
This change fixes the cpudist tool to avoid issues when too many tasks
are running.

Fixes iovisor#2567 -- cpudist stops working when there are too many forks
@yonghong-song
Collaborator

Do you mean cpudist -P or cpudist -p 1? Do you have more details, with numbers, to show what the problem is? Adjusting the BPF_HASH size is a common mechanism if you have a large volume of data. But maybe you can describe your problem with examples first?

@TristanCacqueray
Contributor Author

I meant cpudist -P 1, that is, per-PID with a 1-second print interval. I don't know exactly when the issue starts to happen, but here are some numbers using this script:

#!/bin/sh -e
# Start cpudist with per-PID output, printing every second.
PYTHONUNBUFFERED=1 python3 ./cpudist.py -P 1 > log &
sleep 1
# Spawn 10000 short-lived forked processes.
for i in $(seq 10000); do echo fork | dd of=/dev/null > /dev/null 2>&1 & done
sleep 5
echo "Log count: $(wc -l log)"
kill -9 $(jobs -p)

With PR #2568 applied, the log file contains around 100k lines. Without the PR, the log file contains only about 5k lines, so it seems like there is something wrong with the default hash size and perhaps how it handles collisions?

@yonghong-song
Collaborator

The program uses map update with replacement enabled. If there is a collision in the hash table, the new entry just overwrites the old one.

In this case, for your use case, I guess it makes sense to add a command-line option like --hash-storage-size so the user can specify the initial storage size for the hash table. There is no need to increase BPF_HISTOGRAM, which is at log2 scale; the default value is typically good enough.
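
A minimal sketch of that idea, assuming a hypothetical --hash-storage-size flag and a STORAGE_SIZE placeholder that is substituted into the tool's embedded C text before compilation (cpudist.py's actual option name and structure may differ):

import argparse
from bcc import BPF

parser = argparse.ArgumentParser()
parser.add_argument("--hash-storage-size", type=int, default=10240,
    help="maximum entries in the per-task start-time hash")
args = parser.parse_args()

bpf_text = """
// STORAGE_SIZE is replaced with the user-supplied value below
BPF_HASH(start, u32, u64, STORAGE_SIZE);
"""
bpf_text = bpf_text.replace("STORAGE_SIZE", str(args.hash_storage_size))

b = BPF(text=bpf_text)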

CrackerCat pushed a commit to CrackerCat/bcc that referenced this issue Jul 31, 2024
This change fixes the cpudist tool to avoid issues when too many tasks
are running.

Fixes iovisor#2567 -- cpudist stops working when there are too many forks