Add machine_util.py for tuning optimal benchmark settings #183
Applies some configuration settings automatically, but verifies more.
Future work
- utilize the query part from inside benchmark to embed queried values in json results
- additional optimizations like nohz, other nvidia settings, disabling IRQ
- add temp/clock query for gpu; try querying nvidia-smi performance mode
LGTM with small comments.
scripts/machine_util.py
Outdated
        isolcpus.add(int(chunk))
    return list(isolcpus)


def get_nvidia_graphics_clock(device_id=0):
Does it make sense to get all available graphics clocks by default?
yep, i have already made that change actually. I am adding more stuff still. I'll update the PR in a little bit.
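As a rough illustration of what "all available graphics clocks" could look like, here is a minimal sketch. It is not the code from this PR: the helper names `parse_supported_clocks` and `get_nvidia_supported_graphics_clocks` are hypothetical, and it assumes `nvidia-smi` is on the PATH and prints one clock value in MHz per line for this query.

```python
import subprocess


def parse_supported_clocks(csv_text):
    """Turn one-value-per-line CSV text into a descending list of unique MHz values."""
    clocks = set()
    for line in csv_text.splitlines():
        line = line.strip()
        if line:
            clocks.add(int(line))
    return sorted(clocks, reverse=True)


def get_nvidia_supported_graphics_clocks(device_id=0):
    """Hypothetical helper: ask nvidia-smi for every supported graphics clock."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "-i", str(device_id),
            "--query-supported-clocks=graphics",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_supported_clocks(out)
```

Keeping the parsing separate from the `nvidia-smi` call means the parser can be exercised on machines without a GPU.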
- debugging issue where GOMP_CPU_AFFINITY and taskset are properly specified with a range e.g. 4-47 on the command line for pytest, but the machine util sees process affinity as just 4; this does not happen when using the same command line with python instead of pytest and invoking the machine util
- need to verify the extra json file size incurred by logging machine state every benchmark
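One way to observe the mismatch described above is to compare the requested range against the affinity the process actually has. The sketch below is an assumption-laden illustration, not the PR's code: `parse_cpu_list` and `affinity_mismatch` are hypothetical names, only plain lists and ranges such as `4-47` or `0,2-3` are handled (no stride syntax), and `os.sched_getaffinity` is only available on Linux.

```python
import os


def parse_cpu_list(spec):
    """Expand a string like '0,2-3' into a sorted list of CPU ids."""
    cpus = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            start, end = chunk.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def affinity_mismatch(expected_spec, pid=0):
    """Return (missing, unexpected) CPU ids relative to the process affinity.

    pid=0 means the calling process. Linux only.
    """
    expected = set(parse_cpu_list(expected_spec))
    actual = os.sched_getaffinity(pid)
    return sorted(expected - actual), sorted(actual - expected)
```

Calling `affinity_mismatch("4-47")` from inside the pytest process and from a plain python process would show whether the two really see different CPU sets.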
cedeedb to 3c2c128
- be more explicit about checking the machine os config and bail gracefully outside amazon linux
- restore limited isolated cpu checks modulo pytorch issue #49971
- implement nvidia clock setting
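A check for the "bail gracefully outside amazon linux" item could look roughly like the following. This is a sketch under assumptions, not the PR's implementation: the function names are hypothetical, and it assumes the distribution identifies itself through a `NAME` field beginning with "Amazon Linux" in `/etc/os-release`.

```python
def parse_os_release(text):
    """Parse KEY=value lines (os-release style) into a dict, dropping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_amazon_linux(os_release_path="/etc/os-release"):
    """Return True only when the os-release file names Amazon Linux."""
    try:
        with open(os_release_path) as f:
            info = parse_os_release(f.read())
    except OSError:
        # A missing or unreadable file is treated as "not Amazon Linux" so the
        # caller can skip configuration instead of crashing.
        return False
    return info.get("NAME", "").startswith("Amazon Linux")
```

Returning False rather than raising leaves the decision of how to bail to the caller.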
- add 'ignore_machine_config' option to enable running without checks
- bail out before running tests rather than after
- log machine info in the json
since this CI runner is not one of the performance tuned machines, just used for correctness.
        f.write(content)


def check_intel_turbo_state(turbo_file='/sys/devices/system/cpu/intel_pstate/no_turbo'):
    return int(read_sys_file(turbo_file))
I think it should be 1 - int(read_sys_file(turbo_file)), because "no_turbo" being 1 means turbo is disabled, in which case this function should return 0.
oh yea good point.
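The fix suggested in this thread can be sketched as follows. The `read_sys_file` body here is a stand-in written for illustration (the PR's own helper is not shown on this page); only the inversion of the `no_turbo` value comes from the review comment.

```python
def read_sys_file(path):
    """Stand-in helper (assumed behaviour): return the file's stripped text."""
    with open(path) as f:
        return f.read().strip()


def check_intel_turbo_state(turbo_file="/sys/devices/system/cpu/intel_pstate/no_turbo"):
    """Return 1 when turbo is enabled and 0 when it is disabled.

    The sysfs file is named no_turbo, so its value is the inverse of the
    turbo state: no_turbo == 1 means turbo is off.
    """
    return 1 - int(read_sys_file(turbo_file))
```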
torchbenchmark isn't installed in a proper way, so relative imports don't work from a script with a main function that is nested inside the torchbenchmark package. Work around this for now. Make the machine_config script agnostic to cpu core pinning when run as a script, since pinning is checked by conftest during benchmarking and it's pointless to make the user pin the configuration script.
Applies some configuration settings automatically, but verifies more. Only works on amazon linux so far; tries to bail gracefully on other platforms.
Usage:
- as a command line script, machine_config.py will check or configure the machine
  sudo `which python` <path to machine_config.py> --configure
- as a library, provides functions for use e.g. in conftest.py
- asserts benchmark script is run with configured settings
- logs machine settings to benchmark data file
Other miscellaneous fixes:
* Add score plot, nightly sweep scripts
* Add legend to sweep result plotting script. (pytorch#193)
Moves compute_score and some other utils around. Needs more work.
torchbenchmark isn't installed in a proper way, so it doesn't work to use relative imports from a script with a main function that is nested inside the torchbenchmark package. work around this for now.
make machine_config script agnostic to cpu core pinning when run as a script, since pinning is checked by conftest during benchmarking and it's pointless to make the user pin the configuration script.
Co-authored-by: xz <[email protected]>
* Add machine_util.py for tuning optimal benchmark settings
Cherry-picked from master branch PR #183
Applies some configuration settings automatically, but verifies more. Only works on amazon linux so far; tries to bail gracefully on other platforms.
Usage:
- as a command line script, machine_config.py will check or configure the machine
  sudo `which python` <path to machine_config.py> --configure
- as a library, provides functions for use e.g. in conftest.py
- asserts benchmark script is run with configured settings
- logs machine settings to benchmark data file
Other miscellaneous fixes:
* Add score plot, nightly sweep scripts
* Add legend to sweep result plotting script. (#193)
Moves compute_score and some other utils around. Needs more work.
torchbenchmark isn't installed in a proper way, so it doesn't work to use relative imports from a script with a main function that is nested inside the torchbenchmark package. work around this for now.
make machine_config script agnostic to cpu core pinning when run as a script, since pinning is checked by conftest during benchmarking and it's pointless to make the user pin the configuration script.
Co-authored-by: xz <[email protected]>
* Fix CI script paths under the new directory hierarchy.
Co-authored-by: Will Constable <[email protected]>