Add machine_util.py for tuning optimal benchmark settings #183

wconstab · 2020-12-29T06:11:31Z

Applies some configuration settings automatically, but verifies more.

Future work

- utilize the query part from inside benchmark to embed queried values in json results
- additional optimizations like nohz, other nvidia settings, disabling IRQ
- add temp/clock query for gpu; try querying nvidia-smi performance mode


xuzhao9

LGTM with small comments.

xuzhao9 · 2020-12-29T18:23:57Z

scripts/machine_util.py

+ isolcpus.add(int(chunk))
+ return list(isolcpus)
+
+def get_nvidia_graphics_clock(device_id=0):


Does it make sense to get the graphics clocks of all available devices by default?

wconstab · 2020-12-29

Yep, I have already made that change, actually. I am still adding more stuff; I'll update the PR in a little bit.
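A minimal sketch of what querying every GPU by default might look like, assuming `nvidia-smi` is on the PATH. The names `get_nvidia_graphics_clocks` and `parse_clock_csv` are hypothetical and are not taken from the PR; the sketch relies only on the `--query-gpu`, `--format` and `--id` options of `nvidia-smi`.

```python
import subprocess


def parse_clock_csv(output):
    """Parse nvidia-smi CSV rows of the form 'index, clock' into a dict."""
    clocks = {}
    for line in output.strip().splitlines():
        if not line.strip():
            continue
        index, clock_mhz = (field.strip() for field in line.split(","))
        clocks[int(index)] = int(clock_mhz)
    return clocks


def get_nvidia_graphics_clocks(device_id=None):
    """Return {gpu_index: graphics clock in MHz}.

    Queries all GPUs when device_id is None (the assumed new default),
    otherwise only the requested device.
    """
    cmd = [
        "nvidia-smi",
        "--query-gpu=index,clocks.gr",
        "--format=csv,noheader,nounits",
    ]
    if device_id is not None:
        cmd.append(f"--id={device_id}")
    output = subprocess.check_output(cmd, text=True)
    return parse_clock_csv(output)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parsing on a machine without a GPU.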

- debugging issue where GOMP_CPU_AFFINITY and taskset are properly specified with a range (e.g. 4-47) on the command line for pytest, but the machine util sees process affinity as just 4; this does not happen when using the same command line with python instead of pytest and invoking the machine util
- need to verify the extra json file size incurred by logging machine state every benchmark
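One way to investigate the affinity mismatch described above is to compare the expected core range against what the process actually sees. This is a rough sketch, not the PR's code: `parse_cpu_list` and `affinity_mismatch` are hypothetical names, the parser handles only comma-separated ids and simple `a-b` ranges (not the stride syntax GOMP_CPU_AFFINITY also accepts), and `os.sched_getaffinity` is available on Linux only.

```python
import os


def parse_cpu_list(spec):
    """Expand a CPU list such as '0,4-47' into a sorted list of core ids."""
    cpus = set()
    for chunk in spec.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return sorted(cpus)


def affinity_mismatch(expected_spec):
    """Return (missing, extra) core ids relative to the current affinity."""
    expected = set(parse_cpu_list(expected_spec))
    actual = os.sched_getaffinity(0)  # 0 means the calling process
    return sorted(expected - actual), sorted(actual - expected)
```

Calling `affinity_mismatch("4-47")` from inside the pytest process and again from a plain python process would show whether the two launch paths really end up with different affinity masks.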


xuzhao9 · 2021-01-26T21:54:32Z

torchbenchmark/util/machine_config.py

+ f.write(content)
+
+def check_intel_turbo_state(turbo_file='/sys/devices/system/cpu/intel_pstate/no_turbo'):
+ return int(read_sys_file(turbo_file))


I think it should be 1 - int(read_sys_file(turbo_file)): when "no_turbo" is 1, turbo is disabled, so this function should return 0.

wconstab · 2021-01-26

Oh yeah, good point.
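The fix suggested in this thread can be sketched as follows. The body of `read_sys_file` is an assumption, since only its name appears in the diff; the inversion is the change the reviewer proposed.

```python
def read_sys_file(path):
    """Assumed helper: read a sysfs file and strip surrounding whitespace."""
    with open(path) as f:
        return f.read().strip()


def check_intel_turbo_state(turbo_file='/sys/devices/system/cpu/intel_pstate/no_turbo'):
    """Return 1 if turbo boost is enabled, 0 if it is disabled."""
    # no_turbo holds 1 when turbo is disabled, so invert the value.
    return 1 - int(read_sys_file(turbo_file))
```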


Applies some configuration settings automatically, but verifies more. Only works on amazon linux so far; tries to bail gracefully on other platforms.

Usage:

- as a command line script, machine_config.py will check or configure the machine
    sudo `which python` <path to machine_config.py> --configure
- as a library, provides functions for use e.g. in conftest.py
  - asserts benchmark script is run with configured settings
  - logs machine settings to benchmark data file

Other miscellaneous fixes:

* Add score plot, nightly sweep scripts
* Add legend to sweep result plotting script. (pytorch#193)

Moves compute_score and some other utils around. Needs more work.

torchbenchmark isn't installed in a proper way, so it doesn't work to use relative imports from a script with a main function that is nested inside the torchbenchmark package. work around this for now.

make machine_config script agnostic to cpu core pinning when run as a script, since pinning is checked by conftest during benchmarking and it's pointless to make the user pin the configuration script.

Co-authored-by: xz <[email protected]>

* Add machine_util.py for tuning optimal benchmark settings

Cherry-picked from master branch PR #183

Applies some configuration settings automatically, but verifies more. Only works on amazon linux so far; tries to bail gracefully on other platforms.

Usage:

- as a command line script, machine_config.py will check or configure the machine
    sudo `which python` <path to machine_config.py> --configure
- as a library, provides functions for use e.g. in conftest.py
  - asserts benchmark script is run with configured settings
  - logs machine settings to benchmark data file

Other miscellaneous fixes:

* Add score plot, nightly sweep scripts
* Add legend to sweep result plotting script. (#193)

Moves compute_score and some other utils around. Needs more work.

torchbenchmark isn't installed in a proper way, so it doesn't work to use relative imports from a script with a main function that is nested inside the torchbenchmark package. work around this for now.

make machine_config script agnostic to cpu core pinning when run as a script, since pinning is checked by conftest during benchmarking and it's pointless to make the user pin the configuration script.

Co-authored-by: xz <[email protected]>

* Fix CI script paths under the new directory hierarchy.

Co-authored-by: Will Constable <[email protected]>

wconstab requested a review from xuzhao9 December 29, 2020 06:11

wconstab self-assigned this Dec 29, 2020

facebook-github-bot added the cla signed label Dec 29, 2020

xuzhao9 approved these changes Dec 29, 2020


wconstab added 2 commits December 29, 2020 22:50

Add score plot, nightly sweep scripts

3c2c128

wconstab force-pushed the wconstab/add_machine_util branch from cedeedb to 3c2c128 Compare January 4, 2021 22:19

wconstab and others added 8 commits January 11, 2021 18:05

minor fixes

3024d9b

Add legend to sweep result plotting script. (#193)

f90ae4c

update machine_config.py to rely on pstate, amazon linux

6d1d81f

- be more explicit about checking the machine os config and bail gracefully outside amazon linux
- restore limited isolated cpu checks modulo pytorch issue #49971
- implement nvidia clock setting

add machine checks to pytest setup

b65aa27

- add 'ignore_machine_config' option to enable running without checks
- bail out before running tests rather than after
- log machine info in the json
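The first two items of this commit could be wired into a `conftest.py` roughly as below. This is a sketch under assumptions rather than the PR's implementation: the exact flag spelling `--ignore_machine_config` is inferred from the option name in the commit message, and `find_config_problems` is a hypothetical placeholder for the real machine checks. `pytest_addoption`, `pytest_sessionstart` and `pytest.exit` are standard pytest hooks and helpers.

```python
def pytest_addoption(parser):
    """Register a flag that lets a run skip the machine configuration checks."""
    parser.addoption(
        "--ignore_machine_config",
        action="store_true",
        help="Run benchmarks without verifying the machine configuration.",
    )


def find_config_problems():
    """Hypothetical placeholder: return a list of human-readable problems."""
    return []


def pytest_sessionstart(session):
    """Abort before any test runs if the machine is not configured."""
    if session.config.getoption("ignore_machine_config"):
        return
    problems = find_config_problems()
    if problems:
        import pytest  # imported lazily so this module loads without pytest

        pytest.exit("Machine not configured: " + "; ".join(problems), 1)
```

Checking in `pytest_sessionstart` means a misconfigured machine is reported before any benchmark time is spent, which matches the "bail out before running tests rather than after" item.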

use pip instead of conda nightlies to work around conda issue

4049119

pytorch/pytorch#49375

ignore_machine_config in benchmark CI

27c5f94

since this CI runner is not one of the performance tuned machines, just used for correctness.

Merge branch 'master' into wconstab/add_machine_util

534996d

update location of score

280e634

xuzhao9 reviewed Jan 26, 2021


wconstab and others added 3 commits January 26, 2021 14:30

fix no_turbo logic

90199cf

Add pstate CPU frequency check and setup. (#198)

40fbdd4

wconstab merged commit f9e5a75 into master Jan 27, 2021

wconstab deleted the wconstab/add_machine_util branch January 27, 2021 03:56
