-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RLlib] break up the learner group tests into shorter tests #35926
[RLlib] break up the learner group tests into shorter tests #35926
Conversation
break them up into 3 classes because they all take a very long time to run causing timeouts on master Signed-off-by: Avnish <[email protected]>
…kup_learner_group_tests
Signed-off-by: Avnish <[email protected]>
addressed comments |
…earner Signed-off-by: Avnish <[email protected]>
…kup_learner_group_tests
Signed-off-by: Avnish <[email protected]>
Signed-off-by: Avnish <[email protected]>
Signed-off-by: Avnish <[email protected]>
@@ -128,7 +128,7 @@ def test_learner_group_local(self): | |||
|
|||
def test_update_multigpu(self): | |||
fws = ["torch", "tf2"] | |||
scaling_modes = REMOTE_SCALING_CONFIGS.keys() | |||
scaling_modes = ["multi-gpu-ddp", "remote-gpu"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avnishn Can you add explanations on why these changes are made? REMOTE_SCALING_CONFIGS.keys()
--> ["multi-gpu-ddp", "remote-gpu"]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reduce the number of tests that we need to run while getting full test coverage by doing this.
.buildkite/pipeline.gpu_large.yml
Outdated
@@ -27,7 +27,7 @@ | |||
# --jobs 2 is necessary as we only need to have at least 2 gpus on the machine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment still says --jobs 2 is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep I was exploring here. by removing jobs=2
I'm trying to understand where my cuda oom comes from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this backl
Signed-off-by: Avnish <[email protected]>
…ect#35926) Signed-off-by: Avnish <[email protected]>
…ect#35926) Signed-off-by: Avnish <[email protected]> Signed-off-by: e428265 <[email protected]>
break them up into 3 classes because they all take a very long time to run
causing timeouts on master
Signed-off-by: Avnish [email protected]
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.