[Train] Llama 2 workspace template release tests #37871
…project#37745) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
This reverts commit 05b0a0f.
Force-pushed from 3cf79d1 to a7192bb.
release test passing: https://buildkite.com/ray-project/release-tests-pr/builds/46858#_
final aws build: https://buildkite.com/ray-project/release-tests-pr/builds/46878
@justinvyu @can-anyscale This needs your approvals.
(leaving this to @can-anyscale )
  # See https://hub.docker.com/r/anyscale/ray for full list of
  # available Ray, Python, and CUDA versions.
- base_image: "anyscale/ray:2.6.1-py39-cu117"
+ base_image: anyscale/ray:nightly-py39-cu117
cu118? for consistency with byod
done
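The CUDA question above comes down to comparing the suffix of two image tags. A minimal sketch of that comparison, assuming the `anyscale/ray:<version>-<python>-<cuda>` layout visible in the diff; `parse_ray_image` and `RayImageTag` are hypothetical helpers written for illustration, not part of Ray or Anyscale tooling, and the `cu118` value is taken only from the reviewer's comment:

```python
from typing import NamedTuple


class RayImageTag(NamedTuple):
    ray_version: str  # e.g. "2.6.1" or "nightly"
    python: str       # e.g. "py39"
    cuda: str         # e.g. "cu117"


def parse_ray_image(image: str) -> RayImageTag:
    """Split 'anyscale/ray:<version>-<python>-<cuda>' into its three parts."""
    _, _, tag = image.partition(":")
    # rsplit keeps a version that itself contains '-' in one piece.
    ray_version, python, cuda = tag.rsplit("-", 2)
    return RayImageTag(ray_version, python, cuda)


new = parse_ray_image("anyscale/ray:nightly-py39-cu117")
byod_cuda = "cu118"  # assumption: the CUDA build used by the BYOD image
print(new.cuda == byod_cuda)  # False
```

A check like this would flag the mismatch the reviewer pointed out before the test is launched.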
- ResourceType: "instance"
  Tags:
    - Key: ttl-hours
      Value: '24'
nit: new line
- ResourceType: "instance"
  Tags:
    - Key: ttl-hours
      Value: '24'
nit: new line
@@ -16,7 +16,9 @@
 )


 aws_gpu_cpu_to_concurrency_groups = [
-    Condition(min_gpu=9, max_gpu=-1, min_cpu=0, max_cpu=-1, group="large-gpu", limit=4),
+    Condition(
+        min_gpu=9, max_gpu=-1, min_cpu=0, max_cpu=-1, group="large-gpu", limit=100
revert this
done
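For readers unfamiliar with the `Condition` entries in the diff above, here is a rough sketch of how such a table could map a test's resources to a concurrency group and limit. The matching logic is not shown in this PR, so `in_range` and `pick_group` are hypothetical, the reading of `-1` as "no upper bound" is an assumption, and the `small-gpu` entry is invented purely to show fall-through:

```python
from typing import List, NamedTuple, Optional


class Condition(NamedTuple):
    min_gpu: int
    max_gpu: int
    min_cpu: int
    max_cpu: int
    group: str
    limit: int


def in_range(value: int, low: int, high: int) -> bool:
    # Assumption: -1 as the upper bound means "no upper bound".
    return value >= low and (high == -1 or value <= high)


def pick_group(gpus: int, cpus: int, conditions: List[Condition]) -> Optional[Condition]:
    """Return the first condition whose GPU and CPU ranges both match."""
    for cond in conditions:
        if in_range(gpus, cond.min_gpu, cond.max_gpu) and in_range(
            cpus, cond.min_cpu, cond.max_cpu
        ):
            return cond
    return None


groups = [
    Condition(min_gpu=9, max_gpu=-1, min_cpu=0, max_cpu=-1, group="large-gpu", limit=4),
    # Hypothetical second entry, only here to show fall-through.
    Condition(min_gpu=1, max_gpu=8, min_cpu=0, max_cpu=-1, group="small-gpu", limit=8),
]

match = pick_group(gpus=16, cpus=64, conditions=groups)
print(match.group, match.limit)  # large-gpu 4
```

Under this reading, raising `limit` from 4 to 100 would let far more large-GPU tests run at once, which is presumably why the reviewer asked for it to be reverted.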
release/release_tests.yaml
Outdated
working_dir: workspace_templates/04_finetuning_llms_with_deepspeed
python: "3.9"
frequency: nightly-3x
team: train
ml?
done
bitsandbytes
wandb
pytorch-lightning
protobuf<3.21.0
torchmetrics
lm_eval
tiktoken
sentencepiece
@can-anyscale So all release tests now run with the same Docker image? If I want to add a package for one release test, do I need to add it to the common BYOD requirements file? Is there a doc where I can read about this?
Yes, though on a PR it still uses your specified cluster_env, while builds from master currently use the BYOD image. @can-anyscale will soon deprecate cluster envs for both PR and master builds.
BYOD has a couple of benefits:
- You don't have to build the envs when launching release tests, so a failing test fails sooner.
- It makes tests more reliable because versions are consistent. The best-case scenario is that we don't pin anything in the BYOD requirements files.
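To make the "don't pin anything" goal concrete, the sketch below scans a requirements list for entries that carry a version constraint. It is an illustration only: `constrained_requirements` is a hypothetical helper, not part of the Ray release tooling, and the input is the package list from the diff in this thread.

```python
import re
from typing import List

# Version-specifier operators as defined by PEP 440.
_SPECIFIER = re.compile(r"(===|==|~=|!=|<=|>=|<|>)")


def constrained_requirements(lines: List[str]) -> List[str]:
    """Return the requirement lines that carry a version constraint."""
    found = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if line and _SPECIFIER.search(line):
            found.append(line)
    return found


byod_requirements = [
    "bitsandbytes",
    "wandb",
    "pytorch-lightning",
    "protobuf<3.21.0",
    "torchmetrics",
    "lm_eval",
    "tiktoken",
    "sentencepiece",
]
print(constrained_requirements(byod_requirements))  # ['protobuf<3.21.0']
```

Of the packages added here, only `protobuf` carries a constraint; the rest float with whatever the shared image resolves.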
…7871) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: NripeshN <[email protected]>
…7871) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: harborn <[email protected]>
…7871) Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
…7871) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: e428265 <[email protected]>
…7871) Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: Victor <[email protected]>
Why are these changes needed?
This enables release tests for the Llama-2 workspace templates.
Related issue number
Checks
- … (git commit -s) in this PR.
- … scripts/format.sh to lint the changes in this PR.
- … method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.