-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Train] Simplify llama 2 workspace template #38444
[Train] Simplify llama 2 workspace template #38444
Conversation
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
if pargs.cluster_env_build_id: | ||
cluster_env_config_kwargs.update(build_id=pargs.cluster_env_build_id) | ||
|
||
base_cmd = f"chmod +x ./run_llama_ft.sh && ./run_llama_ft.sh --size={pargs.size}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if you know where in the submission chain the executable permissions get dropped -- in the repo they are there so the chmod +x
shouldn't be needed :)
If we know where it gets dropped, it might be worth fixing that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I don't know. I tried it without chmod first and it errored. Any clue where this might be coming from ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably not the right person to review but very nice change!
Do you know about the following pattern: Say you have a file
Then you can run |
I wasn't aware of this pattern no. but like u said, the default values make the current solution nice. Could this thing support sth like ${SIZE}|"7b"? |
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
The way I would do it is have in the readme that the user can run
and check an appropriate version of It also teaches them this neat trick. It is also useful outside of Anyscale yamls (envsubst is a shell command). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice clean up!
with FileLock(lock_file): | ||
download_model( | ||
model_id=model_id, bucket_uri=bucket_uri, s3_sync_args=["--no-sign-request"] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need some sort of de-dup logic to only download the model if it doesn't exist here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic is implemented by aws cli. It is a sync operation. If it already exists it won't sync. It's smarter than if it exists, it also detects file changes.
@@ -559,9 +562,10 @@ def main(): | |||
} | |||
) | |||
|
|||
train_ds = create_ray_dataset(args.train_path) | |||
# Read data | |||
train_ds = ray.data.read_json(args.train_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
parser.add_argument("--compute-config", type=str, help="Path to the compute config") | ||
parser.add_argument( | ||
"--cluster-env-build-id", | ||
type=str, | ||
help="The build-id of the cluster env to use", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there default values for these? (The README update indicates that there should be)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default is None for this one.
* Remove the need for prepare_node stuff by enabling the downloading as part of the training function * Added a script to create job submission yamls * Simplified the ray dataset creation by directly reading the json file into a ray dataset. Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: harborn <[email protected]>
* Remove the need for prepare_node stuff by enabling the downloading as part of the training function * Added a script to create job submission yamls * Simplified the ray dataset creation by directly reading the json file into a ray dataset. Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
* Remove the need for prepare_node stuff by enabling the downloading as part of the training function * Added a script to create job submission yamls * Simplified the ray dataset creation by directly reading the json file into a ray dataset. Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
* Remove the need for prepare_node stuff by enabling the downloading as part of the training function * Added a script to create job submission yamls * Simplified the ray dataset creation by directly reading the json file into a ray dataset. Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: e428265 <[email protected]>
* Remove the need for prepare_node stuff by enabling the downloading as part of the training function * Added a script to create job submission yamls * Simplified the ray dataset creation by directly reading the json file into a ray dataset. Signed-off-by: Kourosh Hakhamaneshi <[email protected]> Signed-off-by: Victor <[email protected]>
Why are these changes needed?
This PR make the scripts simpler:
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.