# Fine-Tune Llama 2 Models with Ray and DeepSpeed on OpenShift AI

## Requirements

- Admin access to an OpenShift cluster (CRC is fine)
- OpenDataHub or RHOAI installed, with all Distributed Workload components enabled
- Go 1.21 installed
Set the following environment variables to configure the tests:

- `CODEFLARE_TEST_OUTPUT_DIR` - Output directory for test logs
- `CODEFLARE_TEST_TIMEOUT_SHORT` - Timeout duration for short tasks
- `CODEFLARE_TEST_TIMEOUT_MEDIUM` - Timeout duration for medium tasks
- `CODEFLARE_TEST_TIMEOUT_LONG` - Timeout duration for long tasks
- `FMS_HF_TUNING_IMAGE` - Image tag used in the PyTorchJob CR for model training
- `ODH_NAMESPACE` - Namespace where the ODH components are installed
- `NOTEBOOK_USER_NAME` - Username of the user running the Workbench
- `NOTEBOOK_USER_TOKEN` - Login token of the user running the Workbench
- `NOTEBOOK_IMAGE` - Image used for running the Workbench
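The variables above can be exported in the shell before running the tests. A minimal sketch with placeholder values — the image tags, namespace, and username below are hypothetical examples, not values from this document:

```shell
# Hypothetical example values -- adjust for your cluster and registry
export CODEFLARE_TEST_OUTPUT_DIR=/tmp/kfto-test-logs
export CODEFLARE_TEST_TIMEOUT_SHORT=1m
export CODEFLARE_TEST_TIMEOUT_MEDIUM=5m
export CODEFLARE_TEST_TIMEOUT_LONG=15m
export FMS_HF_TUNING_IMAGE=quay.io/example/fms-hf-tuning:latest  # hypothetical tag
export ODH_NAMESPACE=opendatahub
export NOTEBOOK_USER_NAME=testuser                               # hypothetical user
# Reuse the current login token if the oc CLI is available, else a placeholder
export NOTEBOOK_USER_TOKEN="$(oc whoami -t 2>/dev/null || echo placeholder-token)"
export NOTEBOOK_IMAGE=quay.io/example/workbench:latest           # hypothetical tag

# Make sure the log output directory exists
mkdir -p "$CODEFLARE_TEST_OUTPUT_DIR"
```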
To download the MNIST training script datasets from S3-compatible storage, set the environment variables below:

- `AWS_DEFAULT_ENDPOINT` - Storage bucket endpoint from which to download MNIST datasets
- `AWS_ACCESS_KEY_ID` - Storage bucket access key
- `AWS_SECRET_ACCESS_KEY` - Storage bucket secret key
- `AWS_STORAGE_BUCKET` - Storage bucket name
- `AWS_STORAGE_BUCKET_MNIST_DIR` - Storage bucket directory from which to download the MNIST datasets
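For an S3-compatible store such as MinIO, the variables might be set as follows — the endpoint, credentials, and bucket names here are placeholder assumptions, not values from this document:

```shell
# Hypothetical values for a local S3-compatible store (e.g. MinIO)
export AWS_DEFAULT_ENDPOINT=http://localhost:9000
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_STORAGE_BUCKET=mnist-datasets
export AWS_STORAGE_BUCKET_MNIST_DIR=datasets/mnist
```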
Note: Either use
Execute the tests like standard Go unit tests:

```shell
go test -timeout 60m ./tests/kfto/
```
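Since these are ordinary Go tests, the standard `go test` flags apply; for example, `-v` enables verbose output and `-run` selects tests by a name pattern. The test name below is a hypothetical placeholder, not necessarily a test present in this suite:

```shell
# Run a single test verbosely; TestExampleName is a placeholder pattern
go test -v -timeout 60m -run TestExampleName ./tests/kfto/
```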