[#4] Add VESSL Callback to Post Metrics to VESSL AI #6

rifqiyan · 2024-03-27T06:43:37Z

Changes

Add VesslLogMetricsCallback to push training metrics to VESSL AI.
Configure Axolotl to enable the callback automatically inside a VESSL Run.

Screenshot

https://screen.yanolja.in/sDrqmbTZGqipTMh4.png

seungduk-yanolja · 2024-03-27T07:43:49Z

src/axolotl/utils/vessl_.py

+ # default credential inside a VESSL Run
+ credential_path = os.environ.get("VESSL_RUN_INITIAL_CONFIG")
+ if credential_path:
+ cfg.use_vessl = True


In what case would cfg.vessl_credential_path be set (not None) while cfg.use_vessl is False?

Both are None by default unless they are set to a value. I will check whether adding the parameter to the training YAML file can cause them to have a value.

If you want to add configs, you need to touch this file: https://github.com/Y-IAB/axolotl/blob/main/src/axolotl/utils/config/models/input/v0_4_1/__init__.py

Do we need both values? If cfg.vessl_credential_path is set, can't we treat that as VESSL being enabled?

No, I don't want to add a config. I just want it to be enabled automatically inside a VESSL Run.

True, I will update it.

But you are using cfg here. For type checking, I think you need to add one.

seungduk-yanolja · 2024-03-27T07:47:42Z

src/axolotl/core/trainer_builder.py

@@ -836,6 +836,13 @@ def get_callbacks(self) -> List[TrainerCallback]:
 SaveAxolotlConfigtoWandBCallback(self.cfg.axolotl_config_path)
 )

+ if self.cfg.use_vessl:
+ from axolotl.utils.callbacks.vessl_ import VesslLogMetricsCallback


Could you tell me why you added the suffix _ to vessl_?

They also added an underscore to the mlflow (mlflow_) and wandb (wandb_) modules, so I think it's the convention for external integrations in this repo.

seungduk-yanolja · 2024-03-27T07:49:10Z

src/axolotl/cli/__init__.py

@@ -369,6 +370,8 @@ def load_cfg(config: Union[str, Path] = Path("examples/"), **kwargs):

 setup_mlflow_env_vars(cfg)

+ setup_vessl_env_vars(cfg)


Please note that when a function does nothing unless a condition is met, it is usually better to check that condition at the place where you call the function.
LGTM for now, to keep it consistent, since line 371 does the same as yours.
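
The point about where to place the condition can be illustrated with a small sketch. This is not code from the PR: `apply_credentials` and `load_cfg_sketch` are hypothetical names, and a plain dict stands in for the Axolotl config.

```python
from typing import Dict, List, Optional


def apply_credentials(path: str, log: List[str]) -> None:
    """Hypothetical setup step: always does its work when called."""
    log.append(f"using credentials at {path}")


def load_cfg_sketch(cfg: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical caller: the condition is visible at the call site."""
    log: List[str] = []
    path = cfg.get("vessl_credential_path")
    # A reader of this function can see exactly when the setup step is skipped,
    # without opening apply_credentials to find an early return.
    if path:
        apply_credentials(path, log)
    return log
```

The trade-off is consistency: if neighbouring setup functions check their own conditions internally, as the review notes, matching them may be the more readable choice for now.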

rifqiyan · 2024-03-29T02:59:51Z

.pylintrc

@@ -9,6 +9,6 @@ generated-members=numpy.*, torch.*


 [pylint.messages_control]
-disable=missing-function-docstring, line-too-long, import-error,
+disable=missing-function-docstring, line-too-long, import-error, too-many-lines,


I added too-many-lines because data.py is already close to the 1000-line limit, and with the puree dataset logic it exceeds that limit.

seungduk-yanolja · 2024-03-29T20:14:21Z

src/axolotl/utils/vessl_.py

+def setup_vessl_env_vars(cfg: DictDefault):
+ # VESSL_RUN_INITIAL_CONFIG is a variable that contain path to
+ # default credential inside a VESSL Run
+ credential_path = os.environ.get("VESSL_RUN_INITIAL_CONFIG")


should not override cfg.vessl_credential_path if it is already set

    if cfg.vessl_credential_path:
        return
    credential_path = os.environ.get("VESSL_RUN_INITIAL_CONFIG")
    if credential_path:
        cfg.vessl_credential_path = credential_path
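
The guard suggested in this thread can be sketched as a self-contained function. This is an illustration under stated assumptions, not the merged code: `SimpleCfg` is a hypothetical stand-in for Axolotl's `DictDefault`, and the `environ` parameter is added only so the behaviour can be exercised without touching the real process environment.

```python
import os
from typing import Mapping, Optional


class SimpleCfg(dict):
    """Hypothetical stand-in for DictDefault: missing keys read as None."""

    def __getattr__(self, name):
        return self.get(name)

    def __setattr__(self, name, value):
        self[name] = value


def setup_vessl_env_vars(cfg, environ: Optional[Mapping[str, str]] = None) -> None:
    environ = os.environ if environ is None else environ
    # Respect a path the user set explicitly in the training config.
    if cfg.vessl_credential_path:
        return
    # Per the PR discussion, VESSL_RUN_INITIAL_CONFIG holds the path to the
    # default credential inside a VESSL Run.
    credential_path = environ.get("VESSL_RUN_INITIAL_CONFIG")
    if credential_path:
        cfg.vessl_credential_path = credential_path
```

With this ordering, an explicit config value always wins, the environment variable fills in only when nothing is set, and outside a VESSL Run the config is left untouched.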

seungduk-yanolja · 2024-03-29T20:28:53Z

src/axolotl/utils/callbacks/vessl_.py

+ logs: Dict[str, float],
+ **kwargs # pylint: disable=unused-argument
+ ):
+ if state.is_world_process_zero:


Can you explain where you copied this code from, please?
I am wondering why it differs from the following:
https://github.com/vessl-ai/examples/blob/ebeae1c430509d619c380c56923c645cbd02f610/llama-factory/src/llmtuner/extras/callbacks.py#L162-L187

I took it from the wandb integration:
https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/integration_utils.py#L824

The difference between world_process_zero and local_process_zero is explained here:
https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_callback.py#L77
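
To make the distinction concrete, here is a minimal sketch of a logging callback guarded by `is_world_process_zero`. It is not the callback from this PR: `StateSketch` is a hypothetical stand-in that mirrors only the `TrainerState` fields used here, and `log_fn` is an injected placeholder for whatever function actually sends metrics to the tracking service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class StateSketch:
    """Hypothetical stand-in for the TrainerState fields used below."""

    global_step: int
    is_world_process_zero: bool  # True for exactly one process in the whole job
    is_local_process_zero: bool  # True for the main process on each machine


class LogMetricsCallbackSketch:
    def __init__(self, log_fn: Callable[[int, Dict[str, float]], None]):
        self.log_fn = log_fn

    def on_log(self, args, state, control, logs: Optional[Dict[str, float]] = None, **kwargs):
        # Guard on the global main process so a multi-node job reports each
        # step once. Guarding on is_local_process_zero would instead report
        # once per machine.
        if state.is_world_process_zero and logs:
            self.log_fn(state.global_step, dict(logs))
```

In a two-node job, both nodes have a local main process, but only one of them is the world main process, so only that one forwards the metrics.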

seungduk-yanolja · 2024-03-29T20:29:35Z

src/axolotl/utils/vessl_.py

+
+
+def setup_vessl_env_vars(cfg: DictDefault):
+ # VESSL_RUN_INITIAL_CONFIG is a variable that contain path to


I cannot find any reference explaining this variable. Can you attach a document that describes it?

I got it from the container's environment variables; I will try to attach a screenshot.

rifqiyan added 5 commits March 26, 2024 11:28

add vessl callback

29c2c36

pass vessl credential

ebf2b36

rename to vessl and add default metrics

f780647

revert metrics filter

ecdff97

update name and module comment

5b0aa90

rifqiyan self-assigned this Mar 27, 2024

rifqiyan requested review from seungduk-yanolja and geon-yanolja March 27, 2024 06:45

seungduk-yanolja reviewed Mar 27, 2024

rifqiyan added 5 commits March 28, 2024 15:15

remove metrics from constructor

b917faa

Merge branch 'main' of https://github.com/Y-IAB/axolotl into checkpoi…

0b76c97

…nt-callback

remove use_vessl variable

74511a4

fix pre-commit

22d5f97

ignore too-many-lines

f4437b0

rifqiyan commented Mar 29, 2024

add vessl config class on AxolotlInputConfig

b89a91a

seungduk-yanolja reviewed Mar 29, 2024

rifqiyan added 2 commits April 2, 2024 12:32

apply review feedback

16d9813

apply review feedback

1f17adb

seungduk-yanolja approved these changes Apr 6, 2024

rifqiyan added 2 commits April 17, 2024 09:41

Merge branch 'main' of https://github.com/yanolja-org/iab-axolotl int…

370b9e5

…o checkpoint-callback

apply review feedback

213bc6a

rifqiyan merged commit 85564fb into main Apr 23, 2024
4 checks passed
