Composable + Tunable = Optimal
OSDP =
- A learned communication performance model (a GBDT) trained on real-world measurements
- A profiled compute performance model built from per-module memory and latency profiles
- A simulator that computes latency and peak memory usage given these two performance models. The simulator is needed because launching real distributed jobs is too slow; this surrogate lets the tuner explore a much larger space
- A sequential model-based optimizer that explores the search space (a simplified sketch of the simulator and optimizer loop follows this list)
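The following is a minimal, self-contained sketch of how these pieces could fit together; it is not the OSDP implementation. The `comm_model` and `compute_model` functions are hypothetical stand-ins for the learned GBDT and profiled models, `simulate` is a toy cost model, and plain random search stands in for the sequential model-based optimizer.

```python
# Simplified sketch (illustrative only, not the actual OSDP code).
import random
from dataclasses import dataclass
from typing import List, Tuple

STRATEGIES = ["FULL_SHARD", "SHARD_GRAD_OP", "NO_SHARD"]  # FSDP ShardingStrategy names

@dataclass
class ModuleInfo:
    name: str
    param_bytes: int   # parameter size of the module
    flops: float       # forward + backward FLOPs estimate

def comm_model(param_bytes: int, strategy: str) -> float:
    """Stand-in for the learned (GBDT) communication model: seconds of collective time."""
    factor = {"FULL_SHARD": 2.0, "SHARD_GRAD_OP": 1.5, "NO_SHARD": 1.0}[strategy]
    return factor * param_bytes / 25e9  # assumes ~25 GB/s effective bandwidth

def compute_model(flops: float) -> float:
    """Stand-in for the profiled compute model: seconds of compute time."""
    return flops / 100e12  # assumes ~100 TFLOP/s

def simulate(modules: List[ModuleInfo], plan: List[str]) -> Tuple[float, float]:
    """Return (total latency in seconds, peak memory in bytes) for a sharding plan."""
    latency, peak_mem, live_mem = 0.0, 0.0, 0.0
    for m, strategy in zip(modules, plan):
        latency += compute_model(m.flops) + comm_model(m.param_bytes, strategy)
        # NO_SHARD keeps full params resident; sharded strategies keep only a shard
        live_mem += m.param_bytes if strategy == "NO_SHARD" else m.param_bytes / 8
        peak_mem = max(peak_mem, live_mem + m.param_bytes)  # params unsharded during compute
    return latency, peak_mem

def tune(modules: List[ModuleInfo], mem_budget: float, trials: int = 200) -> List[str]:
    """Random-search stand-in for the sequential model optimizer."""
    best_plan, best_latency = ["FULL_SHARD"] * len(modules), float("inf")
    for _ in range(trials):
        plan = [random.choice(STRATEGIES) for _ in modules]
        latency, peak = simulate(modules, plan)
        if peak <= mem_budget and latency < best_latency:
            best_plan, best_latency = plan, latency
    return best_plan

if __name__ == "__main__":
    mods = [ModuleInfo(f"layer{i}", param_bytes=200_000_000, flops=5e12) for i in range(12)]
    print(tune(mods, mem_budget=8e9))
```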
The core of the OSDP performance model is similar to the one used in Srifty:
Luo, L., West, P., Patel, P., Krishnamurthy, A. and Ceze, L., 2022. SRIFTY: Swift and Thrifty Distributed Neural Network Training on the Cloud. Proceedings of Machine Learning and Systems, 4, pp.833-847.
Output = A list of ShardingStrategy values, one per module, giving the optimal FSDP sharding strategy for each module in the input (abstracted as a serialized list of execution information). A sketch of applying such a plan follows.
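As a hedged illustration of consuming that output, the sketch below wraps each module with FSDP using its tuned strategy. It assumes `torch.distributed` is already initialized and that `plan` came from a tuner such as the one sketched above; `apply_plan` and the commented usage are illustrative names, not OSDP APIs.

```python
# Illustrative sketch: apply a per-module sharding plan with FSDP.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

def apply_plan(blocks: nn.ModuleList, plan: List[ShardingStrategy]) -> nn.ModuleList:
    """Wrap each block with FSDP using the ShardingStrategy chosen for it."""
    return nn.ModuleList(
        FSDP(block, sharding_strategy=strategy)
        for block, strategy in zip(blocks, plan)
    )

# Usage (names are illustrative; requires an initialized process group):
# blocks = nn.ModuleList(model.layers)
# plan = [ShardingStrategy.FULL_SHARD, ShardingStrategy.SHARD_GRAD_OP, ...]
# model.layers = apply_plan(blocks, plan)
```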