Pull requests: microsoft/DeepSpeed
Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module)
#6694
opened Oct 31, 2024 by inkcherry
Use one param coordinator for both train/inference scenarios
#6662
opened Oct 23, 2024 by tohtana
A faster and more memory-efficient implementation of zero_to_fp32
#6658
opened Oct 23, 2024 by xu-song
Support the parallel conversion from ZeRO checkpoints to FP32/FP16/BF16 param weight
#6655
opened Oct 23, 2024 by xylian86
5 tasks done
add zero3 coalesced parameters fetch to zero optimization.
#6649
opened Oct 21, 2024 by inkcherry
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64
#6622
opened Oct 11, 2024 by jagadish-amd
Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models
#6553
opened Sep 18, 2024 by gyou2021
Change compile for pipeline module torch.compile
#6478
opened Sep 2, 2024 by NirSonnenschein
Unpin tests that previously used a pinned version of transformers
#6387
opened Aug 20, 2024 by loadams
Add DataStates-LLM: Asynchronous Checkpointing Engine Support
#5763
opened Jul 10, 2024 by mauryaavinash95
Draft