Has anyone tried utilizing FSDP (Fully Sharded Data Parallel) for Vim? #100

chokevin8 · 2024-07-01T02:55:55Z

Has anyone tried implementing FSDP for Vim? It would help train larger Vim models on larger datasets, since FSDP shards the model's parameters across nodes/GPUs, while DDP keeps a full copy of the model on every GPU. I am aware that FSDP is specifically optimized for Transformers, so I was wondering if anyone has an implementation for Vim or knows of one. Thanks!
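To make the memory argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It compares per-GPU training state under DDP (fully replicated) and under FSDP-style full sharding. The parameter count, GPU count, and the fp32-plus-Adam byte costs are illustrative assumptions, not measurements of any Vim model, and the estimate ignores activations and the temporary unsharded parameters FSDP gathers during forward and backward.

```python
def per_gpu_state_bytes(num_params, num_gpus, shard,
                        bytes_per_param=4, optimizer_bytes_per_param=8):
    """Rough per-GPU bytes for parameters, gradients, and optimizer state.

    Assumes fp32 parameters and gradients (4 bytes each) and an Adam-style
    optimizer holding two fp32 values per parameter (8 bytes). These are
    illustrative defaults only.
    """
    total = num_params * (2 * bytes_per_param + optimizer_bytes_per_param)
    # DDP replicates the full state on every GPU; full sharding splits it.
    return total // num_gpus if shard else total


num_params = 100_000_000  # hypothetical model size, not a real Vim config
num_gpus = 8              # hypothetical cluster size

ddp_bytes = per_gpu_state_bytes(num_params, num_gpus, shard=False)
fsdp_bytes = per_gpu_state_bytes(num_params, num_gpus, shard=True)

print(f"DDP  per-GPU state: {ddp_bytes / 2**30:.2f} GiB")
print(f"FSDP per-GPU state: {fsdp_bytes / 2**30:.2f} GiB")
```

Under these assumptions the sharded footprint shrinks roughly in proportion to the number of GPUs, which is why FSDP becomes attractive once a model's training state no longer fits on a single device.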
