
Future Plan of Transformer Kernel #600

Closed
hxbloom opened this issue Dec 14, 2020 · 2 comments
hxbloom commented Dec 14, 2020

I'm using the Megatron-LM example to train GPT-2 on my cluster. I've also tested the DeepSpeed Transformer Kernel in the bing_bert example; it is really helpful, much faster than the original PyTorch version and with lower memory consumption.

I would like to know whether you have any future plans to extend the transformer kernel, for example, to support more models like GPT-2, or to integrate model parallelism into the kernel for large-model training.

Thanks!
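For reference, this is roughly how the fused layer gets constructed in the bing_bert example. This is only a minimal sketch: the DeepSpeedTransformerConfig argument names and the DeepSpeedTransformerLayer constructor have changed between DeepSpeed releases, and the values shown are illustrative, not the ones from my runs.

```python
# Sketch of enabling the DeepSpeed Transformer Kernel, assuming the
# DeepSpeedTransformerConfig / DeepSpeedTransformerLayer API used by the
# bing_bert example. In newer DeepSpeed releases these classes live under
# deepspeed.ops.transformer; argument names may differ across versions.
from deepspeed import DeepSpeedTransformerConfig, DeepSpeedTransformerLayer

# Illustrative BERT-large-like settings (hypothetical values).
config = DeepSpeedTransformerConfig(
    batch_size=8,
    hidden_size=1024,
    intermediate_size=4096,
    heads=16,
    attn_dropout_ratio=0.1,
    hidden_dropout_ratio=0.1,
    num_hidden_layers=24,
    initializer_range=0.02,
    local_rank=-1,
    seed=1234,
    fp16=True,
    pre_layer_norm=True,
)

# Each fused layer replaces one standard PyTorch BertLayer; the bing_bert
# model stacks num_hidden_layers of them. Older releases also took a
# layer id as the first positional argument.
layers = [DeepSpeedTransformerLayer(config)
          for _ in range(config.num_hidden_layers)]
```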

RezaYazdaniAminabadi (Contributor) commented

Hi Dong,

Thanks for pointing out the use cases for the transformer kernel. There is a plan to support other types of transformer networks. For GPT-2, we have modified the kernels to support it and obtained about 30% and 40% speedups in the forward and backward passes, respectively. We are going to release them very soon. Please stay tuned!

Thanks,
Reza

hxbloom (Author) commented Dec 18, 2020

Looking forward to your next release!

Dong
