
Investigate DeepSpeed Inference #845

Open
Quentin-Anthony opened this issue Mar 21, 2023 · 7 comments
@Quentin-Anthony
Member

DeepSpeed wins most inference benchmarks I see. We should test their claims on NeoX models. EleutherAI spends a significant amount of compute running inference, so any improvement in inference performance would be high-impact. Specifically, I would like to see:

  1. Take baseline inference latency numbers on small/medium/large (160M, 6.9B, 20B) NeoX models with gpt-neox.
  2. A summary of how DeepSpeed's inference engine can benefit us (https://arxiv.org/abs/2207.00032, https://www.deepspeed.ai/tutorials/inference-tutorial/, https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/inference-tutorial.md).
  3. Take new numbers on the above models with DeepSpeed inference, compare them with the baseline, and write up a short README that includes the numbers and run instructions for DS inference (see the timing sketch after this list).
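
For reference, here is a minimal, untested sketch of timing DeepSpeed's kernel-injection path, following the linked inference tutorial. The checkpoint name, prompt, and generation settings are placeholders; the real benchmark should load the actual NeoX models:

```python
import time

import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-160m"  # placeholder stand-in for the 160M NeoX model

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)

# Swap supported modules for DeepSpeed's fused inference kernels.
engine = deepspeed.init_inference(
    model,
    mp_size=1,  # tensor-parallel degree; 1 = single GPU
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("EleutherAI is", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.time()
output = engine.module.generate(**inputs, max_new_tokens=128)
torch.cuda.synchronize()
print(f"latency: {time.time() - start:.3f}s")
print(tokenizer.decode(output[0]))
```

Running the same loop with and without the `deepspeed.init_inference` wrapper on identical prompts and generation lengths should give the baseline-vs-DS comparison described above.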
Quentin-Anthony added the feature request and good first issue labels on Mar 21, 2023
@cr458

cr458 commented Mar 26, 2023

I'd be more than happy to take this on. Are there any resources available for these tests, or is it up to me to find them?

@StellaAthena
Member

> I'd be more than happy to take this on. Are there any resources available for these tests, or is it up to me to find them?

If by “resources” you mean computing resources, yes, we can easily make GPUs available for testing this.

@cr458

cr458 commented Mar 27, 2023

Hi Stella, thanks for your reply. Yup, that's exactly what I meant; sorry about the poor wording. Great, in that case I'd love to take this on. Let me know how you'd like to arrange access to the GPUs and I'll get going!

@satpalsr
Contributor

+1

@StellaAthena
Member

@cr458 @satpalsr What's the current status of this?

@IshanMi

IshanMi commented Dec 24, 2023

Hey @Quentin-Anthony - I would love the chance to work on this!

yang added a commit to yang/gpt-neox that referenced this issue Jan 25, 2024
@yang
Contributor

yang commented Jan 25, 2024

Some initial numbers are promising. With the current configs/125M.yml and text_generation.yml, I consistently see duration_seconds drop from ~2.4s to ~1.4s on a single-GPU node (A10G). I'll share more numbers once I can get compute.
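
For anyone reproducing this, here is a hedged sketch of driving the same baseline run from a script. The deepy.py/generate.py invocation and config names follow the comment above, but exact paths and flags may differ by checkout, and the wall time below includes launcher and model-load overhead, so it only upper-bounds the duration_seconds figure quoted above:

```python
import subprocess
import time

# Assumed gpt-neox launcher invocation; verify against your checkout.
cmd = [
    "python", "./deepy.py", "generate.py",
    "-d", "configs", "125M.yml", "text_generation.yml",
]
start = time.time()
subprocess.run(cmd, check=True)
print(f"end-to-end wall time: {time.time() - start:.1f}s")
```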
