yejingxin/HighPerfLLMs2024

High Performance LLMs 2024

Build a full-scale, high-performance LLM from scratch in Jax! We'll cover training and inference, roofline analysis, compilation, sharding, profiling and more. You'll leave the class comfortable in Jax and confident in your ability to design high-performance computing systems that approach their physical limits.
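As a rough illustration of what roofline analysis means, the sketch below estimates whether a single matrix multiply is compute-bound or memory-bound. It assumes bf16 operands (2 bytes per element), assumes each operand is read from HBM once and the result written once, and uses made-up chip numbers (`PEAK_FLOPS`, `HBM_BANDWIDTH`) rather than the specs of any real TPU or GPU. The helper names are hypothetical.

```python
# Hypothetical accelerator numbers for illustration only; not a real chip's specs.
PEAK_FLOPS = 200e12      # FLOP/s
HBM_BANDWIDTH = 1e12     # bytes/s


def matmul_cost(m, k, n, bytes_per_elem=2):
    """FLOPs and HBM bytes for an (m, k) @ (k, n) matmul, assuming each
    operand is read once and the output is written once."""
    flops = 2 * m * k * n                      # one multiply + one add per term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops, bytes_moved


def roofline_time(flops, bytes_moved, peak_flops=PEAK_FLOPS, bandwidth=HBM_BANDWIDTH):
    """Lower bound on runtime: the slower of compute time and memory time."""
    return max(flops / peak_flops, bytes_moved / bandwidth)


def is_compute_bound(m, k, n):
    flops, bytes_moved = matmul_cost(m, k, n)
    # Compare arithmetic intensity (FLOPs per byte) with the chip's ridge point.
    return flops / bytes_moved > PEAK_FLOPS / HBM_BANDWIDTH


if __name__ == "__main__":
    for m in (1, 4096):   # batch-1 (decode-like) vs. large batch (training-like)
        flops, bytes_moved = matmul_cost(m, 4096, 4096)
        print(m, flops / bytes_moved, roofline_time(flops, bytes_moved),
              is_compute_bound(m, 4096, 4096))
```

Under these assumptions the batch-1 matmul performs roughly one FLOP per byte moved, far below the hypothetical ridge point of 200 FLOPs/byte, so its runtime is set by memory bandwidth; the 4096-row matmul sits well above the ridge point and is limited by compute.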

Link to the Discord: https://discord.gg/2AWcVatVAw

Topics Covered

  • Build a high-performance Jax LLM implementation for training
  • Build a high-performance Jax LLM implementation for serving
  • Analyze single-chip rooflines and compilation
  • Analyze distributed computing via sharding
  • Optimize LLM training: what happens under the hood, rooflines, sharding
  • Optimize LLM inference: what happens under the hood, rooflines, sharding
  • Deep dive into attention, especially fused attention schedules, running softmax, and flash attention
  • Pallas (optimize one level deeper!)
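The running-softmax idea behind fused and flash attention can be sketched for a single query: keys and values are processed one block at a time, and a running max, a running normalizer, and a running weighted sum of values are rescaled whenever the max grows. This is a minimal NumPy sketch, assuming one query vector, no causal mask, and no 1/sqrt(d) score scaling; the function names are hypothetical. It avoids in-place updates, so the same structure carries over to `jax.numpy`.

```python
import numpy as np


def running_softmax_attention(q, k, v, block_size):
    """Attention output for one query, processing keys/values block by block.

    q: (d,), k: (T, d), v: (T, d_v). Returns softmax(k @ q) @ v with shape (d_v,).
    """
    m = -np.inf                    # running max of the scores seen so far
    l = 0.0                        # running sum of exp(score - m)
    acc = np.zeros(v.shape[-1])    # running sum of exp(score - m) * v
    for start in range(0, k.shape[0], block_size):
        s = k[start:start + block_size] @ q        # scores for this block
        m_new = np.maximum(m, s.max())
        scale = np.exp(m - m_new)                  # rescales old stats to the new max
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ v[start:start + block_size]
        m = m_new
    return acc / l


def reference_attention(q, k, v):
    """Naive version: materialize all scores, then a numerically stable softmax."""
    s = k @ q
    p = np.exp(s - s.max())
    return (p / p.sum()) @ v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=8)
    k = rng.normal(size=(37, 8))   # 37 keys, so the last block is partial
    v = rng.normal(size=(37, 4))
    print(np.allclose(running_softmax_attention(q, k, v, block_size=16),
                      reference_attention(q, k, v)))   # True
```

Only `m`, `l`, and `acc` persist between blocks, so the full row of attention scores never has to be held at once; that is the property a fused attention kernel exploits to keep its working set in fast on-chip memory.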

Approximate Timing

3:30PM Pacific on Wednesdays (not every week), starting 2/21/2024. See the table below for links.

Session Timing, Slides, Videos and Take-Home Exercises

| Session | Time | Link to join (or recording) | Slides | Take-Home Exercises | Summary |
|---|---|---|---|---|---|
| 1 | 3:30PM US Pacific, 2/21/2024 | YouTube recording | slides | link | end-to-end Jax LLM |
| 2 | 3:30PM US Pacific, 2/28/2024 | YouTube recording | slides | link | single-chip perf and rooflines |
| 3 | 3:30PM US Pacific, 3/13/2024 | YouTube recording | slides | link | multi-chip perf and rooflines, 1 |
| 4 | 3:30PM US Pacific, 3/20/2024 | YouTube recording | slides | link | multi-chip perf and rooflines, 1 |
| 5 | 3:30PM US Pacific, 3/27/2024 | YouTube recording | slides | link | attention |
| 6 | 3:30PM US Pacific, 4/10/2024 | YouTube recording | slides | link | optimized training |
| 7 | 3:30PM US Pacific, 4/24/2024 | YouTube recording | slides | link | training e2e, inference analysis |
| 8 | 3:30PM US Pacific, 5/8/2024 | YouTube recording | slides | link | training xprof, MFU, naive inference |
| 9 | 3:30PM US Pacific, 5/22/2024 | YouTube recording | slides | link | efficient inference, numerics |
| 10 | 3:30PM US Pacific, 5/29/2024 | YouTube recording | slides | link | Pallas with Sharad Vikram! |

(Session 10 was the last session! Thank you to everyone who joined us!)

About me: I'm Rafi Witten, a tech lead on Cloud TPU/GPU Multipod. We develop MaxText and aim to push the frontier on perf/TCO (performance per total cost of ownership). In 2023, we executed the "Largest ML Job" ever demonstrated in public and pioneered "Accurate Quantized Training", a technique for training with 8-bit integers.

Contact me via Discord https://discord.gg/2AWcVatVAw
