High Performance LLMs 2024

Build a full-scale, high-performance LLM from scratch in Jax! We’ll cover training and inference, roofline analysis, compilation, sharding, profiling and more. You’ll leave the class comfortable in Jax and confident in your ability to design high-performance computing systems that come close to their physical limits.

Link to the Discord: https://discord.gg/2AWcVatVAw

Syllabus. We will:

Build a Jax LLM Implementation From Scratch
Analyze Single Chip Rooflines And Compilation
Analyze Distributed Computing via Sharding
Optimize LLM Training – what happens under the hood, rooflines, sharding
Optimize LLM Inference – what happens under the hood, rooflines, sharding
Deep Dive into Flash Attention, vLLM, continuous batching, etc.
Some deep dives along the way:
- Attention, Flash Attention, vLLM, continuous batching
- ML: Quantization, Checkpointing, Data Loading, Numerics
- Practical Tips: Debugging, Overlapping Jax Kernels
- Larger scale: Goodput
- Fancy stuff: Ahead-of-Time Compilation
- Going deeper: shard_map, Pallas.
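Roofline analysis comes down to comparing an operation's arithmetic intensity (FLOPs performed per byte moved from memory) with the chip's ridge point (peak FLOP/s divided by memory bandwidth). Below this ridge point an operation is memory-bound; above it, compute-bound. The following is a minimal pure-Python sketch for a single matmul. It assumes bf16 operands (2 bytes each), that the inputs and output each cross HBM exactly once, and hypothetical placeholder hardware numbers (`peak_flops`, `hbm_bandwidth`). The function name `matmul_roofline` is illustrative and not taken from the course materials.

```python
def matmul_roofline(batch, d_in, d_out, bytes_per_elem=2,
                    peak_flops=197e12, hbm_bandwidth=819e9):
    """Simple roofline estimate for X[batch, d_in] @ W[d_in, d_out].

    Assumptions (illustrative only): X and W are read from HBM once, the
    output is written once, and peak_flops (FLOP/s) / hbm_bandwidth (bytes/s)
    are placeholder values; substitute your accelerator's spec-sheet numbers.
    """
    # One multiply and one add per multiply-accumulate.
    flops = 2 * batch * d_in * d_out
    # Bytes for reading X, reading W, and writing the output.
    bytes_moved = bytes_per_elem * (batch * d_in + d_in * d_out + batch * d_out)

    intensity = flops / bytes_moved          # FLOPs per byte
    ridge = peak_flops / hbm_bandwidth       # intensity where both limits meet

    compute_time = flops / peak_flops        # seconds if purely compute-limited
    memory_time = bytes_moved / hbm_bandwidth  # seconds if purely bandwidth-limited
    bound = "compute" if compute_time >= memory_time else "memory"

    return {
        "arithmetic_intensity": intensity,
        "ridge_point": ridge,
        "lower_bound_seconds": max(compute_time, memory_time),
        "bound": bound,
    }


if __name__ == "__main__":
    # Compare a small, decode-like batch with a large, training-like batch.
    for batch in (8, 4096):
        r = matmul_roofline(batch, 8192, 8192)
        print(batch, round(r["arithmetic_intensity"], 1), r["bound"])
```

With these placeholder numbers, a matmul with `batch=8` and `d_in = d_out = 8192` has an arithmetic intensity of about 8 FLOPs/byte, far below the ridge point of roughly 240, so it is memory-bound. Raising the batch to 4096 lifts the intensity to 2048 and makes it compute-bound, which is one reason batch size matters so much for inference throughput.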

Approximate Timing

3:30PM Pacific on Wednesdays, starting 2/21/2024. See below for links.

Session Timing, Slides, Videos and Take-Home Exercises

Session	Time	Link to join (or recording)	Slides	Take-Home Exercises	Summary
1	3:30PM US Pacific, 2/21/2024	Youtube recording	slides	link	end-to-end Jax LLM
2	3:30PM US Pacific, 2/28/2024	Youtube recording	slides	link	single chip perf and rooflines
3	3:30PM US Pacific, 3/13/2024	Youtube recording	slides	link	multi chip perf and rooflines, 1
4	3:30PM US Pacific, 3/20/2024	Youtube recording	slides	link	multi chip perf and rooflines, 1
5	3:30PM US Pacific, 3/27/2024	Youtube recording	slides	link	attention
6	3:30PM US Pacific, 4/10/2024	Youtube recording	slides	link	optimized training
postponed	3:30PM US Pacific, 4/17/2024	postponed
7	3:30PM US Pacific, 4/24/2024	Youtube recording	slides	link	training e2e, inference analysis
8	3:30PM US Pacific, 5/01/2024	Google Meet link

About me: I’m Rafi Witten, a tech lead on Cloud TPU/GPU Multipod. We develop MaxText and aim to push the frontier on performance per total cost of ownership (Perf/TCO). In 2023, we executed the “Largest ML Job” ever demonstrated in public and pioneered “Accurate Quantized Training”, a technique for training with 8-bit integers.

Contact me via Discord: https://discord.gg/2AWcVatVAw
