Build a full-scale, high-performance LLM from scratch in Jax! We’ll cover training and inference, roofline analysis, compilation, sharding, profiling, and more. You’ll leave the class comfortable in Jax and confident in your ability to design high-performance computing systems that run close to their physical limits.
Link to the Discord: https://discord.gg/2AWcVatVAw
- Build a Jax LLM Implementation From Scratch
- Analyze Single Chip Rooflines And Compilation
- Analyze Distributed Computing via Sharding
- Optimize LLM Training – what happens under the hood, rooflines, sharding
- Optimize LLM Inference – what happens under the hood, rooflines, sharding
- Deep Dive into Flash Attention, vLLM, continuous batching, etc.
- Some deep dives along the way:
  - Attention: Flash Attention, vLLM, continuous batching
  - ML: Quantization, Checkpointing, Data Loading, Numerics
  - Practical Tips: Debugging, Overlapping Jax Kernels
  - Larger scale: Goodput
  - Fancy stuff: Ahead of Time Compilation
  - Going deeper: shard_map, Pallas
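To give a flavor of the roofline analysis listed above, here is a minimal sketch in plain Python (no Jax required). It estimates whether a single matmul is compute-bound or memory-bound by comparing its arithmetic intensity with a chip's FLOPs-to-bandwidth ratio. The function name `matmul_roofline` and the hardware numbers are hypothetical placeholders chosen for illustration, not figures for any real accelerator or material from the class.

```python
def matmul_roofline(m, k, n, bytes_per_elem, peak_flops, mem_bandwidth):
    """Rough roofline estimate for an (m x k) @ (k x n) matmul.

    Assumes each operand is read once and the output is written once,
    which ignores caching and tiling effects.
    """
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    intensity = flops / bytes_moved           # FLOPs per byte of memory traffic
    compute_time = flops / peak_flops         # seconds if limited by compute
    memory_time = bytes_moved / mem_bandwidth # seconds if limited by memory
    bound = "compute" if compute_time >= memory_time else "memory"
    return {
        "intensity": intensity,
        "compute_time": compute_time,
        "memory_time": memory_time,
        "bound": bound,
    }


# Hypothetical chip: 2e14 FLOP/s peak and 1e12 bytes/s memory bandwidth,
# so the break-even intensity is 200 FLOPs per byte.
PEAK_FLOPS = 2e14
MEM_BANDWIDTH = 1e12

# Large square matmul in a 2-byte dtype (training-like shape).
big = matmul_roofline(4096, 4096, 4096, 2, PEAK_FLOPS, MEM_BANDWIDTH)

# Single-row matmul (batch-1 decoding-like shape).
small = matmul_roofline(1, 4096, 4096, 2, PEAK_FLOPS, MEM_BANDWIDTH)

print(big["bound"], small["bound"])
```

Under these assumed numbers the large matmul lands on the compute side of the roofline and the single-row matmul on the memory side, which is the usual intuition for why training and small-batch inference are optimized differently.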
Sessions run at 3:30PM Pacific on Wednesdays, starting 2/21/2024. See the table below for links.
Session | Time | Link to join (or recording) | Slides | Take-Home Exercises | Summary |
---|---|---|---|---|---|
1 | 3:30PM US Pacific, 2/21/2024 | Youtube recording | slides | link | end-to-end Jax LLM |
2 | 3:30PM US Pacific, 2/28/2024 | Youtube recording | slides | link | single chip perf and rooflines |
3 | 3:30PM US Pacific, 3/13/2024 | Youtube recording | slides | link | multi chip perf and rooflines, 1 |
4 | 3:30PM US Pacific, 3/20/2024 | Youtube recording | slides | link | multi chip perf and rooflines, 1 |
5 | 3:30PM US Pacific, 3/27/2024 | Youtube recording | slides | link | attention |
6 | 3:30PM US Pacific, 4/10/2024 | Youtube recording | slides | link | optimized training |
postponed | 3:30PM US Pacific, 4/17/2024 | postponed | |||
7 | 3:30PM US Pacific, 4/24/2024 | Youtube recording | slides | link | training e2e, inference analysis |
8 | 3:30PM US Pacific, 5/01/2024 | Google Meet link |
About me: I’m Rafi Witten, a tech lead on Cloud TPU/GPU Multipod. We develop MaxText and aim to push the frontier on Perf/TCO. In 2023, we executed the “Largest ML Job” ever demonstrated in public and pioneered “Accurate Quantized Training”, a technique for training with 8-bit integers.
Contact me via Discord: https://discord.gg/2AWcVatVAw