Slides 07/2022, covering:
- 2017 Google Transformer
- 2018/2019 GLUE/SuperGLUE
- 2018 Google BERT
- 2018 OpenAI GPT-1
- 2019 OpenAI GPT-2
- 2019 Google T5
- 2020 OpenAI GPT-3
- 2020 HuggingFace decoding algorithms (see the sampling sketch after this list)
- 2021 OpenAI Codex
- 2021 OpenAI Math paper
- 2021 DeepMind Gopher
- 2021 Google & others BIG-bench
- 2022 OpenAI ML Parallelism guide
- 2022 OpenAI InstructGPT
- 2022 DeepMind AlphaCode
- 2022 Google LaMDA
- 2022 Google PaLM
- 2022 DeepMind Chinchilla
- 2022 Google Minerva (Pathways)
- 2022 Salesforce CodeRL
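
The decoding-algorithms entry refers to HuggingFace's 2020 "How to generate text" blog post, which walks through greedy search, beam search, temperature, top-k, and top-p (nucleus) sampling. Below is a minimal NumPy sketch of the last two combined; the function name and default values are illustrative, not HuggingFace's API:

```python
import numpy as np

def top_k_top_p_sample(logits, k=50, p=0.9, temperature=1.0, rng=None):
    """Sample one token id: top-k filter, then top-p (nucleus) filter.

    Illustrative sketch, not HuggingFace's implementation.
    """
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Top-k: keep only the k highest logits, mask out the rest.
    k = min(k, logits.size)
    kth = np.sort(logits)[-k]
    logits = np.where(logits < kth, -np.inf, logits)
    # Softmax over the surviving logits.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p: smallest set of tokens whose cumulative mass reaches p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
```

In generation, these filters are applied to the model's next-token logits at every step, trading off diversity against degenerate repetition.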
Slides 06/2022, covering position embeddings:
- Learned position embedding, as in BERT
- Sinusoidal position embedding, as in the vanilla Transformer
- Relative position embedding
- Rotary position embedding (RoPE); see the sketch after this list
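
For concreteness, a minimal NumPy sketch of two of these schemes: the fixed sinusoidal table from the vanilla Transformer, and RoPE in its interleaved-pair form. Function names are illustrative, not from the slides:

```python
import numpy as np

def sinusoidal_embedding(seq_len, d_model):
    """Fixed sinusoidal position table from 'Attention Is All You Need'.

    Assumes d_model is even; the table is added to the token embeddings.
    """
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # even dims, (1, d_model/2)
    angles = pos / 10000.0 ** (i / d_model)      # (seq_len, d_model/2)
    emb = np.zeros((seq_len, d_model))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

def apply_rope(x):
    """Rotary position embedding (RoPE), interleaved-pair variant.

    Rotates each consecutive feature pair of x (seq_len, d_model) by an
    angle proportional to the token's position. Applied to queries and
    keys before the attention dot product, so the resulting scores
    depend only on relative positions.
    """
    seq_len, d_model = x.shape
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1)
    theta = 10000.0 ** (-np.arange(0, d_model, 2) / d_model)
    cos, sin = np.cos(pos * theta), np.sin(pos * theta)      # (seq_len, d/2)
    out = np.empty_like(x, dtype=np.float64)
    out[:, 0::2] = x[:, 0::2] * cos - x[:, 1::2] * sin
    out[:, 1::2] = x[:, 0::2] * sin + x[:, 1::2] * cos
    return out
```

By contrast, the learned scheme (BERT) simply makes the (seq_len, d_model) table a trainable parameter, and relative schemes move the position signal out of the embeddings and into the attention logits.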
V2 'Fun' version slides 04/2022, covering:
| year/id | Title |
| --- | --- |
| 2014 | Dropout: A Simple Way to Prevent Neural Networks from Overfitting |
| 1412.6980 | Adam: A Method for Stochastic Optimization |
| 1502.01852 | Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification |
| 1503.02531 | Distilling the Knowledge in a Neural Network |
| 1502.03167 | Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift |
| 2016 | Deep Neural Networks for YouTube Recommendations |

| year/id | Title |
| --- | --- |
| 1301.3781 | Efficient Estimation of Word Representations in Vector Space |
| 1409.3215 | Sequence to Sequence Learning with Neural Networks |
| 1706.03762 | Attention Is All You Need |
| 1810.04805 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| 1804.07461 | GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding |
| 1910.10683 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| 2005.14165 | Language Models are Few-Shot Learners |