A minimal implementation of reverse-mode automatic differentiation (a.k.a. autograd / backpropagation) in pure Python.
Inspired by Andrej Karpathy's micrograd, but with more comments and less cleverness. Thanks for the wonderful reference implementation and tests!
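Create a `Scalar`.

```python
# Import path assumed from the repository layout (minigrad/minigrad.py)
from minigrad.minigrad import Scalar

a = Scalar(1.5)
```

Do some calculations.

```python
b = Scalar(-4.0)
c = a**3 / 5
d = c + (b**2).relu()
```

Compute the gradients.

```python
d.backward()
```

Plot the computational graph.

```python
# Import path assumed from the repository layout (minigrad/visualize.py); requires Graphviz
from minigrad.visualize import draw_graph

draw_graph(d)
```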
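One way to sanity-check the example without installing PyTorch or JAX is central finite differences. The sketch below is plain Python and independent of MiniGrad; `relu`, `f`, and `central_diff` are hypothetical helpers that redo the same computation on floats. Analytically, ∂d/∂a = 3a²/5 = 1.35 and ∂d/∂b = 2b = -8 (the ReLU is active because b² > 0).

```python
def relu(x: float) -> float:
    # Hypothetical float version of Scalar.relu()
    return x if x > 0 else 0.0


def f(a: float, b: float) -> float:
    # Same expression as the example: d = a**3 / 5 + relu(b**2)
    return a**3 / 5 + relu(b**2)


def central_diff(fn, x: float, h: float = 1e-6) -> float:
    # Approximates fn'(x) with the symmetric difference quotient
    return (fn(x + h) - fn(x - h)) / (2 * h)


a0, b0 = 1.5, -4.0
da = central_diff(lambda a: f(a, b0), a0)  # approx. dd/da
db = central_diff(lambda b: f(a0, b), b0)  # approx. dd/db
print(f"d = {f(a0, b0):.3f}, dd/da ~ {da:.4f}, dd/db ~ {db:.4f}")
```

If `d.backward()` behaves as described, `a.grad` and `b.grad` in the example should agree with `da` and `db` to several decimal places. This is the same kind of comparison the test notebook makes against PyTorch and JAX, just with a cruder reference.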
- `demo.ipynb`: Demo notebook of MiniGrad's functionality.
- `tests.ipynb`: Test notebook to verify gradients against PyTorch and JAX. Install both to run the tests.
- `minigrad/minigrad.py`: The entire autograd logic in one (~100 loc) numeric class. See the section below for details.
- `minigrad/visualize.py`: This just draws nice-looking computational graphs. Install Graphviz to run it.
- `requirements.txt`: MiniGrad requires no external modules to run. This file just sets up my dev environment.
MiniGrad is implemented in one small (~100 loc) Python class, using no external modules.
The entirety of the auto-differentiation logic lives in the `Scalar` class in `minigrad.py`.
A `Scalar` wraps a float/int and overrides its arithmetic magic methods in order to:
- Stitch together a define-by-run computational graph when doing arithmetic operations on a `Scalar`
- Hard-code the derivative functions of arithmetic operations
- Keep track of `∂self/∂parent` between adjacent nodes
- Compute `∂output/∂self` with the chain rule on demand (when `.backward()` is called)
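The four responsibilities listed above fit in a short class. What follows is a minimal sketch, not MiniGrad's source: it supports only the operations used in the example (`+`, `*`, `**` with a constant exponent, `/`, and `relu`), and the attribute names `value`, `_parents`, and `_local_grads` are hypothetical. Each operation records its parents together with the local derivative ∂self/∂parent, and `backward()` walks the graph in reverse topological order, accumulating ∂output/∂parent += ∂output/∂self · ∂self/∂parent.

```python
class Scalar:
    """Sketch of a reverse-mode autodiff scalar (hypothetical, not MiniGrad's code)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = float(value)
        self.grad = 0.0                  # d(output)/d(self), set by backward()
        self._parents = parents          # nodes this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    @staticmethod
    def _wrap(x):
        # Lets plain numbers appear in expressions, e.g. a**3 / 5
        return x if isinstance(x, Scalar) else Scalar(x)

    def __add__(self, other):
        other = Scalar._wrap(other)
        return Scalar(self.value + other.value, (self, other), (1.0, 1.0))

    __radd__ = __add__

    def __mul__(self, other):
        other = Scalar._wrap(other)
        return Scalar(self.value * other.value, (self, other), (other.value, self.value))

    __rmul__ = __mul__

    def __pow__(self, exponent):
        # exponent is assumed to be a plain int/float, not a Scalar
        return Scalar(self.value**exponent, (self,), (exponent * self.value ** (exponent - 1),))

    def __truediv__(self, other):
        # Division expressed as multiplication by other**-1
        return self * Scalar._wrap(other) ** -1

    def relu(self):
        return Scalar(max(self.value, 0.0), (self,), (1.0 if self.value > 0 else 0.0,))

    def backward(self):
        # Depth-first post-order: every node is listed after all of its parents
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        for node in order:
            node.grad = 0.0
        self.grad = 1.0
        # Reversed order visits every consumer of a node before the node itself,
        # so node.grad is complete before it is pushed to the node's parents
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


a = Scalar(1.5)
b = Scalar(-4.0)
c = a**3 / 5
d = c + (b**2).relu()
d.backward()
print(d.value, a.grad, b.grad)  # approximately 16.675, 1.35, -8.0
```

The `+=` in `backward()` matters when a node is used more than once: for `y = x*x + x`, the three paths back to `x` are summed, giving `dy/dx = 2x + 1`.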
This is called reverse-mode automatic differentiation. It's great when you have few outputs and many inputs, since a single backward pass computes the derivatives of one output with respect to every input. This is also how TensorFlow and PyTorch normally compute gradients.
(Forward-mode automatic differentiation also exists, and has the opposite advantage: a single forward pass computes the derivatives of every output with respect to one input.)
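For contrast, here is a sketch of forward mode using dual numbers, a common way to implement it. This is not part of MiniGrad (it is listed below as not planned); `Dual`, `const`, and `f` are hypothetical names. Each value carries one tangent, the derivative with respect to whichever input was seeded with 1.0, so getting both ∂d/∂a and ∂d/∂b for the example takes two forward passes, whereas one reverse pass yields both.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    value: float
    tangent: float  # derivative of value w.r.t. the seeded input

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.value + other.value, self.tangent + other.tangent)

    def __mul__(self, other: "Dual") -> "Dual":
        # Product rule applied alongside the value computation
        return Dual(self.value * other.value,
                    self.tangent * other.value + self.value * other.tangent)

    def relu(self) -> "Dual":
        return Dual(max(self.value, 0.0), self.tangent if self.value > 0 else 0.0)


def const(x: float) -> Dual:
    # A constant (or an unseeded input) has zero tangent
    return Dual(x, 0.0)


def f(a: Dual, b: Dual) -> Dual:
    # d = a**3 / 5 + relu(b**2), written with + and * only
    return a * a * a * const(1 / 5) + (b * b).relu()


# One forward pass per input: seed that input's tangent with 1.0
dd_da = f(Dual(1.5, 1.0), const(-4.0)).tangent
dd_db = f(const(1.5), Dual(-4.0, 1.0)).tangent
print(dd_da, dd_db)
```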
This project is just for fun, so the following are not planned:
- Vectorization
- Higher-order derivatives (i.e. making `Scalar.grad` a `Scalar` itself, so gradients could be differentiated again)
- Forward-mode automatic differentiation
- Neural network library on top of MiniGrad