The following is some of the most useful research material I found while building toyDB. It is a subset of my reading list.
Andy Pavlo's CMU lectures are absolutely fantastic as an introduction to database internals:
- 🎥 CMU 15-445 Intro to Database Systems (A Pavlo 2019)
- 🎥 CMU 15-721 Advanced Database Systems (A Pavlo 2020)
Martin Kleppman has written an excellent overview of database technologies and concepts, while Alex Petrov goes more in depth on implementation of storage engines and distributed systems algorithms:
- 📖 Designing Data-Intensive Applications (M Kleppmann 2017)
- 📖 Database Internals (A Petrov 2019)
The Raft consensus algorithm is well described in the very readable original paper by Diego Ongaro, and in a talk given by his advisor John Ousterhout:
- 📄 In Search of an Understandable Consensus Algorithm (D Ongaro, J Ousterhout 2014)
- 🎥 Designing for Understandability: The Raft Consensus Algorithm (J Ousterhout 2016)
However, implementing Raft can be a bit tricky due to some subtle details, so Jon Gjengset's student guide was very helpful in drawing attention to common pitfalls:
- 🔗 Students' Guide to Raft (J Gjengset 2016)
Although using Go, not Rust, Thorsten Ball has written a very enjoyable hands-on introduction to parsers where he implements first an interpreter and then a compiler for the made-up Monkey programming language:
- 📖 Writing An Interpreter In Go (T Ball 2016)
- 📖 Writing A Compiler In Go (T Ball 2018)
The toyDB expression parser is inspired by a blog post by Eli Bendersky describing the precedence climbing algorithm, which is the algorithm I found the most elegant:
- 💬 Parsing Expressions by Precedence Climbing (E Bendersky 2012)
Jepsen (i.e. Kyle Kingsbury) has an excellent overview of consistency and isolation models, which is very helpful in making sense of the jungle of overlapping and ill-defined terms:
- 🔗 Consistency Models (Jepsen 2016)
For more background on this, in particular on how snapshot isolation provided by the MVCC transaction engine used in toyDB does not fit into the traditional SQL isolation levels, the following classic papers were useful:
- 📄 A Critique of ANSI SQL Isolation Levels (H Berenson et al 1995)
- 📄 Generalized Isolation Level Definitions (A Adya, B Liskov, P ONeil 2000)
As for actually implementing MVCC, I found blog posts to be the most helpful:
- 💬 Implementing Your Own Transactions with MVCC (E Chance 2015)
- 💬 How Postgres Makes Transactions Atomic (B Leach 2017)