-
Notifications
You must be signed in to change notification settings - Fork 80
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
17483db
commit a738d62
Showing
2 changed files
with
144 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Distributed Failures\n", | ||
"\n", | ||
"* Computers are designed to fail all at once and catastrophically.\n", | ||
"* Distributed services are fundamentally different. They need to fail only partially, if possible, in order to allow as many services as possible to continue to operate.\n", | ||
"* This unlocks a lot of additional failure modes, **partial failures**, which are both hard to understand and hard to reason about.\n", | ||
"\n", | ||
"\n", | ||
"* Long discussion of network flakiness...\n", | ||
"\n", | ||
"\n", | ||
"* Modern computers have both a time-of-day clock and a monotonic clock. The former is a user-facing clock that can move backwards or stop in time. The latter is a constantly forward-propagating clock, but only the difference between values is meaningful; its absolute magnitude is meaningless.\n", | ||
"* Both clocks are ruled over by NTP synchronization, when the computer is online.\n", | ||
"* These clocks are quartz-based, and subject to approximately 35 ms average errors.\n", | ||
"* Furthermore, processes may pause for arbitrarily long periods of time due to things like garbage collection and context switches.\n", | ||
"* Things that depend on timeouts, like creating a global log in a distriobuted system (databases) or expiring leader-given leases (GFS), must account for this possibility.\n", | ||
"* You can omit these delays if you try hard enough. Systems like flight controls and airbags operate are **hard real-time systems** that do so. But these are really, really hard to build, and highly limiting.\n", | ||
"\n", | ||
"\n", | ||
"* Algorithms can be proven to work correctly in a given an unreliable **system model**. But you have to choose your model carefully.\n", | ||
"* A point that Jespen makes: these algorithms are hard to implement in a durable way. The theory is far ahead of the practice, but a lot of architects try to homegrow their conflict resolution, which is problematic.\n", | ||
"\n", | ||
"\n", | ||
"* One simplifying assumption we make when designing fault tolerant systems is that nodes and services never purposefully lie.\n", | ||
"* If a node sends an knowingly untrue message to another node, this is known as a **Byzantine fault**. \n", | ||
"* Byzantine faults must be addressed in the design of public distributed systems, like the Internet or blockchain. But they can be ignored in the design of a service system, as obviously you will not purposefully lie to yourself.\n", | ||
"\n", | ||
"\n", | ||
"* In designing a distributed system, one that is protected from Byzantine faults, you often need to design around causal dependency. You need to understand what a node knew when it made a decision, as that knowledge informs how you will heal any resulting divergence.\n", | ||
"* **Vector clocks** are an algorithm for this (e.g. Dynamo).\n", | ||
"* An alterantive is preserving a **total ordering** of operations (e.g. MongoDB)." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |