A multi-threaded distributed system emulator. It has been developed as a platform for testing and analyzing a novel Fault-Tolerant, Distributed Termination Detection algorithm based on the algorithm by Shmuel Safra.
An implementation of the fault-tolerant algorithm has been applied to a fault-tolerant version of the Chandy-Misra routing algorithm as well as a fault-tolerant version of the Afek-Kutten-Yung self-stabilizing spanning tree algorithm. The implementation can be found here.
More:
[1] Georgios Karlos, Wan Fokkink, Per Fuchs - Fault-Tolerant Termination Detection with Safra's Algorithm
[2] Georgios Karlos - Emulation of a Fault-Tolerant Variant of Safra's Termination Detection Algoritm
[3] Edsger W. Dijkstra - Shmuel Safra's Version of Termination Detection
- Build
$ ant jar
- Run
$ java -jar TDS-0.1.jar [OPTION]...
Option | Description | Possible Parameters | Default |
---|---|---|---|
-h | Print help | - | off |
-v | Verbose output | - | off |
-ver | Algorithm version | 1 (OFSS), 2 (IFSS), 3 (FTS), 0 (All) | 0 |
-n | Number of nodes | [2, 1024] | 4 |
-c | Number of crashing nodes | -c : random -c low high: crash nodes in 'low' to 'high' interval |
0 |
-l | Activity level | 1, 2 | 1 |
-w | Max wait time for a run (ms) | >0 | 100000 |
-csv | Write performance metrics to a csv file | - | off |
-f | Clean (flush) the csv (if one exists with a name corresponding to this run's params) | - | off |
-dist | Choose a probability distribution by which random activity is determined | -(1) : Uniform -(2) : Gaussian |
|
-strategy | Choose a strategy by which activity is happening (random under chose distribution), when a node becomes active | -(1) : Compute-Send - Perform some computation and then send some messages (uniform 0-3, gaussian mean=1 sd=1) -(2) : N-Activities - Choose an N under given distribution. For each of the n activities randomly choose if computation of message |
1 |
-batype | Basic Algorithm type | -(1) : Centralized (Node 0 is initially active) -(2) : Decentralized-Even (Nodes with even id's are initially active) -(3) : Decentralized-Random (A random number of nodes, uniformly chosen, is initially active) |
2 |
-ci | Interval between 2 consecutive node crashes | -(1) : Uniform in [200, 3000] ms -(2) : Gaussian with mean 1000ms and sd 200ms |
1 |
-anl | Average network latency | Value in [20, 100] (Message delays are random with Gaussian distribution with mean -avl and standard deviation of 10 ms | 60 |
This is a typical output when running with option -v and default -ver
[1] There is currently a bug introduced after the migration. When all versions run at the same time, occasionally, one will hang with no activity taking place for that version.