Skip to content

adrukh/rax

 
 

Repository files navigation

Rax, an ANSI C radix tree implementation

Rax is a radix tree implementation initially written to be used in a specific place of Redis in order to solve a performance problem, but immediately converted into a stand alone project to make it reusable for Redis itself, outside the initial intended application, and for other projects as well.

The primary goal was to find a suitable balance between performances and memory usage, while providing a fully featured implementation of radix trees that can cope with many different requirements.

During the development of this library, while getting more and more excited about how practical and applicable radix trees are, I was very surprised to see how hard it is to write a robust implementation, especially of a fully featured radix tree with a flexible iterator. A lot of things can go wrong in node splitting, merging, and various edge cases. For this reason a major goal of the project is to provide a stable and battle tested implementation for people to use and in order to share bug fixes. The project relies a lot on fuzz testing techniques in order to explore not just all the lines of code the project is composed of, but a large amount of possible states.

Rax is an open source project, released under the BSD two clause license.

Major features:

  • Memory conscious:
    • Packed nodes representation.
    • Able to avoid storing a NULL pointer inside the node if the key is set to NULL (there is an isnull bit in the node header).
    • Lack of parent node reference. A stack is used instead when needed.
  • Fast lookups:
    • Edges are stored as arrays of bytes directly in the parent node, no need to access non useful children while trying to find a match. This translates into less cache misses compared to other implementations.
    • Cache line friendly scanning of the correct child by storing edges as two separated arrays: an array of edge chars and one of edge pointers.
  • Complete implementation:
    • Deletion with nodes re-compression as needed.
    • Iterators (including a way to use iterators while the tree is modified).
    • Random walk iteration.
    • Ability to report and resist out of memory: if malloc() returns NULL the API can report an out of memory error and always leave the tree in a consistent state.
  • Readable and fixable implementation:
    • All complex parts are commented with algorithms details.
    • Debugging messages can be enabled to understand what the implementation is doing when calling a given function.
    • Ability to print the radix tree nodes representation as ASCII art.
  • Portable implementation:
    • Never does unaligned accesses to memory.
    • Written in ANSI C99, no extensions used.
  • Extensive code and possible states test coverage using fuzz testing.
    • Testing relies a lot on fuzzing in order to explore non trivial states.
    • Implementation of the dictionary and iterator compared with behavior-equivalent implementations of simple hash tables and sorted arrays, generating random data and checking if the two implementations results match.
    • Out of memory condition tests. The implementation is fuzzed with a special allocator returning NULL at random. The resulting radix tree is tested for consistency. Redis, the primary target of this implementation, does not use this feature, but the ability to handle OOM may make this implementation useful where the ability to survive OOMs is needed.
    • Soon to be part of Redis: the implementation will be stressed significantly in the real world.

The layout of a node is as follows. In the example, a node which represents a key (so has a data pointer associated), has three children x, y, z. Every space represents a byte in the diagram.

+----+---