⚠️ This document is a work in progress
Jarasandha is a small Java library to help build an archive of records. It has very few moving parts and is built around immutability, efficient compression, buffer management and zero-copy transfer. It delegates advanced functions to external services through interfaces.
It is composed of these parts:
- A file format that has blocks, records and an index
- Blocks can be compressed (optional). Blocks contain records
- The file is immutable, meaning once the file with all its records is written it cannot be modified
- The file is a "write once and read many times" format
- Checksums and compression on the internal index and blocks
- Records are written one at a time to the file using a "writer". The writer returns a logical position within the file that has to be stored in an external system
- Internally, the records are flushed to the file one block at a time
- The "writer" and related classes provide ways to manage collections of files and hooks to archive to external stores
- Records can be retrieved using a "reader" by providing their logical positions
- It also supports iterating over the records or blocks of records in the file
- The "reader" and related classes provide efficient, selective loading and caching of blocks and files for repeated reads
- It also has hooks to read from external stores
- It is meant to be embedded inside your application that serves records from a remote archive and a local file system
- Both the reader and writer components make heavy use of Netty's ByteBuf to keep heap usage, and memory usage in general, low and within a controllable budget
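To make the parts above concrete, here is a minimal sketch of the general idea: records are appended one at a time, sealed into compressed and checksummed blocks, and read back by logical position. This is not Jarasandha's actual file format or API. The class `BlockSketch`, the length-prefixed record layout, the `{blockNumber, recordIndex}` position, and the use of `java.util.zip` and in-memory blocks instead of a file and Netty's ByteBuf are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical sketch only; this is not Jarasandha's real file format or API. */
final class BlockSketch {
    /** A sealed block: compressed bytes, uncompressed length, CRC32 of the compressed bytes. */
    static final class Block {
        final byte[] compressed;
        final int rawLength;
        final long checksum;
        Block(byte[] compressed, int rawLength, long checksum) {
            this.compressed = compressed;
            this.rawLength = rawLength;
            this.checksum = checksum;
        }
    }

    private final List<Block> blocks = new ArrayList<>();
    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    private final int recordsPerBlock;
    private int recordsInCurrent;

    BlockSketch(int recordsPerBlock) {
        this.recordsPerBlock = recordsPerBlock;
    }

    /** Appends one record and returns its logical position as {blockNumber, recordIndex}. */
    int[] write(byte[] record) {
        int[] position = {blocks.size(), recordsInCurrent};
        current.write(ByteBuffer.allocate(4).putInt(record.length).array(), 0, 4); // length prefix
        current.write(record, 0, record.length);
        if (++recordsInCurrent == recordsPerBlock) {
            flush();
        }
        return position;
    }

    /** Compresses the pending records into one block and seals it. Call once more when done writing. */
    void flush() {
        if (recordsInCurrent == 0) return;
        byte[] raw = current.toByteArray();
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        byte[] compressed = out.toByteArray();
        blocks.add(new Block(compressed, raw.length, crc(compressed)));
        current.reset();
        recordsInCurrent = 0;
    }

    /** Verifies and decompresses only the block holding the record, then walks to the record. */
    byte[] read(int[] position) {
        Block block = blocks.get(position[0]);
        if (crc(block.compressed) != block.checksum) {
            throw new IllegalStateException("Checksum mismatch in block " + position[0]);
        }
        byte[] raw = new byte[block.rawLength];
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(block.compressed);
            int filled = 0;
            while (filled < raw.length && !inflater.finished()) {
                filled += inflater.inflate(raw, filled, raw.length - filled);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupt block " + position[0], e);
        } finally {
            inflater.end();
        }
        ByteBuffer buffer = ByteBuffer.wrap(raw);
        for (int i = 0; i < position[1]; i++) {
            int length = buffer.getInt(); // skip the records that come before the one requested
            buffer.position(buffer.position() + length);
        }
        byte[] record = new byte[buffer.getInt()];
        buffer.get(record);
        return record;
    }

    private static long crc(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }
}
```

In this sketch a caller would keep the `int[]` returned by `write` in an external system, call `flush()` once after the last record, and later pass the stored position to `read`. Only the one block that holds the record is verified and decompressed.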
Jarasandha does not aim to compete with systems or libraries like Apache ORC or Apache Parquet or PalDB or embedded Key-Value stores or Ambry or Apache HBase.
- It does not provide key-value access; rather, it provides simple position-based access to records
- It is not a database of any sort
- It has no opinion on what you store as a record, but it can compress a block that has multiple records before storing it to the file
- It does not provide querying or searching based on keys or values; records are looked up only by their logical positions
Store records in Jarasandha and move the files out to object stores like Amazon S3 or Minio when they are not in use.
Jarasandha can be the underlying layer that efficiently stores and retrieves records and blocks based on logical positions. A second index layer using Lucene or RocksDB could provide a more advanced mapping from keys, labels or queries to Jarasandha's logical positions.
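The layering can be sketched as follows. This is only an illustration under stated assumptions: `KeyIndexSketch` is a hypothetical class, a `TreeMap` stands in for a real index such as Lucene or RocksDB, and an append-only `List` stands in for the archive, with the list offset playing the role of the logical position.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical sketch: a key index layered over a position-only archive. */
final class KeyIndexSketch {
    // Stand-in for the archive: append-only, addressed by position only.
    private final List<byte[]> archive = new ArrayList<>();
    // Stand-in for the secondary index (Lucene, RocksDB, ...): key -> logical position.
    private final Map<String, Integer> index = new TreeMap<>();

    /** Appends the value to the archive and remembers where it landed. */
    void put(String key, String value) {
        archive.add(value.getBytes(StandardCharsets.UTF_8));
        int position = archive.size() - 1;
        // The archive is never modified; a repeated key simply points at the newer record.
        index.put(key, position);
    }

    /** Resolves the key to a position first, then reads the record at that position. */
    Optional<String> get(String key) {
        Integer position = index.get(key);
        if (position == null) {
            return Optional.empty();
        }
        return Optional.of(new String(archive.get(position), StandardCharsets.UTF_8));
    }

    int archivedRecords() {
        return archive.size();
    }

    public static void main(String[] args) {
        KeyIndexSketch store = new KeyIndexSketch();
        store.put("user-1", "first version");
        store.put("user-1", "second version");
        System.out.println(store.get("user-1").orElse("missing"));
    }
}
```

Because the archive side is immutable, an update in this sketch costs one append plus one small index write, and the older record stays in place until the file as a whole is retired.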
Assuming that the keys and metadata to service queries are much smaller than the actual records, they can be stored onsite, on fast and expensive hardware. The actual record can then be retrieved from the Jarasandha files and blocks that are cached locally or downloaded on demand from remote object stores.
See Hot-cold store for details.
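As a rough illustration of the hot-cold idea, the sketch below keeps a bounded number of files locally and falls back to a remote fetch on a miss. It does not use Jarasandha's reader classes or hooks. `HotColdSketch`, the `Function` that plays the remote object store, and the least-recently-used eviction policy are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: a bounded local cache of files in front of a remote store. */
final class HotColdSketch {
    private final Function<String, byte[]> remoteStore; // assumed: downloads a file by its id
    private final Map<String, byte[]> localCache;
    private int remoteFetches;

    HotColdSketch(final int maxLocalFiles, Function<String, byte[]> remoteStore) {
        this.remoteStore = remoteStore;
        // An access-ordered LinkedHashMap evicts the least recently used file first.
        this.localCache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxLocalFiles;
            }
        };
    }

    /** Serves the file from the local cache, downloading it only on a miss. */
    byte[] file(String fileId) {
        byte[] bytes = localCache.get(fileId);
        if (bytes == null) {
            bytes = remoteStore.apply(fileId);
            remoteFetches++;
            localCache.put(fileId, bytes);
        }
        return bytes;
    }

    int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        HotColdSketch files = new HotColdSketch(2, id -> id.getBytes(StandardCharsets.UTF_8));
        files.file("file-a");
        files.file("file-a"); // second read is served locally
        System.out.println("remote fetches: " + files.remoteFetches());
    }
}
```

The small, frequently used index stays on fast storage, while the cache budget (`maxLocalFiles` here) bounds how much of the bulky archive is kept on site at any time.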
The name (Jarasandha) is a reference to an Indian mythological character named Jarasandha who was put back together from two halves. I found the name vaguely related to this Java library, which puts your records back together from blocks of compressed records in a file. Well, I did say - "vaguely related".
The Jarasandha library is licensed under the Apache License.
Read & Write
Efficiency
Example - based on Importer and FileReadersTest
Compression, blocks, memory efficiency of ByteBuf
Pre-reqs: Java 8, Maven
FileWriters
FileReaders
Files
FileId
File format
Index and block format
Logical record position, need for a secondary store
Compression and caching
Writer and reader efficiency - ByteBuf
CLI importer
Writing - NoOpFileWriteProgressListener to push files to S3
Reading - DefaultFileEventListener to build archiving and retrieval