GoCask
Go implementation of Bitcask - A Log-Structured Hash Table for Fast Key / Value Data, as defined in this paper and with help from this repo.
A learning venture into database development.
Special thanks go to the amazing Ben Johnson for pointing me in the right direction and being as helpful as he was.
Features (as defined by the paper+)
- Low latency per item read or written
- High throughput, especially when writing an incoming stream of random items
- Ability to handle datasets much larger than RAM w/o degradation
- Crash friendliness, both in terms of fast recovery and not losing data
- Ease of backup and restore
- A relatively simple, understandable (and thus supportable) code structure and data format
- Predictable behavior under heavy access load or large volume
- Data files are rotated based on the user defined data file size (2GB default)
- A license that allowed for easy use
- CRC check to detect data corruption
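To illustrate the CRC check, here is a minimal sketch of how a Bitcask-style record can carry a checksum that is verified on read. It follows the record layout described in the Bitcask paper (crc, timestamp, key size, value size, key, value). The field widths, the byte order, and the names `encodeRecord`, `decodeRecord` and `ErrCorrupt` are assumptions made for this example and are not necessarily what GoCask uses on disk.

```go
package main

import (
	"encoding/binary"
	"errors"
	"hash/crc32"
)

// headerSize is crc(4) + timestamp(4) + key size(4) + value size(4).
// These widths are an assumption for this sketch.
const headerSize = 16

// ErrCorrupt is returned when a record fails validation.
var ErrCorrupt = errors.New("record checksum mismatch")

// encodeRecord lays out crc | timestamp | ksz | vsz | key | value.
// The CRC covers every byte after the crc field itself.
func encodeRecord(ts uint32, key, value []byte) []byte {
	buf := make([]byte, headerSize+len(key)+len(value))
	binary.LittleEndian.PutUint32(buf[4:8], ts)
	binary.LittleEndian.PutUint32(buf[8:12], uint32(len(key)))
	binary.LittleEndian.PutUint32(buf[12:16], uint32(len(value)))
	copy(buf[headerSize:], key)
	copy(buf[headerSize+len(key):], value)
	binary.LittleEndian.PutUint32(buf[0:4], crc32.ChecksumIEEE(buf[4:]))
	return buf
}

// decodeRecord verifies the checksum and returns the key and value.
func decodeRecord(buf []byte) (key, value []byte, err error) {
	if len(buf) < headerSize {
		return nil, nil, ErrCorrupt
	}
	want := binary.LittleEndian.Uint32(buf[0:4])
	if crc32.ChecksumIEEE(buf[4:]) != want {
		return nil, nil, ErrCorrupt
	}
	ksz := int(binary.LittleEndian.Uint32(buf[8:12]))
	vsz := int(binary.LittleEndian.Uint32(buf[12:16]))
	if headerSize+ksz+vsz != len(buf) {
		return nil, nil, ErrCorrupt
	}
	return buf[headerSize : headerSize+ksz], buf[headerSize+ksz:], nil
}
```

Flipping any byte of an encoded record makes `decodeRecord` return `ErrCorrupt` instead of silently handing back bad data.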
Important notes
- GoCask does not implement any buffer cache in-memory. Instead, it depends on the filesystem’s cache. Adjusting the caching characteristics of your filesystem can impact performance.
- GoCask stores all keys in memory, which means that your system needs enough RAM to hold your entire keyspace.
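The memory requirement follows from the Bitcask design: an in-memory index (the "keydir" in the paper) maps every key to the location of its latest value in an append-only log. The toy sketch below shows the idea with a byte slice standing in for the data file. The `store` and `entry` types and their methods are hypothetical names for this example, not GoCask's API.

```go
package main

import "errors"

// entry points at a value inside the append-only log.
type entry struct {
	offset int // where the value starts in the log
	size   int // value length in bytes
}

// store is a toy single-file Bitcask: an append-only log plus an in-memory keydir.
type store struct {
	log    []byte           // stands in for the active data file
	keydir map[string]entry // every live key is held in RAM
}

var errNotFound = errors.New("key not found")

func newStore() *store { return &store{keydir: make(map[string]entry)} }

// put appends the value and points the keydir at the newest copy.
// Older copies stay in the log until they are compacted away.
func (s *store) put(key string, value []byte) {
	off := len(s.log)
	s.log = append(s.log, value...)
	s.keydir[key] = entry{offset: off, size: len(value)}
}

// get needs one map lookup and one contiguous read.
func (s *store) get(key string) ([]byte, error) {
	e, ok := s.keydir[key]
	if !ok {
		return nil, errNotFound
	}
	return s.log[e.offset : e.offset+e.size], nil
}
```

Values can exceed RAM because only `keydir` must fit in memory. The log lives on disk in a real implementation and is served through the filesystem cache.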
How to Use/Run
There are two ways to use gocask:
Using gocask as a library (embedded db) in your own app
GoCask can be used similarly to bolt or badger as an embedded db.
go get github.com/aneshas/gocask/cmd/gocask
and use the API. See the docs.
Running as a standalone process
If you have go installed:
go install github.com/aneshas/gocask/cmd/gocask@latest
go install github.com/aneshas/gocask/cmd/gccli@latest
Run db server
Then run gocask, which will run the db engine itself, open the default db and start a grpc (twirp) server on localhost:8888. (Run gocask -help to see config options and the defaults.)
Interact with server via cli
While the server is running, you can interact with it via the gccli binary:
gccli keys - list stored keys
gccli put somekey someval - stores the key value pair
gccli get somekey - retrieves the value stored under the key
gccli del somekey - deletes the value stored under the key
gccli is just meant as a simple probing tool. To build your own client, you can generate one from the included .proto definition (or use the pre-generated Go client).
If you don't have Go installed, you can go to releases, download the latest release, and go through the same process as above.
Still to come
The primary motivation for this repo was learning more about how db engines work, so although it can already be used, it's far from production ready. That said, I do plan to maintain and extend it in the future.
Some things that are on my mind:
- Support for multiple processes and write locking
- Current key deletion is a soft delete (implement merging and hint files)
- Fold over keys
- Double down on tests (fuzz?)
- Add benchmarks
- Make it distributed
- An eventstore spin off (use gocask instead of sqlite)
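As a rough idea of what the planned merging step involves: with soft deletes, a delete only appends a tombstone record, so stale values and tombstones pile up in the log until a merge rewrites it. The sketch below keeps only the newest live record per key. The `record` type and `merge` function are hypothetical and simplified (strings in memory rather than data files), not a description of how GoCask will implement it.

```go
package main

// record is one log entry; tombstone marks a soft delete.
type record struct {
	key       string
	value     string
	tombstone bool
}

// merge scans the log (oldest first) and keeps only the newest
// record for each key, dropping keys whose newest record is a tombstone.
func merge(log []record) []record {
	latest := make(map[string]int) // key -> index of its newest record
	for i, r := range log {
		latest[r.key] = i
	}
	out := make([]record, 0, len(latest))
	for i, r := range log {
		if latest[r.key] == i && !r.tombstone {
			out = append(out, r)
		}
	}
	return out
}
```

For a log of put a=1, put b=2, put a=3, del b, the merged log contains a single record, a=3.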