
Can we support versioning? #15

Open
archenroot opened this issue Dec 19, 2018 · 5 comments
Labels: question (Need more discussion)


@archenroot

I would like support for key-value versioning.

We could introduce something like a revision ID (an incrementing number per key, from 0 upward). I am also interested in tracking each key's validity interval (validFrom/createdOn to validTo/InvalidatedOn).

@ahasani

ahasani commented Dec 19, 2018

As with a compound key, you can append a version number to the key. Here is the catch: to get the latest version you need either a prefix scan, which HaloDB does not support, or a full iteration. HaloDB is not particularly fast at iterating over values; it excels at random lookups by key.
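A minimal sketch of the compound-key idea, assuming a store that only offers exact-key get/put (a `HashMap` stands in for HaloDB here; the class and method names are hypothetical). Each stored key is the UTF-8 user key followed by an 8-byte big-endian revision number. Because there is no prefix scan, `latestRevision` probes revisions 0, 1, 2, ... with point lookups until one is missing, which only works if revisions are contiguous and never deleted, and costs one lookup per revision.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Compound "key + revision" keys over a store that only supports exact-key get/put.
class VersionedKeys {
    // Stand-in for HaloDB: exact-key lookups only, no prefix or range scans.
    // Keys are wrapped in ByteBuffer because byte[] uses identity equals/hashCode.
    static final Map<ByteBuffer, byte[]> store = new HashMap<>();

    // UTF-8 user key followed by an 8-byte big-endian revision number.
    static byte[] versionedKey(String key, long revision) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(k.length + Long.BYTES).put(k).putLong(revision).array();
    }

    static void put(String key, long revision, byte[] value) {
        store.put(ByteBuffer.wrap(versionedKey(key, revision)), value);
    }

    static byte[] get(String key, long revision) {
        return store.get(ByteBuffer.wrap(versionedKey(key, revision)));
    }

    // No prefix scan available, so probe revisions 0, 1, 2, ... with point lookups
    // until one is missing. Assumes revisions are contiguous from 0 and never deleted;
    // the cost grows linearly with the number of revisions of the key.
    static long latestRevision(String key) {
        long revision = -1;
        while (get(key, revision + 1) != null) {
            revision++;
        }
        return revision; // -1 means the key has no revisions
    }
}
```

A common alternative is a separate "head" record per key holding the latest revision number, which turns the probe into a single extra lookup at the cost of one extra write per update.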

@archenroot
Author

@ahasani I will look into how to implement this capability on top of the existing API. Beyond the revision ID, there are these additional attributes:
validFrom - a key can be inserted with a validity start date that differs from the date it was written to the store
createdOn - the time the revision was created
validTo - the value is no longer valid after this point in time
InvalidatedOn - equal to the createdOn of the next revision, or to the current date if an explicit invalidate command is issued
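The four attributes above can be sketched as an in-memory revision history for a single key. This is a hypothetical model, not part of HaloDB: `null` means "open-ended" for `validTo` and "still current" for `invalidatedOn`, and validity is treated as the half-open interval [validFrom, validTo). Appending a revision sets the previous revision's `invalidatedOn` to the new `createdOn`; an explicit invalidate closes the current revision without writing a new one.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// In-memory revision history for one key (hypothetical model of the attributes above).
class RevisionHistory {
    static final class Revision {
        final long revisionId;   // 0, 1, 2, ... per key
        final byte[] value;
        final Instant validFrom; // may differ from createdOn
        final Instant createdOn; // when this revision was written
        final Instant validTo;   // null = valid indefinitely
        Instant invalidatedOn;   // null while this revision is current

        Revision(long revisionId, byte[] value, Instant validFrom, Instant createdOn, Instant validTo) {
            this.revisionId = revisionId;
            this.value = value;
            this.validFrom = validFrom;
            this.createdOn = createdOn;
            this.validTo = validTo;
        }

        // True if t falls in [validFrom, validTo).
        boolean isValidAt(Instant t) {
            return !t.isBefore(validFrom) && (validTo == null || t.isBefore(validTo));
        }
    }

    private final List<Revision> revisions = new ArrayList<>();

    // Writes a new revision; the previous revision's invalidatedOn becomes this createdOn.
    Revision append(byte[] value, Instant validFrom, Instant validTo, Instant now) {
        Revision previous = latest();
        if (previous != null && previous.invalidatedOn == null) {
            previous.invalidatedOn = now;
        }
        Revision r = new Revision(revisions.size(), value, validFrom, now, validTo);
        revisions.add(r);
        return r;
    }

    // Explicit "invalidate" command: closes the current revision without a new one.
    void invalidate(Instant now) {
        Revision current = latest();
        if (current != null && current.invalidatedOn == null) {
            current.invalidatedOn = now;
        }
    }

    Revision latest() {
        return revisions.isEmpty() ? null : revisions.get(revisions.size() - 1);
    }
}
```

Persisting each `Revision` under a compound "key + revision" key would combine this with the versioning approach suggested in the previous comment.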

Ladislav

@archenroot
Author

@ahasani, what about full-text search on the VALUE part? I know it's a bit against standard usage, but in general: show me all keys whose value includes a given term. I assume I would have to implement this myself, correct? It's fine if it is slow. Maybe we can use the Solr or Lucene engines for this.
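The core idea behind engines such as Lucene is an inverted index: a map from each token to the keys whose values contain it, updated alongside every write to the key-value store. The sketch below is a minimal, hypothetical illustration of that idea and is not Lucene's API; tokenization is just lower-casing and splitting on non-word characters, whereas a real analyzer does far more.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;

// Minimal inverted index (token -> keys) kept next to a key-value store.
class ValueIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, List<String>> tokensByKey = new HashMap<>();

    // Lower-cases and splits on non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Call on every put; replaces any previous entries for the key.
    void index(String key, byte[] value) {
        remove(key);
        List<String> tokens = tokenize(new String(value, StandardCharsets.UTF_8));
        tokensByKey.put(key, tokens);
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new HashSet<>()).add(key);
        }
    }

    // Call on every delete.
    void remove(String key) {
        List<String> old = tokensByKey.remove(key);
        if (old == null) return;
        for (String token : old) {
            Set<String> keys = postings.get(token);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) postings.remove(token);
            }
        }
    }

    // Keys whose value contains every word of the query (AND semantics), sorted.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> keys = postings.getOrDefault(token, Collections.<String>emptySet());
            if (result == null) result = new TreeSet<>(keys);
            else result.retainAll(keys);
        }
        return result == null ? Collections.<String>emptySet() : result;
    }
}
```

Unlike a full scan over every value, a query here touches only the posting sets of its tokens; the trade-off is the extra memory and write work needed to keep the index in step with the store.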

@ahasani

ahasani commented Dec 19, 2018

Hi @archenroot, regarding validFrom/createdOn - validTo/InvalidatedOn: I have asked @amannaly to add a timestamp to the record header so that we can iterate over it; this is tracked as issue #9. On full-text search, you are spot on: with Lucene it is not too hard to implement.
IMHO, you have to understand the very SPECIFIC use case HaloDB was built for.

HaloDB was written for a high-throughput, low latency distributed key-value database that powers multiple ad platforms at Yahoo, therefore all its design choices and optimizations were primarily for this use case.

First, Oath (formerly Yahoo) runs big boxes with lots of memory, so they designed HaloDB to keep all metadata in memory, which already limits its use on "commodity" hardware or small cloud instances.
Second, the design choice of buffered, page-cache writes for throughput means some data loss is possible; that is tolerable for them because their deployment sits behind Kafka as the first persistence and messaging layer.
Third, they chose a single-threaded writer.
And fourth, they chose not to support sorting or range scans.

#2 is addressed by a durability option; #3 is easy to overcome with multiple instances or a queue/Disruptor (yes, HaloDB is that fast); for #4 we can add another layer with a sorted index. But #1 is hard, as it is the "main" feature of HaloDB.
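The queue approach to point #3 can be sketched as many producer threads feeding one dedicated writer thread, so the store only ever sees writes from a single thread. This is a hypothetical illustration using a plain `LinkedBlockingQueue` rather than the LMAX Disruptor, and a `Map` stands in for the HaloDB instance. Each `put` returns a `CompletableFuture` that completes once the writer thread has applied the write.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Many producer threads, one writer thread that applies writes in arrival order.
class SingleWriter implements AutoCloseable {
    private static final class Write {
        final String key;
        final String value;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Write(String key, String value) { this.key = key; this.value = value; }
    }

    private final BlockingQueue<Write> queue = new LinkedBlockingQueue<>();
    private final Map<String, String> store; // stand-in for a HaloDB instance
    private final Thread writer;
    private volatile boolean running = true;

    SingleWriter(Map<String, String> store) {
        this.store = store;
        this.writer = new Thread(this::drain, "single-writer");
        this.writer.start();
    }

    // Safe to call from any thread; completes once the writer thread has applied the write.
    CompletableFuture<Void> put(String key, String value) {
        Write w = new Write(key, value);
        queue.add(w);
        return w.done;
    }

    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                Write w = queue.poll(100, TimeUnit.MILLISECONDS);
                if (w == null) continue;
                store.put(w.key, w.value); // only this thread ever writes
                w.done.complete(null);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Stops after draining queued writes; puts issued after close() may not be applied.
    @Override
    public void close() {
        running = false;
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If readers access the store concurrently, the stand-in map should be thread-safe (for example a `ConcurrentHashMap`).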

I have high respect for HaloDB; it is a very positive contribution from Oath and @amannaly to us and the community, and it is much appreciated.

Also, please have a look at RocksDB Java or H2 MVStore. But I am against using these kinds of stores for any big value (anything larger than 1 MB) because of write/space amplification. Even using those two and/or Lucene as an index complementing HaloDB could be a better choice, similar to WiscKey / Badger.
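The "sorted index complementing HaloDB" idea can be sketched as a point-lookup primary store paired with a sorted set of its keys. Here a `HashMap` stands in for HaloDB and a `TreeSet` stands in for a sorted store such as RocksDB or H2 MVStore; the class and method names are hypothetical. Because keys sharing a prefix are contiguous in sorted order, a prefix scan becomes a bounded walk, which also answers the earlier "latest version" question when revisions are encoded as zero-padded suffixes.

```java
import java.util.*;

// Point-lookup primary store plus a sorted key index for range and prefix scans.
class SortedIndexOverKv {
    private final Map<String, byte[]> primary = new HashMap<>();     // stand-in for HaloDB
    private final NavigableSet<String> sortedKeys = new TreeSet<>(); // stand-in for RocksDB / H2 MVStore

    void put(String key, byte[] value) {
        primary.put(key, value);
        sortedKeys.add(key);
    }

    void delete(String key) {
        primary.remove(key);
        sortedKeys.remove(key);
    }

    byte[] get(String key) {
        return primary.get(key);
    }

    // Keys in [from, to), in sorted order; values are then fetched from the primary store.
    List<String> keysInRange(String from, String to) {
        return new ArrayList<>(sortedKeys.subSet(from, true, to, false));
    }

    // Keys sharing a prefix are contiguous in sorted order: walk from the prefix until one doesn't match.
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (String k : sortedKeys.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) break;
            out.add(k);
        }
        return out;
    }
}
```

With keys like `user:42#0000000001`, the last element of `keysWithPrefix("user:42#")` is the latest revision. Only keys go into the sorted index, and large values stay in the primary store, which is the key/value separation idea behind WiscKey and Badger.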

Sorry for the TL;DR :-) I love HaloDB; it's such an inspiration.

Cheers

@archenroot
Author

archenroot commented Dec 19, 2018

@ahasani, thanks for the comprehensive answer! In my case I am looking at storing gigabytes, but if I understand correctly, HaloDB keeps only the indexes in memory, correct? We don't have commodity hardware here, so 512 GB to 1 TB of RAM is not an issue.
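If, as discussed above, only the key index and its per-entry metadata live in memory, RAM use scales with the number of keys and the key size, not with the total size of the values. The back-of-the-envelope sketch below makes that explicit; `perEntryOverheadBytes` is an assumed input, not a published HaloDB figure, so check HaloDB's documentation for the real per-key overhead.

```java
// Rough in-memory index size when only keys plus fixed per-entry metadata are held in RAM.
class IndexMemoryEstimate {
    // perEntryOverheadBytes is an ASSUMED input, not a documented HaloDB constant.
    static long indexBytes(long numKeys, int avgKeyBytes, int perEntryOverheadBytes) {
        return Math.multiplyExact(numKeys, (long) avgKeyBytes + perEntryOverheadBytes);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2 billion 16-byte keys with an assumed 32 bytes of overhead each.
        long bytes = indexBytes(2_000_000_000L, 16, 32);
        System.out.printf("~%.1f GiB of index%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```

Value size does not appear in the formula at all, which is why storing many gigabytes of values is mainly a disk question as long as the key count stays within the available RAM.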

Regarding the header timestamp: I noticed that issue, thanks for the reference.

Regarding #2, #3 and #4: good reading, thanks.

Regarding #1: I understand, it's a design decision.

Regarding Kafka usage, I see. So exposing HaloDB to others, with secured persistence, would require something like this:
[Image: diagram 1]

Right? So each READ instance has its own copy.

NOTE: It could also be implemented with the READ and WRITE services combined into one service with embedded Kafka.
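The architecture described above (writes go to Kafka; each READ instance keeps its own copy) boils down to log replay: every replica consumes the same append-only log from its own offset and applies the events to a local store. The sketch below models that with in-memory stand-ins and does not use the Kafka client; with real Kafka, each READ instance would be a consumer tracking its own offset, and its local `Map` would be a HaloDB instance. A `null` value is treated as a delete, mirroring tombstones in compacted Kafka topics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write path appends to a shared log; each READ instance replays it into its own local copy.
class LogReplicas {
    // One write event; a null value marks a delete (like a tombstone in a compacted topic).
    static final class Event {
        final String key;
        final String value;
        Event(String key, String value) { this.key = key; this.value = value; }
    }

    // Append-only log shared by all replicas (stand-in for one Kafka topic partition).
    static final class Log {
        private final List<Event> events = new ArrayList<>();

        synchronized long append(String key, String value) {
            events.add(new Event(key, value));
            return events.size() - 1; // offset of the new event
        }

        synchronized List<Event> readFrom(long offset) {
            return new ArrayList<>(events.subList((int) offset, events.size()));
        }
    }

    // One READ instance: a local store (stand-in for a per-instance HaloDB) plus its consumer offset.
    static final class ReadReplica {
        private final Map<String, String> local = new HashMap<>();
        private long offset = 0;

        // Applies every event not yet seen, in log order.
        void catchUp(Log log) {
            for (Event e : log.readFrom(offset)) {
                if (e.value == null) local.remove(e.key);
                else local.put(e.key, e.value);
                offset++;
            }
        }

        String get(String key) { return local.get(key); }
        long offset() { return offset; }
    }
}
```

Because each replica owns its offset, a new READ instance can be added at any time and rebuild its copy by replaying from offset 0, provided the log still retains (or compacts rather than deletes) the history.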

@wangtao724 added the question (Need more discussion) label on Dec 26, 2019