Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Best practices for implementing DB splitting #157

Open
LucienXian opened this issue May 12, 2022 · 6 comments
Open

Best practices for implementing DB splitting #157

LucienXian opened this issue May 12, 2022 · 6 comments

Comments

@LucienXian
Copy link

Problem Description

I'm trying to use rocksdb-cloud, but I'm having some issues with how to implement DB instance splitting.

Suppose the current DB key range is [0, 100]. To split the DB into 2 small DBs according to certain rules, the 2 new DBs may be [0, 50] and [51, 100]. How should I do it in a better way?

In this process, it is necessary to split as quickly as possible to reduce the time of stop writing. All I can think of is to share the sst files when splitting, the other files are quickly independent. When the new DB is opened, the WAL of the old DB is replayed, and the replay calls a DeleteRange to delete it.

But here is a question, can we control sst files to be shared in two new DBs until both DBs no longer need to use shared sst files because of subsequent compaction

Or is there a better solution?

@anqurvanillapy
Copy link

Maybe you mean "sharding"? Searching for this keyword might help better. Here's also a project that performs sharding in the application level (https://github.com/uber/ringpop-go). Or maybe go all in the Raft and multi-raft instead of Kafka in this project.

I believe this topic goes out of the scope here, because normally people shard on top of different replica groups, and you could also read that in the Rockset white paper they don't talk so much about this.

@dhruba
Copy link

dhruba commented May 17, 2022

can we control sst files to be shared in two new DBs until both DBs no longer need to use shared sst files because of subsequent compaction

if you want to share sst files between two instances, then you can disable file deletion for both the instances, and then switch on the purger at https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/cloud/cloud_env_options.h#L282. We do not use this feature in production so I am not sure whether this code path is production ready or not.

@LucienXian
Copy link
Author

can we control sst files to be shared in two new DBs until both DBs no longer need to use shared sst files because of subsequent compaction

if you want to share sst files between two instances, then you can disable file deletion for both the instances, and then switch on the purger at https://github.com/rockset/rocksdb-cloud/blob/master/include/rocksdb/cloud/cloud_env_options.h#L282. We do not use this feature in production so I am not sure whether this code path is production ready or not.

Thx! I am now trying to implement SplitDB by means of checkpoint+deleteRange, and then regularly scan the manifest of the old DB to determine whether the file is expired. Also, I will find out if the method you mentioned is suitable.

@LucienXian
Copy link
Author

@dhruba Hi!Another question I have is that when we switch on the purger, we read all the sst files from the Current Manifest, and take out all the sst files under the old DB, so as to select the files to be deleted. Is there a risk that this batch of files to be deleted may be referenced by a request or snapshot.

@dhruba
Copy link

dhruba commented May 30, 2022

Yes, agreed that a query could be accessing a file F that is not part of the manifest. The reason being that the query started when F was part of the database but a simultaneous compaction. process has removed the file from the DB.
Typically, there is a max-timeout for your queries. If your system supports queries that could execute for an hour then the purger will delete F only iff its creation time is older than an hour. @LucienXian

@cscetbon
Copy link

@LucienXian did you implement it ? You could also have just created hard links to those SSTs. That's the easiest I can think of

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants