Cluster auto-scaling best practices #40
@tudordumitriu I think this is outside the scope of the Helm chart, though I can see how it's related. I would be extremely wary of using an autoscaler to automate database cluster scaling - the chance of data loss is high, and the timescales and nuances involved in moving shards around likely require manual supervision anyway. In the general case this is a difficult problem, which is probably why there are no community tools to address it (regardless of Kubernetes).
One tool you could look at is couchdb-admin from Cabify. I haven't used it personally, but it looks to automate at least some of the management tasks.
Unfortunately, the process for moving shards described in the docs is tricky in Kubernetes because not many storage backends support it. Adding a node would require something like:
Removing a node from the cluster would be a similar process: updating the shard map to ensure enough shard replicas exist on the remaining nodes before removing it.
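For concreteness, the node-addition sequence from the CouchDB shard-management docs can be sketched as the ordered admin API calls below. This is only a sketch: the node name is a hypothetical StatefulSet pod DNS name, and copying the shard files plus editing the shard map in the `_dbs` database happen out of band between the calls.

```python
def add_node_plan(new_node):
    """Ordered CouchDB admin API calls (method, path, body) to join a node
    and prepare it to receive shards, per the cluster docs. The shard file
    copy and the shard-map edit in the _dbs database happen between steps."""
    return [
        # join the node to the cluster (CouchDB 3.x _nodes system database)
        ("PUT", f"/_node/_local/_nodes/{new_node}", "{}"),
        # keep the empty node out of quorum while shards are moved onto it
        ("PUT", f"/_node/{new_node}/_config/couchdb/maintenance_mode", '"true"'),
        # ... copy shard files and update the shard map in _dbs here ...
        # once internal replication has caught up, bring it into service
        ("PUT", f"/_node/{new_node}/_config/couchdb/maintenance_mode", '"false"'),
    ]

# hypothetical pod DNS name inside the chart's StatefulSet
plan = add_node_plan("couchdb@couchdb-3.couchdb.default.svc.cluster.local")
for method, path, body in plan:
    print(method, path, body)
```

Automating exactly this sequence (safely, with verification between steps) is the hard part that an autoscaler alone doesn't cover.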
Thank you @willholley! Truly appreciated. Sorry for not being 100% within scope, but since the final goal is to deploy it within a cluster it made some sense to address it here (and honestly I didn't know where else to go). When the time comes (loose terms warning):
I still have some questions (some maybe out of scope as well):
Thanks again!
You need to add the node to the cluster (step 6) before putting it in maintenance mode and moving shards around. Regarding the questions:
Multiples of 3 is usually simplest because shard distribution is then equal, but it's not required.
Yes - no problems with the k8s service load balancer (it's just iptables/IPVS). If you expose CouchDB to the outside using an Ingress, the performance will depend on which Ingress implementation you use, etc.
Yes - there's not really any benefit in having more than one CouchDB node per worker. The only exception I can think of is that you could "oversize" the cluster initially and then spread out CouchDB nodes amongst machines as you grow without needing to go through the cluster expansion steps described above, assuming you use remote storage.
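To illustrate the "multiples of 3" point above, here is a toy model (my own simplification - real CouchDB placement also accounts for placement zones) of round-robin placement of shard replicas with the default n=3 copies:

```python
from collections import Counter

def shard_replicas_per_node(q, n, nodes):
    """Toy round-robin placement of q shard ranges x n replicas over nodes;
    returns how many shard replica files each node would host."""
    counts = Counter()
    for i in range(q * n):
        counts[nodes[i % len(nodes)]] += 1
    return counts

# q=2 shards, n=3 replicas: distribution is equal on 3 nodes, unequal on 4
print(shard_replicas_per_node(2, 3, ["a", "b", "c"]))       # 2 replicas each
print(shard_replicas_per_node(2, 3, ["a", "b", "c", "d"]))  # a,b get 2; c,d get 1
```

With node counts that are multiples of 3, the q*n replicas divide evenly, which is why those sizes are usually simplest - though, as noted above, it is not required.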
Thanks again!
Hi, mostly thinking out loud here, but would the following be a valid scaling strategy?
This way, there is no need for resharding. However (and please note I am a k8s beginner), I don't think this "migration" of pods to other k8s nodes when their resource allocations change would be automatic, so it would probably require killing the pods to force them to be recreated elsewhere.
EDIT: I just realized that changing the resource requests of pods according to actual usage and migrating them to other k8s nodes is the Vertical Pod Autoscaler's job, so it seems scaling could be achieved by implementing point 1 above and properly configuring a Vertical Pod Autoscaler (and a Cluster Autoscaler).
Hi guys,
I think this chart is very useful in a k8s cluster, but (and maybe it's just me missing something) I think the community/this chart is missing some support / best practices for auto-scaling (increasing the number of CouchDB nodes inside a k8s cluster).
It's quite clear that all of us want to deploy within a k8s cluster, in a cloud, because we can scale out (and in) based on various metrics.
We have been working on an Azure AKS setup with SSDs as the data storage backing CouchDB and our business services.
Now, what stress testing revealed is that CouchDB, in our case at least, uses the CPU intensively, and we want to be prepared for such bursts in an automated way.
The obvious solution is to use the Cluster Autoscaler + Horizontal Pod Autoscaler, so that we can add (and remove) a node and a pod (or pods) on demand.
But the problem (and this is where I might be wrong) is that the CouchDB cluster needs to be updated manually.
More than that, if we have a large amount of data, how do we properly set up the new node so it is "warm" when it is added to the cluster? That would mean physically replicating the data drive, if that's even an option, so that the cluster itself won't have to sync internally, which from our experiments seems to use quite some resources.
I went through the CouchDB docs, the couch helm chart files, and various other documentation sources, and I wasn't able to find any automated way of doing this.
We are setting up the cluster via calls to the HTTP /_cluster_setup endpoint, which is fine done manually, but if autoscaling happens automatically, the new node would basically be of no use until it is added to the cluster, manually.
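For reference, the manual sequence is roughly the documented /_cluster_setup flow. A hedged sketch of the request bodies involved follows - the host name and credentials are placeholders, and each body is POSTed to the appropriate node's /_cluster_setup endpoint:

```python
def cluster_setup_payloads(new_host, admin, password):
    """Request bodies POSTed to /_cluster_setup to join new_host to a
    cluster - the part that currently has to happen by hand after
    autoscaling. Host and credentials here are placeholders."""
    return [
        # run against the new node to enable clustering on it
        {"action": "enable_cluster", "bind_address": "0.0.0.0",
         "username": admin, "password": password, "node_count": "3"},
        # run against the coordinating node to add the new node
        {"action": "add_node", "host": new_host, "port": 5984,
         "username": admin, "password": password},
        # run once, after all nodes are joined
        {"action": "finish_cluster"},
    ]
```

It is this POST sequence that would need to be triggered automatically (e.g. by an operator or a pod lifecycle hook) for a freshly scheduled pod to become a useful cluster member.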
So, if possible, please share with us any best practices or even mechanisms that could help automate this job.
Thanks