Configure virtual server only on master but not on backups #2389

Open
bleve opened this issue Feb 27, 2024 · 6 comments

@bleve

bleve commented Feb 27, 2024

I run tens of small clusters where we currently handle IPVS with ldirectord, but keepalived would look like a much better solution for a two-node cluster if we could set up the IPVS load balancer (we use IPVS-DR) on the master node only.

I'd need a way to have the virtual_server configuration only on the master node, which, according to my reading of the documentation and my own testing, is not possible at the moment.

Currently my only alternative is to run ldirectord for the virtual server, since that can be run on the master node only.

The feature is relevant to every small configuration where only two servers are needed and no separate load balancer is wanted.

Keepalived v2.2.8 (04/04,2023), git commit v2.2.7-154-g292b299e+

My initial idea was to add the possibility of adding a virtual_server to a vrrp_sync_group to bind them together.

@pqarmitage
Collaborator

The way keepalived is expected to work in this scenario (and this is why keepalived handles both VRRP and IPVS) is that the virtual_servers are configured on the master and all the backup instances, but the IP addresses of the virtual_servers are configured as VIPs of the VRRP instances.

A virtual server will only receive any traffic if the VRRP instance which has the IP address of the virtual server as one of its VIPs is in master state. When the VRRP instance handling the IP address of the virtual server is in backup state, the virtual server address will not be configured on that system, and so the virtual server on that system will not receive any traffic for as long as the VRRP instance remains in backup state.

In other words, although the virtual servers remain configured on the backup instances, it is as though they are not configured, since they cannot receive any traffic.

Depending on which IPVS forwarding method you are using, you may need to configure VRRP instances on your internal interfaces (in order to handle the packets returning from the real servers), and in this case you would need to link the VRRP instances for the external interface and the internal interface into a vrrp_sync_group as you mention. This is only necessary if using NAT forwarding, since for TUN and DR the return packets do not need to be modified, and hence are not expected to be routed via the machine running IPVS (see https://www.linuxvirtualserver.org/VS-NAT.html, https://www.linuxvirtualserver.org/VS-IPTunneling.html and https://www.linuxvirtualserver.org/VS-DRouting.html for more details).
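
For illustration, a minimal sketch of that pattern (the interface name, VRRP instance name, addresses and ports below are placeholders, not taken from this issue):

    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 200
        advert_int 1
        virtual_ipaddress {
            192.168.10.50/24
        }
    }

    # The virtual_server uses the same address as the VIP above, so only the
    # node that currently holds the VIP (the master) receives traffic for it.
    virtual_server 192.168.10.50 443 {
        delay_loop 6
        lb_algo wlc
        lb_kind DR
        protocol TCP
        real_server 192.168.10.11 443 {
            TCP_CHECK {
                connect_timeout 3
            }
        }
        real_server 192.168.10.12 443 {
            TCP_CHECK {
                connect_timeout 3
            }
        }
    }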

@bleve
Author

bleve commented Mar 1, 2024

I know how the documentation states the design. We have many two-server clusters, each doing something dedicated, and here the load balancer is one of the backends. So both the master and the backup server are doing the actual service; there is no dedicated load-balancer cluster because that would be total overkill.

When IPVS is applied on both servers, packets directed from the master server hit the director on the backup server, and about half of the packets ping-pong between the servers and are never served to clients.

The solution for a two-node cluster is simple: when you only apply the IPVS director rules on the server in master state, both the master and the backup server can still run the service, for example an HTTPS service, and respond to clients.

This is not a new setup. It was originally described by Neil Horman in his IPVS documentation nearly 20 years ago, and we have been running it very successfully ever since.

I found a way to do this with keepalived but it is quite a kludge.

That is done by running the normal keepalived service to provide only VRRP for the VIP addresses, plus a second templated service ([email protected]) which uses a separate ipvs.conf and runs keepalived with the --check parameter, so that the second keepalived instance acts only as the IPVS director/availability checker. This second service is managed by a notify script so that it is only running on the master server.
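
A rough sketch of that kludge, assuming a checker-only unit named keepalived-check.service and a VRRP instance named VI_1 (both names are placeholders, not taken from this issue):

    # keepalived.conf for the VRRP-only instance
    vrrp_instance VI_1 {
        state BACKUP
        interface eth0
        virtual_router_id 51
        priority 200
        virtual_ipaddress {
            192.168.10.50/24
        }
        # Start the checker-only keepalived instance (run with --check and a
        # separate ipvs.conf) when this node becomes master; stop it otherwise.
        notify_master "/usr/bin/systemctl start keepalived-check.service"
        notify_backup "/usr/bin/systemctl stop keepalived-check.service"
        notify_fault  "/usr/bin/systemctl stop keepalived-check.service"
    }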

But I find this setup unnecessarily complicated when keepalived could just ignore the IPVS director on backup servers, if configured to do so.

Our setup is purely IPVS-DR, but the IPVS method used is not relevant here; what is relevant is that this way there is no need to have an extra cluster for the load balancer.

If there were a need for a bigger number of nodes, we would of course run a dedicated load balancer, but these systems are usually microservices, and the main focus is to make them active-active services rather than active-backup, which would work just fine without IPVS.

@pqarmitage
Collaborator

@bleve, many thanks for your response and apologies for the slow reply. I now understand what you need and why.

This is actually rather difficult to implement, and it would require a significant change to the architecture of keepalived.

You refer to a master node and a backup node, but that is not a concept for keepalived. Suppose two nodes each have 2 VRRP instances; one node could be in master state for one instance and backup state for the other instance, and vice versa. I think the solution to this would be that the configuration would have to link a VRRP instance to a virtual server, so that the virtual server's configuration tracks the state of the VRRP instance.

The VRRP functionality and the IPVS functionality of keepalived run in separate processes, and currently there is no communication between them. We would need to add something like a bi-directional pipe between the two processes, so that the IPVS process could notify which VRRP instances it is interested in, and then the VRRP process would notify the IPVS process of any state changes of those VRRP instances.

Please could you let us know if the above idea would work for you, and if you are still interested in such a solution.
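
Purely to illustrate what such a linkage might look like in the configuration, a sketch follows; the track_vrrp_instance keyword is hypothetical and does not exist in keepalived today:

    virtual_server 192.168.10.50 443 {
        # Hypothetical keyword: only install this virtual server in IPVS while
        # VRRP instance VI_1 is in master state on this node.
        track_vrrp_instance VI_1
        delay_loop 6
        lb_algo wlc
        lb_kind DR
        protocol TCP
        real_server 192.168.10.11 443 {
            TCP_CHECK {
                connect_timeout 3
            }
        }
    }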

@bleve
Author

bleve commented Mar 20, 2024

Sounds like a solution to me.

While you are opening this can of worms, it might be worth thinking about whether to also add support for starting and stopping other services based on master/backup state.

Currently the only way I (and others) have found to do this is to use notify scripts to handle those dependencies. But the stop notify script is run too late, which causes problems: dependent services should be stopped before the IP addresses are removed.

I just mention this other issue because it can affect the design.

@pqarmitage
Collaborator

The stop notifies are actually sent before the VIPs are removed, but notify scripts are precisely that: they notify that something is happening, and they are asynchronous. keepalived continues processing after it has sent the notifies, and it doesn't check, or wait for, the termination of the notify scripts.

Consider the notify_master script. If keepalived waited for the termination of the notify_master script before becoming master and there was some delay in executing the script, a lower priority VRRP instance on another node might in the meantime become master. If VRRP is configured to run at its fastest rate, i.e. a 1 centi-second interval between adverts, and two VRRP instances have a difference in priority of only 1, then the lower priority instance will wait only 39 microseconds longer than the higher priority instance before deciding to take over as master, so we absolutely cannot afford to allow any delay to be introduced.
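
For reference, that figure follows from the VRRP skew time defined in RFC 5798:

    skew_time = ((256 - priority) / 256) * master_adver_interval

    For a priority difference of 1, the skew times differ by
    master_adver_interval / 256 = 0.01 s / 256 ≈ 39 µs.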

I do think it is possible to argue that notify_stop could be treated differently from the other notifies, since a slight delay in stopping is not important.

A better (and faster) way than using notify scripts is to use the notify fifo, with a notify_fifo_script configured. This has the advantage of a guarantee that messages are delivered in sequence (with notify scripts it is possible for scripts to run out of sequence due to kernel scheduling) and also that it is faster since a new process doesn't have to be created to process the notification.
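
As a minimal sketch of the fifo approach (the fifo path and handler path below are placeholders, not taken from this issue):

    global_defs {
        # keepalived creates the fifo and writes one line per state change; the
        # handler is started once and reads the event lines from the fifo in order.
        vrrp_notify_fifo /run/keepalived/vrrp_notify_fifo
        vrrp_notify_fifo_script /usr/local/bin/vrrp-fifo-handler
    }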

When you refer above to starting and stopping other services, are you referring to systemd type services? If so, could you add an After=keepalived.service dependency in the service files for the other services, or Before= dependencies in the keepalived.service file?

Regarding adding stopping/starting of services for master/backup (and presumably fault too) state transitions, it has always been assumed that notify scripts would be used for this (the notify script could be systemctl stop SERVICE to avoid having a bash script as an intermediary). It absolutely will not be possible to execute anything (including service starting/stopping) synchronously before master/backup state transitions, for the reasons outlined above.

Again, if the services you refer to starting and stopping are systemd services, then we could add the facility to do that directly from keepalived using the systemd D-Bus API, but I am not clear that this would be worthwhile when it could be done from a notify_fifo script (which doesn't have to be a script at all but can be a compiled C (or any other language) program, which in turn could call the systemd D-Bus API).

@pqarmitage
Collaborator

I have worked out a way to implement what is requested here; it just needs a bit of work to ensure all the edge cases are properly handled.
