HighAvailablity how to use the client #552

Open
esseti opened this issue Jan 29, 2020 · 2 comments


esseti commented Jan 29, 2020

Hi all,
I've set up two Vault machines (vault1 and vault2) for HA (failover).

The client accepts only a single URL as the client URL, so I've set it to vault1.mydomain.com.

If vault1 restarts, it comes back sealed and loses all its HA information, i.e. that requests should be sent to vault2. At the same time, vault2 knows that it is the active node. Basically, HA does not work from the client's point of view.

How can I make the code switch to the master?
Is there a way to pass a list of URLs and let the client check which one to use? Does it automatically check the HA status, or should I implement a fallback myself?

Or should something be configured somewhere else (e.g. NGINX in front of the nodes doing the health checks)?

Many thanks.

esseti changed the title from "HighAvailablity ho to use the client" to "HighAvailablity how to use the client" on Jan 29, 2020

esseti commented Jan 30, 2020

While waiting for answers, I wrote some functions to find the real master.

This one checks that the node at the given URL is OK:

import logging

import hvac

log = logging.getLogger(__name__)


def _node_ok(url):
    """Check the health of the node; only HTTP 200 counts as OK."""
    client = hvac.Client(url=url)
    try:
        res = client.sys.read_health_status()
        # 200 if initialized, unsealed, and active
        # 429 if unsealed and standby
        # 472 if disaster recovery mode replication secondary and active
        # 473 if performance standby
        # 501 if not initialized
        # 503 if sealed
        log.debug(f"{url} is {res.status_code}")
        return res.status_code == 200
    except Exception:
        # connection errors and unexpected responses both count as "not OK"
        log.exception(f"Problem with {url}")
        return False
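
The status codes listed in the comments of _node_ok can also be kept in a small lookup table, which makes the debug log easier to read. This is only a minimal sketch: the codes are the ones listed above, while `HEALTH_STATES`, `describe_health` and `is_active` are hypothetical names that are not part of hvac.

```python
# Status codes of the health endpoint, copied from the comments in _node_ok.
HEALTH_STATES = {
    200: "initialized, unsealed, and active",
    429: "unsealed and standby",
    472: "disaster recovery mode replication secondary and active",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code):
    """Return a readable description for a health status code (hypothetical helper)."""
    return HEALTH_STATES.get(status_code, f"unexpected status {status_code}")


def is_active(status_code):
    """Only 200 means the node is the active one, same rule as in _node_ok."""
    return status_code == 200
```

With this, the log line could read something like `log.debug(f"{url} is {describe_health(res.status_code)}")`.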

This one finds the master among a list of URLs (in my case the list comes from the Django settings):

@retry_on_exception()
def find_master_url():
    """
    Find the master node.

    Loops over the URLs until it finds a valid node.
    The retry_on_exception decorator repeats the whole search
    up to 5 times before giving up.

    Returns: the URL of the master

    """
    urls = settings.VAULT_ADDR  # list of node URLs, from django.conf.settings
    # with a single URL there is nothing to choose from
    if len(urls) == 1:
        return urls[0]

    client = hvac.Client()
    for url in urls:
        log.debug(f"Investigating {url}")
        client.url = url
        try:
            status = client.sys.read_leader_status()
            leader = status['leader_address']
            # Now we check that the leader is fine.
            # This node may report a wrong leader
            # if it has not yet updated its information.
            # If the leader has a problem, we return the node that responded.
            # If that one has a problem too, the next try should find another node.
            if _node_ok(leader):
                log.info(f"Leader is {leader}")
                return leader
            elif _node_ok(url):
                log.info(f"Leader is the url {url}")
                return url
        except Exception:
            log.error(f"Failed to find leader for url {url}")
    raise Exception("Error with vault, no master found")
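
The selection rule used above (prefer the reported leader, otherwise fall back to the node that answered) can be tried out without a running Vault by passing the two lookups in as plain functions. This is a sketch under assumptions: `pick_leader`, `get_leader` and `is_ok` are hypothetical names, and the host names are made up for the example.

```python
def pick_leader(urls, get_leader, is_ok):
    """Return the first usable URL.

    get_leader(url) returns the leader address reported by that node (may raise).
    is_ok(url) returns True if the node passes the health check.
    """
    if len(urls) == 1:
        return urls[0]
    for url in urls:
        try:
            leader = get_leader(url)
            # The reported leader may be stale, so verify it first.
            if leader and is_ok(leader):
                return leader
            # Otherwise accept the node that answered, if it is healthy.
            if is_ok(url):
                return url
        except Exception:
            continue  # node unreachable or sealed: try the next one
    raise RuntimeError("no active Vault node found")


# Example: both nodes report vault2 as leader, and only vault2 is healthy.
leaders = {
    "https://vault1:8200": "https://vault2:8200",
    "https://vault2:8200": "https://vault2:8200",
}
healthy = {"https://vault2:8200"}
print(pick_leader(list(leaders), leaders.__getitem__, healthy.__contains__))
# prints https://vault2:8200
```

In the real code, `get_leader` would wrap `client.sys.read_leader_status()` and `is_ok` would be `_node_ok`.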

Plus, I've created a wrapper that retries the operation 5 times, since it may take the system a little while to elect a leader.


from functools import wraps
from time import sleep


def retry_on_exception(counter=5, exceptions=None, delay=1):
    """
    Retry the function on the given exceptions; return its value,
    or re-raise the last exception once all tries are used up.
    """
    if not exceptions:
        exceptions = (Exception,)
    elif isinstance(exceptions, (list, tuple, set)):
        exceptions = tuple(exceptions)
    else:
        exceptions = (exceptions,)

    def wrapper(func):

        @wraps(func)
        def f_retry(*args, **kwargs):
            last = None
            tries = 0
            while tries < counter:
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    # any exception type not listed propagates immediately
                    sleep(delay)
                    tries += 1
                    log.info(f'Retrying {tries} {func.__name__}')
                    last = e
            raise last

        return f_retry

    return wrapper

Note: I also used the wrapper on the functions where I call the encrypt/decrypt operations. Those functions call get_global_client to obtain the global client; get_global_client runs _node_ok(client.url) and, if the check fails, calls find_master_url.

I'm not sure whether it makes sense to introduce this in the library as standard behaviour when more than one URL is specified, or whether there's a better/simpler way to handle this. I was quite surprised that the Vault agent did not handle HA properly and automatically.
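
The get_global_client function itself is not shown in this thread, so the following is only a guess at its shape: keep the last known good URL and look for the master again only when that URL stops passing the health check. `LeaderCache` and `current_url` are hypothetical names; the two callables stand in for find_master_url and _node_ok.

```python
class LeaderCache:
    """Remember the last known active URL; re-resolve it only when it fails."""

    def __init__(self, find_leader, is_ok):
        self._find_leader = find_leader  # e.g. find_master_url
        self._is_ok = is_ok              # e.g. _node_ok
        self._url = None

    def current_url(self):
        # Resolve on first use, or when the cached node is no longer healthy.
        if self._url is None or not self._is_ok(self._url):
            self._url = self._find_leader()
        return self._url
```

A get_global_client built on this could set `client.url = cache.current_url()` before each encrypt/decrypt call, at the cost of one health request per call.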


esseti commented Mar 23, 2020

This may be related, if someone is looking for more info: https://groups.google.com/forum/#!searchin/vault-tool/HA|sort:date/vault-tool/rF-iKGa-QQk/Qt4D7yZQAgAJ

In the end I set the agent to point to an HAProxy that redirects to the node that returns 200 in the health check.
The code above works, but it is not 100% tested. Use at your own risk.
