[MDEV-34009] Introduce server-initiated instant failover mechanism #3231

Open. Wants to merge 5 commits into base: 11.5
[MDEV-34009] Basic framework for instant failover mechanism
This adds the system variables `INSTANT_FAILOVER_TARGET` and
`INSTANT_FAILOVER_MODE` (`OFF`/`ON`/`ALL`), and the error code
`ER_INSTANT_FAILOVER`.

When `INSTANT_FAILOVER_MODE=ON`, the server will immediately respond
to newly-connected clients with an error packet containing the error code
4196 (`ER_INSTANT_FAILOVER`) and a specially-formatted message:

    |Arbitrary human-readable message|value of INSTANT_FAILOVER_TARGET

For example:

    |Server is directing clients to the alternative server 'other-mariadb-server.company.com:3307'|other-mariadb-server.company.com:3307

Updated and compatible clients can parse this message and redirect
appropriately, or display the human-readable message if they do not wish to
follow this redirection.  Older clients will display the message in its
entirety, and end users should at least have an idea of what's going on.
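
The client-side parsing described above could look roughly like the following. This is only a minimal sketch and not part of this patch: the `FailoverNotice` type and the `parse_instant_failover()` helper are hypothetical names, and splitting on the last `|` (so that a stray `|` in the human-readable part is tolerated) is an assumption rather than something the message format specifies.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for illustration; not defined anywhere in the patch.
struct FailoverNotice {
  std::string human_message;  // text between the leading '|' and the last '|'
  std::string target;         // value of INSTANT_FAILOVER_TARGET, e.g. "host:3307"
};

// Error code stated in the commit message for ER_INSTANT_FAILOVER.
constexpr unsigned int kErInstantFailover = 4196;

// Returns the parsed notice, or std::nullopt if the error is not a
// well-formed instant-failover message (the caller then shows it as-is).
std::optional<FailoverNotice> parse_instant_failover(unsigned int code,
                                                     std::string_view msg) {
  if (code != kErInstantFailover) return std::nullopt;
  if (msg.empty() || msg.front() != '|') return std::nullopt;

  // Assumption: the target never contains '|', so split on the LAST one.
  const std::size_t sep = msg.rfind('|');
  if (sep == 0) return std::nullopt;  // only the leading delimiter is present

  const std::string_view human = msg.substr(1, sep - 1);
  const std::string_view target = msg.substr(sep + 1);
  if (target.empty()) return std::nullopt;

  return FailoverNotice{std::string(human), std::string(target)};
}
```

A client that does not want to follow the redirection can simply display `human_message`; one that does would reconnect to `target`, ideally only if the error arrived over a transport it already trusts.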

In my earliest implementation, the sending of the `ER_INSTANT_FAILOVER`
error packet by the MariaDB server depended on the exploitation of the
client vulnerability https://jira.mariadb.org/browse/CONC-648 (“Client
improperly accepts error packets prior to TLS handshake”), which I
discovered during its implementation.

The server should obviously be able to redirect clients which don't suffer
from this severe vulnerability.

In order to do this, we need to move the redirection handling into
`native_password_authentication()`.  This awkward arrangement is
necessitated by the total entanglement of the APPLICATION-layer
authentication code (e.g. username+password for "native" authentication)
with the TRANSPORT-layer security mechanism (TLS literally stands for
Transport Layer Security).

An unfortunate consequence of this is that the redirection mechanism will
not work unless the client is using the "native" authentication plugin, or
until other authentication plugins are similarly updated.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
dlenski committed Apr 29, 2024
commit 46a511a59da543f634c424374dca38e597f4e090
6 changes: 6 additions & 0 deletions sql/mysqld.cc
@@ -1526,6 +1526,12 @@ static Atomic_counter<uint> extra_connection_count;

my_bool opt_gtid_strict_mode= FALSE;

/**
Instant failover
*/

const char *instant_failover_target = NullS;
ulong instant_failover_mode= INSTANT_FAILOVER_MODE_OFF;

/* Function declarations */

8 changes: 8 additions & 0 deletions sql/mysqld.h
@@ -1002,6 +1002,14 @@ extern ulong opt_binlog_dbug_fsync_sleep;
extern uint volatile global_disable_checkpoint;
extern my_bool opt_help;

extern const char *instant_failover_target;
enum enum_instant_failover_mode {
  INSTANT_FAILOVER_MODE_OFF = 0,
  INSTANT_FAILOVER_MODE_ON = 1,
  INSTANT_FAILOVER_MODE_ALL = 2
};
extern ulong instant_failover_mode;

extern int mysqld_main(int argc, char **argv);

#ifdef _WIN32
4 changes: 4 additions & 0 deletions sql/share/errmsg-utf8.txt
@@ -12280,3 +12280,7 @@ ER_SEQUENCE_TABLE_CANNOT_HAVE_ANY_CONSTRAINTS
eng "Sequence tables cannot have any constraints"
ER_SEQUENCE_TABLE_ORDER_BY
eng "ORDER BY"
ER_INSTANT_FAILOVER
eng "|Server is directing clients to the alternative server '%1$s'|%1$s"
fra "|Ce serveur dirige ses clients vers le serveur alternatif '%1$s'|%1$s"
spa "|Este servidor está dirigiendo sus clientes al servidor alternativo '%1$s'|%1$s"
38 changes: 38 additions & 0 deletions sql/sql_acl.cc
@@ -13980,6 +13980,44 @@ static ulong parse_client_handshake_packet(MPVIO_EXT *mpvio,
return packet_error;
}

  /* If the server is configured to redirect clients to another server (pre-authentication),
   * then send an error packet to signal that here.
   *
   * FIXME: it makes absolutely no sense for this pre-authentication redirection mechanism
   * to be invoked HERE, in the middle of the authentication process. Unfortunately, the
   * existing code structure deeply entangles two logically separate concerns:
   *
   * 1) Encrypting and authenticating the client->server connection at the TRANSPORT layer
   *    using TLS/SSL (TLS stands for TRANSPORT-layer security).
   * 2) Negotiating an appropriate APPLICATION-layer authentication mode, and then
   *    authenticating the client.
   *
   * It would be much more logical, simple, and universal to do this redirection right at
   * the beginning of sql_authenticate(), but -- because of the above entangling -- the
   * transport layer encryption has not yet been enabled at that point.
   *
   * This means that a client which expects a secured transport SHOULD NOT trust any
   * redirection message (or any other error message) which it receives prior to
   * the TLS handshake; existing clients DO present such errors as trustworthy, which is
   * a security vulnerability that needs a separate fix (see more details at
   * https://jira.mariadb.org/browse/CONC-648).
   */
  enum enum_vio_type type= vio_type(thd->net.vio);
  bool local_connection= (type == VIO_TYPE_SOCKET) || (type == VIO_TYPE_NAMEDPIPE);

  if (instant_failover_mode == INSTANT_FAILOVER_MODE_ALL ||
      (instant_failover_mode == INSTANT_FAILOVER_MODE_ON && !local_connection))
  {
    sql_print_warning("Redirecting connection %lld via %s to INSTANT_FAILOVER_TARGET=%s (INSTANT_FAILOVER_MODE=%s)",
                      thd->thread_id, safe_vio_type_name(thd->net.vio), instant_failover_target,
                      (instant_failover_mode == INSTANT_FAILOVER_MODE_ON ? "ON" : "ALL"));
    DBUG_PRINT("info", ("redirecting connection %lld via %s to INSTANT_FAILOVER_TARGET=%s (INSTANT_FAILOVER_MODE=%s)",
                        thd->thread_id, safe_vio_type_name(thd->net.vio), instant_failover_target,
                        (instant_failover_mode == INSTANT_FAILOVER_MODE_ON ? "ON" : "ALL")));
    my_error(ER_INSTANT_FAILOVER, MYF(0), instant_failover_target);
    DBUG_RETURN(packet_error);
  }

if (client_capabilities & CLIENT_PROTOCOL_41)
{
thd->max_client_packet_length= uint4korr(net->read_pos+4);
20 changes: 20 additions & 0 deletions sql/sys_vars.cc
@@ -7328,3 +7328,23 @@ static Sys_var_enum Sys_block_encryption_mode(
"AES_ENCRYPT() and AES_DECRYPT() functions",
SESSION_VAR(block_encryption_mode), CMD_LINE(REQUIRED_ARG),
block_encryption_mode_values, DEFAULT(0));

static Sys_var_charptr_fscs Sys_instant_failover_target(
       "instant_failover_target",
       "Instant failover target. This should be a hostname, an IP address, or "
       "a hostname or IP address followed by ':PORT'. Instant failover will "
       "not be activated unless INSTANT_FAILOVER_MODE is also set.",
       GLOBAL_VAR(instant_failover_target), CMD_LINE(REQUIRED_ARG),
       DEFAULT(NULL));

static const char *instant_failover_mode_names[]= {
       "OFF", "ON", "ALL", 0 };

static Sys_var_enum Sys_instant_failover_mode(
       "instant_failover_mode",
       "Instant failover mode. "
       "Possible modes are: OFF: No instant failover, "
       "ON: Unconditionally redirect new clients connecting over the network to INSTANT_FAILOVER_TARGET (no redirection of local socket-based connections), "
       "ALL: Unconditionally redirect all new clients to INSTANT_FAILOVER_TARGET (even via local socket-based connections).",
       GLOBAL_VAR(instant_failover_mode), CMD_LINE(REQUIRED_ARG),
       instant_failover_mode_names, DEFAULT(0));
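
To make the `OFF`/`ON`/`ALL` semantics in the help text above easier to check, the decision can be sketched as a standalone function. This is an illustrative restatement only, not code from the patch: `FailoverMode`, `Transport`, and `should_redirect()` are hypothetical names, with `Transport` standing in for the server's `VIO_TYPE_*` values (only the socket and named-pipe types are treated as local, as in the `sql/sql_acl.cc` hunk).

```cpp
#include <cassert>

// Hypothetical stand-ins for enum_instant_failover_mode and the VIO types.
enum class FailoverMode { Off = 0, On = 1, All = 2 };
enum class Transport { Tcp, UnixSocket, NamedPipe };

// Mirrors the condition in the sql_acl.cc hunk:
//   ALL redirects every new connection,
//   ON  redirects only connections that are not local (socket or named pipe),
//   OFF never redirects.
bool should_redirect(FailoverMode mode, Transport transport) {
  const bool local_connection = (transport == Transport::UnixSocket) ||
                                (transport == Transport::NamedPipe);
  switch (mode) {
    case FailoverMode::All:
      return true;
    case FailoverMode::On:
      return !local_connection;
    case FailoverMode::Off:
      return false;
  }
  return false;  // unreachable for valid enum values
}
```

Keeping local connections exempt under `ON` means an administrator on the host can still log in over the Unix socket or named pipe while remote clients are being redirected.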