MDEV-33465: an option to enable semisync recovery #3099

bnestere · 2024-03-04T18:19:43Z

The current semi-sync binlog fail-over recovery process uses rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a primary server’s binlog, as it is anticipating the server to re-join a replication topology as a replica. However, for servers configured with both rpl_semi_sync_master_enabled=1 and
rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e. retaining its role as master), it can truncate its binlog to drop transactions which its replica(s) has already received and executed. If this happens, when the replica reconnects, its gtid_slave_pos can be ahead of the recovered primary’s gtid_binlog_pos, resulting in an error state where the replica’s state is ahead of the primary’s.

This patch changes the condition for semi-sync recovery to truncate the binlog to instead use the configuration variable --init-rpl-role, when set to SLAVE. This allows for both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be set for a primary that is restarted, and no transactions will be lost, so long as --init-rpl-role is not set to SLAVE.

The current semi-sync binlog fail-over recovery process uses rpl_semi_sync_slave_enabled==TRUE as its condition to truncate a primary server’s binlog, as it is anticipating the server to re-join a replication topology as a replica. However, for servers configured with both rpl_semi_sync_master_enabled=1 and rpl_semi_sync_slave_enabled=1, if a primary is just re-started (i.e. retaining its role as master), it can truncate its binlog to drop transactions which its replica(s) has already received and executed. If this happens, when the replica reconnects, its gtid_slave_pos can be ahead of the recovered primary’s gtid_binlog_pos, resulting in an error state where the replica’s state is ahead of the primary’s. This patch changes the condition for semi-sync recovery to truncate the binlog to instead use the configuration variable --init-rpl-role, when set to SLAVE. This allows for both rpl_semi_sync_master_enabled and rpl_semi_sync_slave_enabled to be set for a primary that is restarted, and no transactions will be lost, so long as --init-rpl-role is not set to SLAVE. Reviewed By: ============ <TODO>

bnestere added the MariaDB Corporation label Mar 4, 2024

bnestere requested a review from andrelkin March 4, 2024 18:19

bnestere assigned andrelkin Mar 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MDEV-33465: an option to enable semisync recovery #3099

MDEV-33465: an option to enable semisync recovery #3099

bnestere commented Mar 4, 2024

MDEV-33465: an option to enable semisync recovery #3099

Are you sure you want to change the base?

MDEV-33465: an option to enable semisync recovery #3099

Conversation

bnestere commented Mar 4, 2024