-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-21322: report slave progress to the master #2374
base: 10.11
Are you sure you want to change the base?
Conversation
9f4e1ab
to
d23fbd6
Compare
SHOW REPLICA HOSTS; | ||
Server_id Host Port Master_id Semisync File Position Gtid | ||
3 slave2 SLAVE_PORT2 1 OFF 0 | ||
3 slave2 SLAVE_PORT2 1 OFF master-bin.000001 1018 0-1-4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. Perhaps a bug (although not related to the task at-hand). @andrelkin if a slave re-connects while a master is awaiting a semi-sync ACK, show slave hosts
will report the same slave multiple times, until the timeout finishes and the thread finishes up. Should we add logic (in a different ticket) on slave connect, to check for any existing connections with that server, and clean them up if they exist?
|
||
connection master; | ||
set global rpl_semi_sync_master_enabled = 0; | ||
set global rpl_semi_sync_master_timeout= default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to restore the server to its state at the start of the test, rather than hard-coded 0 and default (similar to $save_wait_point
).
d23fbd6
to
ca263db
Compare
c762bbc
to
0619127
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Anel! Thanks for the patch. I left a few notes, overall it is coming along nicely :)
0619127
to
e01713c
Compare
Hi @bnestere I have created first part of your review (wip patch)- busy replica and there are some questions/problems with tests. Tests are simulating busy replica and I have 2 problems (as stated in commit message, please see the results there).
|
stop slave; | ||
reset slave; | ||
set global rpl_semi_sync_slave_enabled = 1; | ||
start slave; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have more consistent macros for the stop
, reset
, and start
commands which are safer for testing as
--source include/stop_slave.inc
--source include/reset_slave.inc
--source include/start_slave.inc
where the stop
and start
scrips wait for the slave threads to actually start or stop; and the reset
one will also reset the GTID state (otherwise transactions with lesser GTIDs won't actually be re-run)
--echo # Should be reset somehow for this particular slave (server_id=3) and wait that yes_tx be 2 ? Or other apporach? | ||
--echo # Note that primary doesn't report not acked transaction by replicas. | ||
--echo # Update: Looking into documentation - "semisync replication waits only on 1 replica to respond" -> | ||
--echo # So according to documentation it is ok, but in later example you will see that both replica have sent ACK. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually because the master is re-sending old transactions. Only new transactions on the master side will be ACK
ed by the replica. See logic pertaining to k_packet_flag_sync
for more details. This is a good test to keep to prove that is the case, can you re-word the test description to reflect that behavior? Then the next test case does ACK
a new transaction, which is good.
INSERT INTO t VALUES (5); | ||
--echo # Question 2: Is this the problem ? We have sent the event to replica that | ||
--echo # has corrupted connection. I understand that we expect to have result SENT, and not ACK, | ||
--echo # but since we are iterating through slave_info, this slave is not found,why?In previous test slave_info is found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question.. I think I led you slightly astray with corrupt_queue_event
, as it forces the IO thread to error, eventually disconnecting from the master. Then once a dump thread disconnects on the master, it will call into unregister_slave()
, thereby removing the slave_info
.
I think instead it would make more sense to use dbug_sync
points, as you can manually wait/progress the IO threads queueing to simulate its slowness (and report show slave hosts
specifically while the IO thread is in a WAIT_FOR
).
sql/semisync_master.cc
Outdated
@@ -679,6 +672,11 @@ int Repl_semi_sync_master::report_reply_binlog(uint32 server_id, | |||
m_wait_file_name_inited = false; | |||
} | |||
} | |||
if (rpl_semi_sync_master_enabled) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this check is necessary, as the function preamble already checks get_master_enabled()
and exits if not.
e01713c
to
538f2f6
Compare
# --echo Write new event IO thread is running | ||
INSERT INTO t VALUES (505); | ||
|
||
# Should `yes_tx` be incremented, I guess not, since it is async |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it should not when timeout=0
show variables like 'rpl_semi_sync_slave_enabled'; | ||
Variable_name Value | ||
rpl_semi_sync_slave_enabled OFF | ||
# Q2: Here we don't get any row for server_id=2, but we should? we get only `Gtid_State_Sent= Gtid_State_Ack=0-1-5` for server_id=3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, as when the IO thread stopped on the replica, it shut down its connection to the primary. When the primary sees the connection is taken down, it deletes the thd->slave_info
variable.
START SLAVE IO_THREAD; | ||
include/wait_for_slave_io_to_start.inc | ||
# Q*: Stoping and starting the new IO thread will restart slave sent ack count, see below. | ||
# So we basically want this to be recorded in edge case, right ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, as these are slave-side variables, and the slave has no consideration of the master's semisync timeout value
# Q*: Stoping and starting the new IO thread will restart slave sent ack count, see below. | ||
# So we basically want this to be recorded in edge case, right ? | ||
# And that is rpl_semi_sync_master_timeout=0 and rpl_semi_sync_master_enabled=TRUE | ||
# Can we use this as patch instead of injecting the DBUG_SYNC? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; the purpose of the DEBUG_SYNC point is to delay the slave's ACK
reply after receiving the events. So the IO thread needs to be active. The process of the DEBUG_SYNC check would then resemble the following;
- Execute a query on the primary (there should be no wait on the connection)
- The primary sends the corresponding event group to the replica
- IO thread receives the corresponding event
- IO thread stops on a
DEBUG_SYNC now WAIT_FOR...
- We check
SHOW REPLICA HOSTS
on the primary and ensure thatGtid_state_sent
>Gtid_state_ack
- We signal the IO thread to continue
- Wait for the
Rpl_semi_sync_master_get_ack
status variable to increment - Check
SHOW REPLICA HOSTS
to ensureGtid_state_sent
now matchesGtid_state_ack
# So we basically want this to be recorded in edge case, right ? | ||
# And that is rpl_semi_sync_master_timeout=0 and rpl_semi_sync_master_enabled=TRUE | ||
# Can we use this as patch instead of injecting the DBUG_SYNC? | ||
# We are expecting here `Rpl_semi_sync_slave_send_ack == 4` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, as the replica will only send ACK
replies for the active transaction on the primary. Because we have 2 replicas here, and one is running, it is able to ACK
the transaction, so when the other starts up, the flag requiring an ACK
is never set, and so the replica never replies (hence why Rpl_semi_sync_slave_send_ack
is always 0
).
I imagine that if this was the only replica, when the primary sends 0-1-6
, it would set the flag that the event needs to be ACK
ed. Then the replica will ACK
only on the last transaction, and Rpl_semi_sync_slave_send_ack
should eventually be 1.
Variable_name Value | ||
Rpl_semi_sync_master_clients 2 | ||
# Q3: here we get `Gtid_State_Sent= 0-1-5 != Gtid_State_Ack=''` (but Gtid_State_Ack should be 0-1-4 right?) | ||
# So should we get for server_id=2 `Gtid_State_Sent= 0-1-6 != Gtid_State_Ack=0-1-4` ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your question ties into the DEBUG_SYNC answer. By keeping the IO thread on, and using DEBUG_SYNC to only pausing it (thereby maintaining the binlog_dump_thread connection), this would be the case when the slave just had not yet replied with the ACK
(but again, only on the active transaction).
2 localhost SLAVE_PORT 1 0-1-6 | ||
3 slave2 SLAVE_PORT 1 0-1-6 0-1-6 | ||
connection slave; | ||
# Q4: here we get sync good with server_id=3, `Gtid_State_Sent= Gtid_State_Ack=0-1-6` for server_id=2? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, as this now deals with the active transaction logic.. Because there were no new transactions, this would present as the last transaction that the replica had sent an ACK
for. I see that being potentially misleading to users when the replica does catch up, yet presents an old Gtid_state_ack
. I see two options, either document the behavior (my preference), or we'd add additional if (!timeout)
behavior to sometimes allow for "special" ACK
s to non-active transactions.. @andrelkin what do you think?
show status like 'Rpl_semi_sync_master_yes_tx'; | ||
Variable_name Value | ||
Rpl_semi_sync_master_yes_tx 4 | ||
# Q1: Here we need to get the `Gtid_State_Sent=0-1-5 != Gtid_State_Ack=0-1-4` for server_id=2 OR MAYBE THIS IS GOOD, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the IO thread ON (and no reason to pause/error) then the replica will ACK
for the latest transaction, so both .._sent
and .._ack
would be 0-1-5
. So the current result seems right to me.
dac9166
to
5f0eab5
Compare
AFTER_COMMIT | ||
reset master; | ||
# Test 1: Primary has not enabled semisync | ||
# Note: even if not enabled semisync 2 new columns should be visible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than describing as "2 new columns" and "should return 6 columns", I'd be better to be specific as to exactly which columns you are expecting. The numbers don't mean much after the patch has been pushed, and 6 can change later
2 localhost SLAVE_PORT 1 0-1-2 0-1-2 | ||
# Test 5: Enable semi-sync for slave2 | ||
# Note 1: Replica (server_id=3) should be started and semi-sync enabled | ||
# Note 2: In semy-sync replication ACK thread (on master), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo of semy-sync
Rpl_semi_sync_master_clients 1 | ||
SHOW REPLICA HOSTS; | ||
Server_id Host Port Master_id Gtid_State_Sent Gtid_State_Ack | ||
2 localhost SLAVE_PORT 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems time-dependent, where sometimes Gtid_State_Sent
could have 0-1-5
, and sometimes not, so I would suggest only keeping the SHOW REPLICA HOSTS
after syncing the replica with the master
connection slave2; | ||
--let $save_server_3_dbug= `SELECT @@GLOBAL.DEBUG_DBUG` | ||
SET @@GLOBAL.DEBUG_DBUG="d,synchronize_semisync_slave_reply"; | ||
--source include/start_slave.inc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd probably here want
SET debug_sync='now WAIT_FOR at_slave_reply';
like you have for the slave
to prove the following claim of both replicas being stopped
5b8d725
to
e963de5
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
e963de5
to
0ee671a
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
0ee671a
to
cf65491
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
cf65491
to
7cdbbbb
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
7cdbbbb
to
50c8d40
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
50c8d40
to
6b8b018
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
6b8b018
to
e74d798
Compare
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes #2374, #1427 Reviewer:<[email protected]> Thanks <[email protected]> for reviewing the failure and suppresions.
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
e74d798
to
0a8ca59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Anel,
Thanks for rebasing the patch. I gave it another pass-through with a few comments. There are also a few un-resolved old comments that are still relevant.
show variables like 'rpl_semi_sync_slave_status'; | ||
--echo # slave_send_ack should be reseted when replica is stopped, but it is not (TODO?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not on stop (so statistics are saved if the slave is stopped), it is reset when the slave io thread starts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks have updated in new commit.
|
||
# Write the new event | ||
INSERT INTO t VALUES (10); | ||
|
||
let $status_var= Rpl_semi_sync_master_status; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't delete the expected behavior out. I suppose we can adapt the test to allow it so it passes (and the actual gtid_state_sent
/gtid_state_ack
is consistent); but we should file a new MDEV about the broken behavior, comment out (but leave) code that supports the expected behavior, and add a comment referencing the new MDEV to un-comment the "good" path with the fix. We should follow this pattern for all the current broken semi-sync behaviors too, e.g. that described in test case 6:
# It may happen that rpl_semi_sync_master_no_transactions increments for above
# query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree have added MDEV-33615
sql/semisync_master.h
Outdated
@@ -531,7 +563,7 @@ class Repl_semi_sync_master | |||
* Return: | |||
* 0: success; non-zero: error | |||
*/ | |||
int report_reply_binlog(uint32 server_id, | |||
int report_reply_binlog(THD *replica_thd, uint32 server_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor comments here:
- We don't need the whole
THD
; but we can pass thethd->slave_info
because that is all that is needed. - We can remove the
server_id
parameter; because it can be taken fromthd->slave_info.server_id
.
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
e13ace8
to
d7afc1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Anel!
There are a few comments that are still relevant.
+ We are trying to extend the command `SHOW REPLICA HOSTS` that is executed on master, with columns `GTID_state_sent` and `GTID_state_ack`. In order to achieve that we have to extend `thd->slave_info` struct with 2 new entries. This struct needs to be populated and must be accessable to primary. + The first column `GTID_state_sent`: - It is populated by the last event that primary has sent to replica (only for semi-sync slave connection, but without need to know that replica actually obtained request, like it is case in asynchronous replication). - Primary executes `mysql_binlog` that first starts the `binlog_dump` thread. That thread starts `ack_receiver` thread, that returns second column of interest (in first iteration we don't care about). After `binlog_dump` thread creating, file is sent with `send_one_binlog_file()` function. This function is sending event with `send_events()` function, during which `send_event_to_slave()` function is executed. On that place we are creating the `thd->slave_info->gtid_state_sent` struct. + The second column `GTID_state_ack`: - It is populated by the `reply_packet_binlog` (a.k.a. `ack`) that is called, only for semi-sync replication. It is called on 2 places: a) during `binlog_dump` creation, after thread is added to `ack_reciver` and after thread is added as semi-sync slave. This should be first ack received from replica to primary. b) constantly in running phase of `ack_receiver` thread `run()`. - A the end of function `reply_packet_binlog` we are creating the struct `GTID_state_ack` + The third column `Sync_Status` - It should be filled with values `asynchronous`,`semi-sync active`, `semi-sync stale` in order to express the state of replica. - `semi-sync [active/stale]` related states depend on the value of `need_sync` that is updated on master and will be `semi-sync active` iff: - primary and replicas are configured to be in semi-sync mode and - if the event is a transaction's ending event. Otherwise state is `semi-sync stale`. - state is `asynchronous` if semi-sync is disabled on master as fast check and if there thd is not semi-sync thread. + Handling of `rpl_semi_sync_master_timeout= 0` means that primary status `rpl_semi_sync_master_status` will remain ON instead of switching off to async and incrementing the counter updates. + PR closes MariaDB#2374, MariaDB#1427 Reviewer:<[email protected]>
1. Delete redudant check of `rpl_semi_sync_master_enabled` 2. Move check of `m_wait_timeout` at the beginning 3. Pass by address `thd->slave_info` struct instead of `thd` to `report_reply_binlog()` and remove `server_id` argument that will be used from `thd->slave_info.server_id`
d7afc1e
to
e53ff4d
Compare
Thank you. Review is addressed with commit e53ff4d |
We are trying to extend the command
SHOW REPLICA HOSTS
that isexecuted on master, with columns
GTID_state_sent
andGTID_state_ack
.In order to achieve that we have to extend
thd->slave_info
struct with2 new entries. This struct needs to be populated and must be accessable to primary.
The first column
GTID_state_sent
:(only for semi-sync slave connection, but without need to know
that replica actually obtained request,
like it is case in asynchronous replication).
mysql_binlog
that first starts thebinlog_dump
thread.That thread starts
ack_receiver
thread, that returns second columnof interest (in first iteration we don't care about).
After
binlog_dump
thread creating, file is sent withsend_one_binlog_file()
function.This function is sending event with
send_events()
function,during which
send_event_to_slave()
function is executed.On that place we are creating the
thd->slave_info->gtid_state_sent
struct.The second column
GTID_state_ack
:reply_packet_binlog
(a.k.a.ack
) that is called,only for semi-sync replication. It is called on 2 places:
a) during
binlog_dump
creation, after thread is added toack_reciver
and after thread is added as semi-sync slave. Thisshould be first ack received from replica to primary.
b) constantly in running phase of
ack_receiver
threadrun()
.reply_packet_binlog
we are creating thestruct
GTID_state_ack