
Galera feature: retry applying of write sets at slave nodes #387

Open
wants to merge 13 commits into base: 11.4

Conversation


@plampio plampio commented Jan 9, 2024

Description

Initial version of a feature that allows retrying the application of write sets at slave nodes. If Galera Cluster fails to apply a write set at a slave node, replication fails and the slave node is dropped from the cluster. However, the apply failure might be only temporary, and a second apply attempt might succeed. By retrying the apply at slave nodes, Galera Cluster can therefore sometimes recover from such failures.

This patch is quite separate from the server code and should not introduce any side-effects in other parts of the server.
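For illustration only, here is a rough, self-contained sketch of the retry control flow described above. The variable name wsrep_applier_retry_count and the savepoint name "wsrep-retry" are taken from this PR; everything else (the stub functions, the fixed failure pattern) is a placeholder and not the actual patch.

#include <cstdio>

static unsigned long wsrep_applier_retry_count= 2;   /* server variable in the PR */

/* stub: pretend the first apply attempt hits a temporary failure */
static bool apply_write_set(unsigned long attempt)
{
  return attempt > 0;
}

/* stub: stands in for rolling the applier back to the savepoint */
static void rollback_to_savepoint(const char *name)
{
  std::printf("rolling back to savepoint %s\n", name);
}

int main()
{
  bool applied= false;
  for (unsigned long attempt= 0; attempt <= wsrep_applier_retry_count; attempt++)
  {
    if (apply_write_set(attempt)) { applied= true; break; }
    rollback_to_savepoint("wsrep-retry");   /* undo the failed attempt, then retry */
  }
  std::printf("%s\n", applied ? "write set applied"
                              : "all attempts failed, node drops from the cluster");
  return 0;
}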

How can this PR be tested?

This patch introduces two new MTR tests, galera.galera_retry_applying and galera.galera_retry_applying2, for testing the feature. In the first test a retry succeeds; in the second, all retry attempts fail.

The new MTR tests need some more work: the cleanup after the tests is not complete.

@plampio plampio requested review from sjaakola and temeo January 9, 2024 09:52
--connection node_2
SET GLOBAL wsrep_applier_retry_count = 2;
SET GLOBAL debug_dbug = "+d,innodb_insert_fail:o,/dev/null";
CALL mtr.add_suppression("WSREP: Event 3 Write_rows_v1 apply failed: 146, seqno 4");

Better not to include sequence numbers in suppressions; the seqno will not remain the same if the test is run repeatedly or as part of a full MTR run.
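For example, since mtr.add_suppression() patterns are treated as regular expressions, the seqno can simply be wildcarded (illustrative, based on the message quoted above):

CALL mtr.add_suppression("WSREP: Event 3 Write_rows_v1 apply failed: 146, seqno [0-9]+");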

@plampio (Author)

Ok, I will fix this as suggested.

@plampio (Author)

Fixed as suggested.


@janlindstrom janlindstrom left a comment


LGTM, but is the default of 0 a good choice? What are the dangers of setting it larger, e.g. to 5?

@@ -0,0 +1,40 @@
#


The common practice in MTR testing is to bundle several test cases in the same test file; this should yield faster testing. As galera_retry_applying and galera_retry_applying2 are quite similar test cases, please accommodate both in galera_retry_applying.test and remove galera_retry_applying2.test and its result file as well.

@plampio (Author)

Fixed as suggested.

SET GLOBAL debug_dbug = "+d,innodb_insert_fail:o,/dev/null";

--connection node_1
SLEEP 1;


There should be no reason for a sleep here; please remove it.
It is possible to add a wait_condition on connection node_2 to wait until table t2 has been replicated there, but since we have wsrep_sync_wait=15 by default, there should be no absolute need to wait for the replication.
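A sketch of such a wait, using the standard include/wait_condition.inc helper (the schema and table names here are simply the ones from this test):

--connection node_2
--let $wait_condition = SELECT COUNT(*) = 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't2'
--source include/wait_condition.inc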

@plampio (Author)

Fixed as suggested.

} else if (0 == strcmp("test/t2", node->table->name.m_name)) {
err = DB_LOCK_WAIT_TIMEOUT;
goto error_handling;}});

err = row_ins(node, thr);


Using hard-coded table names here for the failure injection works, but is not elegant.
We have recently created a similar failure injection for FK failure cases, where the failure injection happens a given number of times and execution succeeds after the last failure. Please take a look at commit ea08242 and the DBUG_EXECUTE_IF implementation in storage/innobase/row/row0ins.cc; the same approach should work here.
The FK failure injection also has a hard-coded part that fails exactly 4 times. I assume this could be parametrized so that the MTR script decides how many failures are wanted (if we want to take this to the next level).
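A rough sketch of that kind of counter-based injection (not the actual ea08242 code; the label name innodb_insert_fail_n_times and the count of 4 are placeholders), reusing the err/error_handling context visible in the snippet above:

  DBUG_EXECUTE_IF("innodb_insert_fail_n_times",   /* placeholder label */
    {
      static unsigned int fails_left = 4;         /* could be made settable from the test */
      if (fails_left > 0)
      {
        fails_left--;
        err = DB_LOCK_WAIT_TIMEOUT;               /* report a temporary failure */
        goto error_handling;
      }
    });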

@plampio (Author)

Removed one of the two hard-coded table names and replaced the other hard-coded name with the option variable wsrep_innodb_insert_fail_table.


Now the failure injection is effective only for table t1. However, MTR test phase 2 expects an insert failure for table t2, so test phase 2 is not effective at the moment.

@plampio (Author)

Fixed this problem.


@sjaakola sjaakola left a comment


mtr test needs more attention


--echo Shutting down server ...
SET wsrep_on=OFF;
--source include/shutdown_mysqld.inc
--remove_file $MYSQLTEST_VARDIR/mysqld.2/data/grastate.dat


There is a race condition here: previously a transaction was committed on node_1, and here node_2 will shut down. But there is no check for the fate of the replicated INSERT from node_1: it may still be replicating, may currently be applying, or may have already committed on node_2. For deterministic test behavior, the state of the INSERT transaction should be synced here, or it should be documented that it does not matter for the test result and can be safely ignored.
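For example, one way to make the state deterministic would be to wait on node_2 for the replicated row before shutting down (sketch only; the actual table and expected row count depend on the test):

--connection node_2
--let $wait_condition = SELECT COUNT(*) = 1 FROM test.t1
--source include/wait_condition.inc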



Also, as this test phase is supposed to cause a certain applier failure, the node should crash, so there is no need to shut it down anymore.

@plampio (Author)

Removed the table names in row0ins.cc, but the DBUG_EXECUTE_IF macros still
mention the test database test. There are now two separate DBUG_EXECUTE_IF labels:
innodb_insert_fail_once - for failing an INSERT once inside InnoDB, and
innodb_insert_fail_always - for failing an INSERT always inside InnoDB.

Synchronization.
There are now 4 synchronization points in the MTR test, two in each of
the 2 test cases:

  1. Test case 1: wait till the insert transaction has been replicated and committed in node_2 (line 25)

  2. Test case 1: wait till the transaction has been replicated and committed in node_2 (line 44)

  3. Test case 2: wait for node_2 to crash (line 54)

  4. Test case 2: wait till node 2 is back in the cluster (line 120)

The test does not work without shutting down the server on node 2 after applier failure.
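For reference, toggling these labels from the MTR test would look roughly like this (assuming the usual +d/-d debug_dbug syntax; the exact statements in the test may differ):

--connection node_2
SET GLOBAL debug_dbug = "+d,innodb_insert_fail_once";
# provoke the INSERT from node_1; the first apply attempt fails and the retry succeeds
SET GLOBAL debug_dbug = "-d,innodb_insert_fail_once";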


/* rollback to savepoint without telling Wsrep-lib */
bool saved_wsrep_on = thd->variables.wsrep_on;
thd->variables.wsrep_on = false;


This thread is a replication applier, so it must have wsrep_on == ON; therefore using saved_wsrep_on is redundant.

@plampio (Author)

Fixed as suggested.

sql/sys_vars.cc Outdated
VALID_RANGE(0, UINT_MAX), DEFAULT(0), BLOCK_SIZE(1),
NO_MUTEX_GUARD, NOT_IN_BINLOG);

#if defined(ENABLED_DEBUG_SYNC)

Adding a new system variable just for test fault injection does not sound acceptable. This change will alter the result set of every MTR test that lists system variables, and the change depends on the build configuration.

@plampio (Author)

Reverted the change that introduced a new system variable. So, now the table names for which inserts fail inside InnoDB are once again hard-coded in the InnoDB code.

Is there any other way (besides a system variable) to pass table names from MTR tests to InnoDB code at run-time?
If not, then hard-coding table names in InnoDB code can't be avoided for the MTR test galera.galera_retry_applying.


As a reference, take a look at another PR: #385,
which causes failure injection a given number of times. This is for all tables, but it could be usable in your test as well.

@plampio (Author)

It would be possible to follow the method shown in PR 385, but I think the way failure injection is now implemented inside InnoDB for INSERTs is actually simpler.


@temeo temeo left a comment


Adding a new system variable just for test fault injection does not sound acceptable. This change will alter the result set of every MTR test that lists system variables, and the change depends on the build configuration.


@sjaakola sjaakola left a comment


this is good, branch needs rebase though

Remove debug messages.
If the variable "wsrep_retry_applying" is nonzero, then a temporary
savepoint "wsrep-retry" is created for each apply of a writeset.
Merged "galera.galera_retry_applying2" into "galera.galera_retry_applying"
and remove MTR test "galera.galera_retry_applying2".
Improved MTR test "galera.galera_retry_applying".
1) The retry applying feature did not work for streaming replication (SR)
   transactions at slave nodes. The MTR tests

      galera_sr.galera_sr_log_bin
      galera_sr.galera_sr_gtid
      galera_sr.mysql-wsrep-features-136
      galera.mdev_18730

   failed because of this. This problem was fixed by disabling the retry
   applying feature for SR transactions.

2) Some transactions (such as DDL statements) automatically cleared
   savepoints, causing retry applying to fail at slave nodes. As a result,
   the following MTR tests showed error voting behavior changes:

       galera.mysql-wsrep-216
       galera.galera_vote_ddl
       galera_3nodes.galera_toi_vote
       galera_3nodes.galera_vote_rejoin_mysqldump

   The problem was fixed by checking after every apply attempt that
   the savepoint still exists.
Moving failure injection from InnoDB code to MySQL code.
@plampio plampio commented Nov 8, 2024

Rebased the branch.
