MDEV-28452 wsrep_ready: OFF after MDL BF-BF conflict #426

plampio · 2024-05-13T08:37:09Z

Galera Cluster failed when two transactions that were executed
serially on a master node were incorrectly executed in parallel on a
slave node. The slave node detected an MDL BF-BF conflict and quit the
cluster.

The problem was caused by OPTIMIZE TABLE on a table that is the child
table of a foreign key constraint. If UPDATE was performed on the
parent table of the foreign key constraint while OPTIMIZE TABLE was
running on the child table, the master node would run the transactions
serially, but a slave node might run them in parallel, resulting in an
MDL BF-BF conflict. The problem is fixed by adding the foreign key
constraint to the replicated write set, which prevents parallel apply
of the transactions on the slave node. The code fixing this issue for
OPTIMIZE TABLE is similar to the code for ALTER TABLE.

Removed debug messages.

janlindstrom

The test case needs work: unnecessary sleeps should be replaced with wait_condition or debug_sync WAIT_FOR/SIGNAL synchronization.

janlindstrom · 2024-05-13T10:45:12Z

mysql-test/suite/galera/t/MDEV-28452.test

+# 1) OPTIMIZE TABLE on the child table of the foreign key constraint,
+--connect node_1a, 127.0.0.1, root, , test, $NODE_MYPORT_1
+--connection node_1a
+--sleep 5


Do you really need this sleep? I suggest using wait_condition instead.

You are right, this sleep is not needed.
Removed the sleep.

janlindstrom · 2024-05-13T10:45:59Z

mysql-test/suite/galera/t/MDEV-28452.test

+
+# create two tables
+CREATE TABLE `user` (
+ `id` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,


Are these table and column names taken directly from the customer case, and do you really need all the columns?

These column names are indeed copied exactly from the customer case.
I have now removed all those columns that are not part of the keys and, thus, do not affect the test in any way.
But it would be possible to create a minimal test case with table names t1 and t2 and with just two columns per table.

janlindstrom · 2024-05-13T10:48:08Z

mysql-test/suite/galera/t/MDEV-28452.test

+
+# allow the stopped OPTIMIZE TABLE transaction to proceed
+--connection node_2a
+--sleep 5


These sleeps make the test slow. There is a mechanism to wait until a debug sync point has been reached, by sending a signal from it and then using WAIT_FOR; see, e.g., test MW-369.inc.

I have not been able to replace this sleep with another synchronization mechanism, such as WAIT_FOR.
But I commented out the sleep, and the test seems to work without it.

janlindstrom · 2024-05-13T10:48:56Z

mysql-test/suite/galera/t/MDEV-28452.test

+
+# cleanup
+--connection node_1
+--sleep 5


Please remove all sleeps and rely on either wait_condition (using the processlist) or debug_sync WAIT_FOR/SIGNAL.

I have now removed two of the three sleeps in the MTR test.
Removing the last sleep is trickier.

janlindstrom · 2024-05-13T10:50:50Z

sql/mdl.cc

+ DBUG_EXECUTE_IF("sync.mdev_28452",
+ {
+ const char act[]=
+ "now "


Here you could signal that this code has been reached with now SIGNAL mdev_28452_reached, and then wait for that signal in the test case.

sciascid · 2024-05-13T10:56:37Z

sql/sql_admin.cc

+ }
+ }
+ }
+#else
 WSREP_TO_ISOLATION_BEGIN_WRTCHK(NULL, NULL, first_table);


The line WSREP_TO_ISOLATION_BEGIN_WRTCHK() should be removed. It is replaced by your code above.

Yes, you are right.
Fixed as suggested.

sciascid · 2024-05-13T11:05:40Z

sql/sql_admin.cc

+ there. Variables are reset back in THD::reset_for_next_command() before
+ processing of next command.
+ */
+ if (wsrep_auto_increment_control)


I don't understand why we need to change auto_increment variables here.
What happens if you don't?

I also tested without changing the auto_increment_* variables, and the MTR tests succeeded.
I don't understand what these variables do, but the code segment was copied from
Sql_cmd_alter_table::execute() (in sql/sql_alter.cc), and this fix is very similar to that one.

OPTIMIZE TABLE could recreate the table, but I do not know whether that has any effect on auto-increment values (this could be verified using SHOW CREATE TABLE x; on both nodes).

plampio added 2 commits May 6, 2024 17:28

Code cleanup.

d456e13

Removed debug messages.

plampio requested review from sciascid and temeo May 13, 2024 08:37

janlindstrom requested changes May 13, 2024


sciascid requested changes May 13, 2024


sciascid reviewed May 13, 2024


plampio added 2 commits June 3, 2024 10:39

Fixes for issues raised by the reviewers.

4e785bd

Improved MTR test MDEV-28452 by removing a sleep.

827d90d


plampio commented May 13, 2024

janlindstrom left a comment

janlindstrom May 13, 2024

plampio May 31, 2024

janlindstrom May 13, 2024

plampio May 31, 2024

janlindstrom May 13, 2024

plampio Jun 10, 2024

janlindstrom May 13, 2024

plampio Jun 3, 2024

janlindstrom May 13, 2024

sciascid May 13, 2024

plampio May 31, 2024

sciascid May 13, 2024

plampio May 31, 2024

janlindstrom May 31, 2024
