Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

10.11 innodb wsrep applier lock wait timeout #2351

Open
wants to merge 4 commits into
base: 10.11
Choose a base branch
from

Conversation

sjaakola
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-29684

Description

This patch implements wsrep applier lock wait timeout
functionality.

As transactions which are executed by appliers have passed
certification in the cluster, they must be applied and committed
successfully. However, occasionally BF aborting local transactions
may not work perfectly due to race conditions or unforeseen
behavior of the lock manager, which may cause appliers to wait
locks indefinitely. Especially if the local transaction has
already reached commit stage, it will not yield via lock
wait timeout.

In order to resolve indefinite applier waits, a short applier
lock wait timeout is introduced. However instead of giving up
with lock wait, a background thread is used to retry BF abort
on behalf of the applier which is waiting for the lock.

A variable to control the applier lock wait timeout is
innodb_wsrep_applier_lock_wait_timeout with default
value of five seconds. If the value is zero, the background
BF aborting is disabled.

The value of innodb_wsrep_applier_lock_wait_timeout is set
to zero in Galera suite MTR test configuration to avoid
non-deterministic behavior.

How can this PR be tested?

The PR contains a mtr test for testing the functionality

TODO: modify the automated test suite to verify that the PR causes MariaDB to
behave as intended. Consult the documentation on
"Writing good test cases".
In many cases, this will be as simple as modifying one .test and one .result
file in the mysql-test/ subdirectory. Without automated tests, future regressions
in the expected behavior can't be automatically detected and verified.

If the changes are not amenable to automated testing, please explain why not and
carefully describe how to test manually.

Basing the PR against the correct MariaDB version

  • [x ] This is a new feature and the PR is based against the latest MariaDB development branch
  • This is a bug fix and the PR is based against the earliest branch in which the bug can be reproduced

Backward compatibility

temeo and others added 3 commits September 13, 2022 17:02
This patch implements wsrep applier lock wait timeout
functionality.

As transactions which are executed by appliers have passed
certification in the cluster, they must be applied and committed
successfully. However, occasionally BF aborting local transactions
may not work perfectly due to race conditions or unforeseen
behavior of the lock manager, which may cause appliers to wait
locks indefinitely. Especially if the local transaction has
already reached commit stage, it will not yield via lock
wait timeout.

In order to resolve indefinite applier waits, a short applier
lock wait timeout is introduced. However instead of giving up
with lock wait, a background thread is used to retry BF abort
on behalf of the applier which is waiting for the lock.

A variable to control the applier lock wait timeout is
innodb_wsrep_applier_lock_wait_timeout with default
value of five seconds. If the value is zero, the background
BF aborting is disabled.

A separate timer wsrep_BF_watchdog_timer was added to achieve
one second resolution.

The value of innodb_wsrep_applier_lock_wait_timeout is set
to zero in Galera suite MTR test configuration to avoid
non-deterministic behavior.
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ sjaakola
❌ temeo
You have signed the CLA already but the status is still pending? Let us recheck it.

Copy link
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the build and test failures.

Because I am concerned about potential performance regressions, I think that this will require extensive performance testing for the case that Galera replication is disabled.

I am confused about the testing status. MDEV-29684 mentions version 10.4, yet this change is targeting 10.11.

{
std::vector<trx_t *, ut_allocator<trx_t *> > bf_waiters{};
ulonglong max_wait_time{0};
my_hrtime_t now{my_hrtime_coarse()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-constant initialization expression for an implicit default constructor is not allowed in the oldest compilers that we support (GCC 4.8.5):

/home/buildbot/amd64-centos-7-rpm-autobake/build/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/storage/innobase/lock/lock0lock.cc:6505:37: error: cannot convert 'my_hrtime_t' to 'ulonglong {aka long long unsigned int}' in initialization
   my_hrtime_t now{my_hrtime_coarse()};

applier thread is released.
*/

std::atomic<uint> wsrep_BF_waiting_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is not documented by any a comment. Why is it not declared static?

#ifdef WITH_WSREP
if (trx->is_wsrep() && wsrep_thd_is_BF(trx->mysql_thd, false))
{
++wsrep_BF_waiting_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this using std::memory_order_seq_cst?

#ifdef WITH_WSREP
if (trx->is_wsrep() && wsrep_thd_is_BF(trx->mysql_thd, false))
{
--wsrep_BF_waiting_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this using std::memory_order_seq_cst?

Comment on lines 6483 to 6485
/* Grab reference to prevent trx going out of scope before
it is explicitly released in wsrep_run_BF_lock_wait_watchdog(). */
trx->reference();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transaction object should remain valid as long as it is unable to release its locks. The lock_sys.wait_mutex that this thread is holding should block that.

auto elapsed= (my_hrtime_coarse().val - start_time.val) / HRTIME_RESOLUTION;
if (elapsed >= 1)
{
ib::warn() << "WSREP watchdog took " << elapsed << " seconds to execute";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Invoking sql_print_warning() directly would save at least 5 function calls.

Comment on lines 6514 to 6519
mysql_mutex_unlock(&lock_sys.wait_mutex);

for (auto *trx : waiters.bf_waiters)
{
lock_wait_wsrep(trx);
trx->release_reference();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can trx be committed or rolled back meanwhile? What would happen in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

victim could commit or rollback, but that is usual scenario as several applier threads may issue BF abort concurrently for the same local transaction in current implementation as well

srv_start_periodic_timer(wsrep_BF_watchdog_timer,
wsrep_BF_watchdog_task,
WSREP_BF_WATCHDOG_INTERVAL);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cannot the code be invoked from srv_master_callback()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could, but this lock wait watchdog thread is now configured with different periodic timer, with shorter interval

SELECT * FROM t1;

DROP TABLE t1;
CALL mtr.add_suppression("WSREP watchdog took");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline

@janlindstrom
Copy link
Contributor

This is not exactly related to MDEV-29684 that is a real bug on 10.4. This is more related to https://jira.mariadb.org/browse/MDEV-29496

@janlindstrom janlindstrom assigned sysprg and unassigned janlindstrom May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants