-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
10.11 innodb wsrep applier lock wait timeout #2351
base: 10.11
Are you sure you want to change the base?
10.11 innodb wsrep applier lock wait timeout #2351
Conversation
This patch implements wsrep applier lock wait timeout functionality. As transactions which are executed by appliers have passed certification in the cluster, they must be applied and committed successfully. However, occasionally BF aborting local transactions may not work perfectly due to race conditions or unforeseen behavior of the lock manager, which may cause appliers to wait locks indefinitely. Especially if the local transaction has already reached commit stage, it will not yield via lock wait timeout. In order to resolve indefinite applier waits, a short applier lock wait timeout is introduced. However instead of giving up with lock wait, a background thread is used to retry BF abort on behalf of the applier which is waiting for the lock. A variable to control the applier lock wait timeout is innodb_wsrep_applier_lock_wait_timeout with default value of five seconds. If the value is zero, the background BF aborting is disabled. A separate timer wsrep_BF_watchdog_timer was added to achieve one second resolution. The value of innodb_wsrep_applier_lock_wait_timeout is set to zero in Galera suite MTR test configuration to avoid non-deterministic behavior.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the build and test failures.
Because I am concerned about potential performance regressions, I think that this will require extensive performance testing for the case that Galera replication is disabled.
I am confused about the testing status. MDEV-29684 mentions version 10.4, yet this change is targeting 10.11.
storage/innobase/lock/lock0lock.cc
Outdated
{ | ||
std::vector<trx_t *, ut_allocator<trx_t *> > bf_waiters{}; | ||
ulonglong max_wait_time{0}; | ||
my_hrtime_t now{my_hrtime_coarse()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A non-constant initialization expression for an implicit default constructor is not allowed in the oldest compilers that we support (GCC 4.8.5):
/home/buildbot/amd64-centos-7-rpm-autobake/build/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/storage/innobase/lock/lock0lock.cc:6505:37: error: cannot convert 'my_hrtime_t' to 'ulonglong {aka long long unsigned int}' in initialization
my_hrtime_t now{my_hrtime_coarse()};
storage/innobase/lock/lock0lock.cc
Outdated
applier thread is released. | ||
*/ | ||
|
||
std::atomic<uint> wsrep_BF_waiting_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is not documented by any a comment. Why is it not declared static
?
storage/innobase/lock/lock0lock.cc
Outdated
#ifdef WITH_WSREP | ||
if (trx->is_wsrep() && wsrep_thd_is_BF(trx->mysql_thd, false)) | ||
{ | ||
++wsrep_BF_waiting_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this using std::memory_order_seq_cst
?
storage/innobase/lock/lock0lock.cc
Outdated
#ifdef WITH_WSREP | ||
if (trx->is_wsrep() && wsrep_thd_is_BF(trx->mysql_thd, false)) | ||
{ | ||
--wsrep_BF_waiting_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this using std::memory_order_seq_cst
?
storage/innobase/lock/lock0lock.cc
Outdated
/* Grab reference to prevent trx going out of scope before | ||
it is explicitly released in wsrep_run_BF_lock_wait_watchdog(). */ | ||
trx->reference(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transaction object should remain valid as long as it is unable to release its locks. The lock_sys.wait_mutex
that this thread is holding should block that.
storage/innobase/lock/lock0lock.cc
Outdated
auto elapsed= (my_hrtime_coarse().val - start_time.val) / HRTIME_RESOLUTION; | ||
if (elapsed >= 1) | ||
{ | ||
ib::warn() << "WSREP watchdog took " << elapsed << " seconds to execute"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoking sql_print_warning()
directly would save at least 5 function calls.
storage/innobase/lock/lock0lock.cc
Outdated
mysql_mutex_unlock(&lock_sys.wait_mutex); | ||
|
||
for (auto *trx : waiters.bf_waiters) | ||
{ | ||
lock_wait_wsrep(trx); | ||
trx->release_reference(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can trx
be committed or rolled back meanwhile? What would happen in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
victim could commit or rollback, but that is usual scenario as several applier threads may issue BF abort concurrently for the same local transaction in current implementation as well
srv_start_periodic_timer(wsrep_BF_watchdog_timer, | ||
wsrep_BF_watchdog_task, | ||
WSREP_BF_WATCHDOG_INTERVAL); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cannot the code be invoked from srv_master_callback()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could, but this lock wait watchdog thread is now configured with different periodic timer, with shorter interval
SELECT * FROM t1; | ||
|
||
DROP TABLE t1; | ||
CALL mtr.add_suppression("WSREP watchdog took"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline
This is not exactly related to MDEV-29684 that is a real bug on 10.4. This is more related to https://jira.mariadb.org/browse/MDEV-29496 |
Description
This patch implements wsrep applier lock wait timeout
functionality.
As transactions which are executed by appliers have passed
certification in the cluster, they must be applied and committed
successfully. However, occasionally BF aborting local transactions
may not work perfectly due to race conditions or unforeseen
behavior of the lock manager, which may cause appliers to wait
locks indefinitely. Especially if the local transaction has
already reached commit stage, it will not yield via lock
wait timeout.
In order to resolve indefinite applier waits, a short applier
lock wait timeout is introduced. However instead of giving up
with lock wait, a background thread is used to retry BF abort
on behalf of the applier which is waiting for the lock.
A variable to control the applier lock wait timeout is
innodb_wsrep_applier_lock_wait_timeout with default
value of five seconds. If the value is zero, the background
BF aborting is disabled.
The value of innodb_wsrep_applier_lock_wait_timeout is set
to zero in Galera suite MTR test configuration to avoid
non-deterministic behavior.
How can this PR be tested?
The PR contains a mtr test for testing the functionality
TODO: modify the automated test suite to verify that the PR causes MariaDB to
behave as intended. Consult the documentation on
"Writing good test cases".
In many cases, this will be as simple as modifying one
.test
and one.result
file in the
mysql-test/
subdirectory. Without automated tests, future regressionsin the expected behavior can't be automatically detected and verified.
If the changes are not amenable to automated testing, please explain why not and
carefully describe how to test manually.
Basing the PR against the correct MariaDB version
Backward compatibility