Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-33133: MDL conflict handling code should skip BF-aborted trxs #3010

Open
wants to merge 1 commit into
base: 10.6
Choose a base branch
from

Conversation

denis-protivensky
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-33133

Description

It's possible that MDL conflict handling code is called more than once for a transaction when:

  • it holds more than one conflicting MDL lock
  • reschedule_waiters() is executed, which results in repeated attempts to BF-abort already aborted transaction.
    In such situations, it might be that BF-aborting logic sees a partially rolled back transaction and erroneously decides on future actions for such a transaction.

The specific situation tested and fixed is when a SR transaction applied in the node gets BF-aborted by a started TOI operation. It's then caught with the server transaction already rolled back, but with no MDL locks yet released. This caused wrong state detection for such a transaction during repeated MDL conflict handling code execution.

How can this PR be tested?

MTR test is provided.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch.
  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

It's possible that MDL conflict handling code is called more
than once for a transaction when:
- it holds more than one conflicting MDL lock
- reschedule_waiters() is executed,
which results in repeated attempts to BF-abort already aborted
transaction.
In such situations, it might be that BF-aborting logic sees
a partially rolled back transaction and erroneously decides
on future actions for such a transaction.

The specific situation tested and fixed is when a SR transaction
applied in the node gets BF-aborted by a started TOI operation.
It's then caught with the server transaction already rolled back,
but with no MDL locks yet released. This caused wrong state
detection for such a transaction during repeated MDL conflict
handling code execution.
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@janlindstrom janlindstrom added the Codership Codership Galera label Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Codership Codership Galera
4 participants