MDEV-33543 Server hang caused by InnoDB change buffer #3234
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Issue: When getting a page (buf_page_get_gen) with no latch option (RW_NO_LATCH), the caller is not expected to follow the B-tree latching order. However in buf_page_get_low we try to acquire shared page latch unconditionally to wait for a page that is being loaded by another thread concurrently. In general it could lead to latch order violation and deadlock.
Currently it affects the change buffer insert path btr_latch_prev() which tries to load the previous page out of order with RW_NO_LATCH and two concurrent inserts into IBUF tree cause deadlock. This problem is introduced in 10.6 by MDEV-27058.
Fix: While trying to latch a page with RW_NO_LATCH, always use the "*lock_try" interface and retry operation on failure after unfixing the page.
Release Notes
MDEV-27058 had introduced this regression in 10.6.6. which could hang the server under load If innodb_change_buffering is enabled and user has tables with non unique secondary index.
How can this PR be tested?
It is difficult to maintain an automated mtr test in our regular test suite. The RQG test and the attached code/test patch in MDEV can repeat the issue.
The MDEV has a repeatable test and steps for manual reproduction of the issue.
Basing the PR against the correct MariaDB version
PR quality check