-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MDEV-32172] Expose log file names and positions in PERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS #2763
base: 11.4
Are you sure you want to change the base?
Conversation
|
182ff21
to
d438371
Compare
d438371
to
beac6be
Compare
@LinuxJedi mentioned in our meeting today that he'll add this to his to-review list :-) (And just rebased onto 11.4) |
5fc0382
to
4072575
Compare
re-record perfschema.table_schema test results. |
4072575
to
7d79b71
Compare
…A.REPLICATION_APPLIER_STATUS Historically, the only command available for showing complete information on replication status in MySQL and MariaDB has been the `SHOW REPLICA STATUS` command (formerly known as `SHOW SLAVE STATUS`) [^1][^2]. This command has a major downside: its output cannot be captured and used in a SQL `SELECT` query, subquery, or SQL stored routine. All of the following result in syntax errors: - `SHOW REPLICA STATUS LIKE 'Last_Errno'` - `SHOW SLAVE STATUS INTO Last_Errno, Master_Log_File` - `SELECT Last_Errno, Master_Log_File FROM (SHOW SLAVE STATUS)` - `SET @array = (SHOW SLAVE STATUS)` The frequently-cited workaround is to execute the `SHOW SLAVE STATUS` command from some external program via a connector/driver; at that layer, the output of `SHOW REPLICA STATUS` _can_ be read and manipulated in a way that's equivalent to the output of a `SELECT` query. However, because of this gap between the SQL syntax and the information sought, it was impossible to write a SQL stored routine which introspects the replication status of the MySQL/MariaDB server on which it's running. Numerous blog posts and StackOverflow questions over the course of decades attest to frustration and confusion over this situation, e.g.: - "Can we capture only Slave_IO_Running in SHOW SLAVE STATUS in MySQL" (2011) https://dba.stackexchange.com/questions/12554 - "What is the SELECT statement equivalent of SHOW ALL SLAVES STATUS?" (2021) (https://dba.stackexchange.com/questions/287263) Starting in MySQL 5.7 and MariaDB 10.5, **some of** the information conveyed in the output of `SHOW SLAVE STATUS` became available via `SELECT` queries on the `PERFORMANCE_SCHEMA.REPLICATION*` views [^3]. However, even **with** the addition of the relevant performance-schema views, not all of the information available in the output of `SHOW SLAVE STATUS` is available via normal `SELECT` queries. The MariaDB documentation does not explicitly note this, but the MySQL documentation does [^4]: > several SHOW SLAVE STATUS columns are not preserved in the > Performance Schema replication tables: > > [list of such columns] A 2018 blog post by @wisborg helpfully shows the gaps in the form of a table mapping `SHOW SLAVE STATUS` output fields to performance-schema tables and columns [^5]. The goal of this patch is to specifically address two of those gaps, by taking the columns `Master_Log_File` and `Until_Log_File` (strings) and `Read_Master_Log_Pos` and `Until_Log_Pos` (non-negative integers) from the output of `SHOW SLAVE STATUS` and exposing them via select queries on `PERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS`. Among other things, this makes it possible for an SQL stored routine to verify if replication has stopped_specifically because_ it has run past the "until point" specified via `START SLAVE … UNTIL`. Without this addition, it appears to be impossible for pure-SQL server-side code to distinguish this case from other reasons that replication might have stopped. Example: MariaDB [(none)]> change master 'channelX' to master_host='dbhost', master_port=3306, master_log_file='mariadb.00001', master_log_pos=1234; Query OK, 0 rows affected (0.071 sec) MariaDB [(none)]> start slave 'channelX' until master_log_file='mariadb.00003', master_log_pos=5678; Query OK, 0 rows affected, 2 warnings (0.001 sec) MariaDB [(none)]> select * from performance_schema.replication_applier_status; +--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+ | CHANNEL_NAME | SERVICE_STATE | REMAINING_DELAY | COUNT_TRANSACTIONS_RETRIES | LOG_FILE | LOG_FILE_POS | UNTIL_LOG_FILE | UNTIL_LOG_FILE_POS | +--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+ | channelX | ON | NULL | 0 | mariadb.00001 | 1234 | foobar.00003 | 5678 | +--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+ 1 row in set (0.001 sec) [^1]: https://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html [^2]: https://mariadb.com/kb/en/show-replica-status [^3]: https://mariadb.com/kb/en/list-of-performance-schema-tables [^4]: https://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html#idm46228724639264 [^5]: https://mysql.wisborg.dk/2018/10/05/replication-monitoring-with-the-performance-schema/#old-versus-new All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services
7d79b71
to
022c0f9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks good to me, I defer to @andrelkin for final opinion.
Thanks, @LinuxJedi! @andrelkin, as I describe in the PR description, the goal here is to close a gap between…
This is a very longstanding pain-point, as demonstrated by the blog posts and StackExchange questions I linked to. The fields that are added here are among the "lowest hanging fruit" for closing this gap. Once this is merged, I will tackle some of the additional fields in the gap. |
Salve @dlenski! Thanks for helping us in this area and not least of a comprehensive survey that motivates the work! Let me provide first some notes on overall design and suggests at once something to improve in the patch. It feels good about the PS-based solution, even though it of course would require to run the server with PS-enabled. As a possible alternative not mentioned, the SSS (Show-Slave-Status) extending with more fields combined with https://mariadb.com/kb/en/extended-show/. To the placement of these SSS fields, it looks to me As to I think it would be very good to add a replication test to your branch that would also compare the added to PS table fields with their SSS values. With that I am switching to the patch itself for leaving only a couple of small notes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the first glace the patch looks good.
A missed replication test is good have.
@@ -60,6 +60,12 @@ struct st_row_applier_status { | |||
uint remaining_delay; | |||
bool remaining_delay_is_set; | |||
ulong count_transactions_retries; | |||
char log_file[FN_REFLEN]; /* TODO verify size */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the TODO going to address? As the file name is copied from a source it'd be enough to declare the array by the source's size 🤔 ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the file name is copied from a source it'd be enough to declare the array by the source's size 🤔 ..
Right, and according to the source of SHOW {REPLICA,SLAVE} STATUS
, FN_REFLEN
is the correct source-size.
What is the TODO going to address?
My main concern is whether there's an off-by-1 error here.
In ha_perfschema.cc a filename-sized buffer is sized FN_REFLEN+1
whereas in pfs_instr.cc it's exactly FN_REFLEN
.
In general, the size of static buffers holding filenames is extremely inconsistent throughout the codebase.
@andrelkin wrote:
Yes, I have a lot of thoughts on this 😅 In my opinion, it's a major design flaw that these critical system views for replication are relegated to the optional Note that MySQL 8.0 fudges this by making some of the
tl;dr My preference here would be to merge this PR with the existing limitation (PS must be enabled in order to access the |
@andrelkin wrote:
"Extended show" allows filtering the results of a But it does not provide any way to capture the results of a It therefore does not address the key problem that I'm trying to deal with here.
Hmmm… this feels like it would be at best a small micro-optimization, given that querying replication status is unlikely to be an operation performed frequently in a fast-path. It's more of a "management" operation.
Are you saying that I should move these columns (
Okay. When you say "star[t]ing and ending GTIDs" do you mean the current GTID and the until-GTID (if applicable), or something else? Current GTID from Lines 3348 to 3355 in eeba940
Until-GTID from Lines 2477 to 2511 in eeba940
I don't understand what you mean here.
Absolutely. I will add tests verifying that SSS matches the results from these views once the location(s) of the columns are nailed down. 😅 |
Just a brief reply at this point also to address a question,
Unfortunate 'heavy' from my side. The mutexes may not be hot at all, but as they are shared with the slave threads it
Indeed. I was thinking to generalize the notion of In your case the above two pairs describe its "current" and the ultimate ends of the(relay-log) buffer respectively.
SSS::Gtid_IO_Pos
I started explaining in above. The coordinator table concerns may/should include the buffered queue head and end.
Thanks for being ready for that!*think |
Description
Background (from MDEV-32172)
Historically, the only command available for showing complete information on replication status in MySQL and MariaDB has been the
SHOW REPLICA STATUS
command (formerly known asSHOW SLAVE STATUS
) 12.This command has a major downside: its output cannot be captured and used in a SQL
SELECT
query, subquery, or SQL stored routine. All of the following result in syntax errors:SHOW REPLICA STATUS LIKE 'Last_Errno'
SHOW SLAVE STATUS INTO Last_Errno, Master_Log_File
SELECT Last_Errno, Master_Log_File FROM (SHOW SLAVE STATUS)
SET @array = (SHOW SLAVE STATUS)
The frequently-cited workaround is to execute the
SHOW SLAVE STATUS
command from some external program via a connector/driver; at that layer, the output ofSHOW REPLICA STATUS
can be read and manipulated in a way that's equivalent to the output of aSELECT
query.However, because of this gap between the SQL syntax and the information sought, it was basically impossible to write a SQL stored routine which introspects the replication status of the MySQL/MariaDB server on which it's running.
Numerous blog posts and StackOverflow questions over the course of decades attest to frustration and confusion over this situation, e.g.:
https://dba.stackexchange.com/questions/12554
(https://dba.stackexchange.com/questions/287263)
Starting in MySQL 5.7 and MariaDB 10.5, some of the information conveyed in the output of
SHOW SLAVE STATUS
became available viaSELECT
queries on thePERFORMANCE_SCHEMA.REPLICATION*
views 3. However, even with the addition of the relevant performance-schema views, not all of the information available in the output ofSHOW SLAVE STATUS
is available via normalSELECT
queries.The MariaDB documentation does not explicitly note this, but the MySQL documentation does 4:
A 2018 blog post by @wisborg helpfully shows the gaps in the form of a table mapping
SHOW SLAVE STATUS
output fields to performance-schema tables and columns 5.Changes here
The goal of this PR is specifically to address two of those gaps, by taking the columns
Master_Log_File
andUntil_Log_File
(strings) andRead_Master_Log_Pos
andUntil_Log_Pos
(non-negative integers) from the output ofSHOW SLAVE STATUS
and exposing them via select queries onPERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS
. Among other things, this makes it possible for an SQL stored routine to verify if replication has stopped_specifically because_ it has run past the "until point" specified viaSTART SLAVE … UNTIL
. Without this addition, it appears to be impossible for pure-SQL server-side code to distinguish this case from other reasons that replication might have stopped.Example:
How can this PR be tested?
CHANGE MASTER TO
/START SLAVE [UNTIL]
commands result in expected fields being populated inPERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS
Basing the PR against the correct MariaDB version
This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.
The replication status information is critical for managing replication configuration. I argue that it's a very longstanding bug that it's absent in earlier versions of MariaDB, preventing the use of purely server-side logic (SQL stored routines) for management of replication configuration.
PR quality check
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc
Footnotes
https://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html ↩
https://mariadb.com/kb/en/show-replica-status ↩
https://mariadb.com/kb/en/list-of-performance-schema-tables ↩
https://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html#idm46228724639264 ↩
https://mysql.wisborg.dk/2018/10/05/replication-monitoring-with-the-performance-schema/#old-versus-new ↩