
[MDEV-32172] Expose log file names and positions in PERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS #2763

Open
wants to merge 1 commit into base: 11.4

Conversation

@dlenski dlenski commented Sep 18, 2023

  • The Jira issue number for this PR is: MDEV-32172

Description

Background (from MDEV-32172)

Historically, the only command available for showing complete information on replication status in MySQL and MariaDB has been the SHOW REPLICA STATUS command (formerly known as SHOW SLAVE STATUS) [1][2].

This command has a major downside: its output cannot be captured and used in a SQL SELECT query, subquery, or SQL stored routine. All of the following result in syntax errors:

  • SHOW REPLICA STATUS LIKE 'Last_Errno'
  • SHOW SLAVE STATUS INTO Last_Errno, Master_Log_File
  • SELECT Last_Errno, Master_Log_File FROM (SHOW SLAVE STATUS)
  • SET @array = (SHOW SLAVE STATUS)

The frequently-cited workaround is to execute the SHOW SLAVE STATUS command from some external program via a connector/driver; at that layer, the output of SHOW REPLICA STATUS can be read and manipulated in a way that's equivalent to the output of a SELECT query.

However, because of this gap between the SQL syntax and the information sought, it has been essentially impossible to write an SQL stored routine that introspects the replication status of the MySQL/MariaDB server on which it's running.

Numerous blog posts and StackOverflow questions over the course of decades attest to frustration and confusion over this situation, e.g.:

  • "Can we capture only Slave_IO_Running in SHOW SLAVE STATUS in MySQL" (2011): https://dba.stackexchange.com/questions/12554
  • "What is the SELECT statement equivalent of SHOW ALL SLAVES STATUS?" (2021): https://dba.stackexchange.com/questions/287263

Starting in MySQL 5.7 and MariaDB 10.5, some of the information conveyed in the output of SHOW SLAVE STATUS became available via SELECT queries on the PERFORMANCE_SCHEMA.REPLICATION* views [3]. However, even with the addition of the relevant performance-schema views, not all of the information available in the output of SHOW SLAVE STATUS is available via normal SELECT queries.
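
For example, an illustrative query using only columns that already exist in this view (as seen in the example output later in this description):

    -- Works on MariaDB >= 10.5 when performance_schema is enabled:
    SELECT CHANNEL_NAME, SERVICE_STATE, COUNT_TRANSACTIONS_RETRIES
    FROM performance_schema.replication_applier_status;

Fields such as Master_Log_File and Read_Master_Log_Pos, however, have no counterpart in these views.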

The MariaDB documentation does not explicitly note this, but the MySQL documentation does [4]:

several SHOW SLAVE STATUS columns are not preserved in the Performance Schema replication tables:

[list of such columns]

A 2018 blog post by @wisborg helpfully shows the gaps in the form of a table mapping SHOW SLAVE STATUS output fields to performance-schema tables and columns [5].

Changes here

The goal of this PR is specifically to address two of those gaps, by taking the columns Master_Log_File and Until_Log_File (strings) and Read_Master_Log_Pos and Until_Log_Pos (non-negative integers) from the output of SHOW SLAVE STATUS and exposing them via SELECT queries on PERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS. Among other things, this makes it possible for an SQL stored routine to verify that replication has stopped specifically because it has run past the "until point" specified via START SLAVE … UNTIL. Without this addition, it appears to be impossible for pure-SQL server-side code to distinguish this case from other reasons that replication might have stopped.

Example:

MariaDB [(none)]> change master 'channelX' to master_host='dbhost', master_port=3306, master_log_file='mariadb.00001', master_log_pos=1234;
Query OK, 0 rows affected (0.071 sec)

MariaDB [(none)]> start slave 'channelX' until master_log_file='mariadb.00003', master_log_pos=5678;
Query OK, 0 rows affected, 2 warnings (0.001 sec)

MariaDB [(none)]> select * from performance_schema.replication_applier_status;
+--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+
| CHANNEL_NAME | SERVICE_STATE | REMAINING_DELAY | COUNT_TRANSACTIONS_RETRIES | LOG_FILE      | LOG_FILE_POS | UNTIL_LOG_FILE | UNTIL_LOG_FILE_POS |
+--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+
| channelX     | ON            |            NULL |                          0 | mariadb.00001 |         1234 | mariadb.00003  |               5678 |
+--------------+---------------+-----------------+----------------------------+---------------+--------------+----------------+--------------------+
1 row in set (0.001 sec)
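
As a concrete illustration (a hypothetical sketch, not part of this patch; the function name and stop-detection logic are my own), a stored function could use the new columns to report whether a stopped applier reached its until-point:

    DELIMITER //
    -- Hypothetical helper: returns TRUE if the named channel's applier
    -- stopped after reaching the START SLAVE ... UNTIL position.
    CREATE FUNCTION reached_until_point(chan VARCHAR(64)) RETURNS BOOLEAN
      READS SQL DATA
    BEGIN
      RETURN (SELECT SERVICE_STATE = 'OFF'
                     AND LOG_FILE = UNTIL_LOG_FILE
                     AND LOG_FILE_POS >= UNTIL_LOG_FILE_POS
              FROM performance_schema.replication_applier_status
              WHERE CHANNEL_NAME = chan);
    END //
    DELIMITER ;

Before this change, no pure-SQL routine could make this determination at all.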

How can this PR be tested?

  • Builds with no errors
  • Manually testing that CHANGE MASTER TO/START SLAVE [UNTIL] commands result in expected fields being populated in PERFORMANCE_SCHEMA.REPLICATION_APPLIER_STATUS
  • Coverage of newly-added functionality in MTR tests

Basing the PR against the correct MariaDB version

  • This is a bug fix and the PR is based against the earliest maintained branch in which the bug can be reproduced.

    The replication status information is critical for managing replication configuration. I argue that it's a very longstanding bug that this information has been absent in earlier versions of MariaDB, preventing the use of purely server-side logic (SQL stored routines) for managing replication configuration.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

Copyright

All new code of the whole pull request, including one or several files that are either new files or modified ones, is contributed under the BSD-new license. I am contributing on behalf of my employer, Amazon Web Services, Inc.

Footnotes

  [1] https://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html
  [2] https://mariadb.com/kb/en/show-replica-status
  [3] https://mariadb.com/kb/en/list-of-performance-schema-tables
  [4] https://dev.mysql.com/doc/refman/5.7/en/performance-schema-replication-tables.html#idm46228724639264
  [5] https://mysql.wisborg.dk/2018/10/05/replication-monitoring-with-the-performance-schema/#old-versus-new



dlenski commented Feb 13, 2024

@LinuxJedi mentioned in our meeting today that he'll add this to his to-review list :-)

(And just rebased onto 11.4)

@dlenski dlenski force-pushed the add_replication_log_file_names_and_positions_to_PS branch 2 times, most recently from 5fc0382 to 4072575 on February 13, 2024
@grooverdan (Member) commented:

re-record perfschema.table_schema test results.

@dlenski dlenski force-pushed the add_replication_log_file_names_and_positions_to_PS branch from 4072575 to 7d79b71 on February 14, 2024
@dlenski dlenski force-pushed the add_replication_log_file_names_and_positions_to_PS branch from 7d79b71 to 022c0f9 on February 14, 2024
@LinuxJedi (Contributor) left a comment:

All looks good to me, I defer to @andrelkin for final opinion.


dlenski commented Feb 16, 2024

All looks good to me, I defer to @andrelkin for final opinion.

Thanks, @LinuxJedi!

@andrelkin, as I describe in the PR description, the goal here is to close a gap between…

  • The information that's available in the output of SHOW REPLICA STATUS … which cannot be captured in a stored routine.
  • The information that's available by performing SELECT queries on PERFORMANCE_SCHEMA.REPLICATION* tables.

This is a very longstanding pain-point, as demonstrated by the blog posts and StackExchange questions I linked to.

The fields that are added here are among the "lowest hanging fruit" for closing this gap. Once this is merged, I will tackle some of the additional fields in the gap.


andrelkin commented Feb 20, 2024

Hello @dlenski!

Thanks for helping us in this area, and not least for the comprehensive survey that motivates the work!

Let me first provide some notes on the overall design, and suggest right away some things to improve in the patch.
I can't do a detailed code review today, but I will do my best not to lag behind on this for too long.

The PS-based solution feels good to me, even though it would of course require running the server with PS enabled.
Any thoughts on this part from you?

As a possible alternative not mentioned: extending the SSS (Show-Slave-Status) output with more fields, combined with https://mariadb.com/kb/en/extended-show/.
Though I'm the one suggesting it, I don't really like it, because the SSS as the base of such a "view" is too heavy in the computational sense.
But should the SSS be made lighter, so as not to acquire up to four mutexes, this could be a viable approach.

As to the placement of these SSS fields, it looks to me that replication_applier_status_by_coordinator is a better home for LOG_FILE and LOG_FILE_POS, with the meaning of the end of the buffered event queue (and then it would be consistent to also similarly reflect the start of the queue, that is, the next event to be scheduled for execution).
For completeness I'd prefer the starting and ending GTIDs too.

As to replication_applier_status, it is better suited to keeping track of the last executed event's binlog coordinates. Other PS replication tables do not carry that (only the last executed GTIDs are present in replication_applier_status_by_worker).

I think it would be very good to add a replication test to your branch that would also compare the fields added to the PS table with their SSS values.

With that, I am switching to the patch itself to leave just a couple of small notes.

@andrelkin (Contributor) left a comment:

At first glance the patch looks good.
The missing replication test would be good to have.

@@ -60,6 +60,12 @@ struct st_row_applier_status {
   uint remaining_delay;
   bool remaining_delay_is_set;
   ulong count_transactions_retries;
+  char log_file[FN_REFLEN]; /* TODO verify size */
@andrelkin (Contributor) commented:

What is the TODO going to address? As the file name is copied from a source, it'd be enough to declare the array with the source's size 🤔

@dlenski (Contributor Author) replied:

As the file name is copied from a source, it'd be enough to declare the array with the source's size 🤔

Right, and according to the source of SHOW {REPLICA,SLAVE} STATUS, FN_REFLEN is the correct source-size.

What is the TODO going to address?

My main concern is whether there's an off-by-1 error here.

In ha_perfschema.cc a filename-sized buffer is sized FN_REFLEN+1 whereas in pfs_instr.cc it's exactly FN_REFLEN.

In general, the size of static buffers holding filenames is extremely inconsistent throughout the codebase.


dlenski commented Feb 20, 2024

@andrelkin wrote:

The PS-based solution feels good to me, even though it would of course require running the server with PS enabled.
Any thoughts on this part from you?

Yes, I have a lot of thoughts on this 😅

In my opinion, it's a major design flaw that these critical system views for replication are relegated to the optional PERFORMANCE_SCHEMA… but it's also a very longstanding one.

Note that MySQL 8.0 fudges this by making some of the PERFORMANCE_SCHEMA views available even when @@global.performance_schema=0.


tl;dr My preference here would be to merge this PR with the existing limitation (PS must be enabled in order to access the p_s.replication* views) and then to do a follow-up PR to enable the replication-related PS views even when PERFORMANCE_SCHEMA is disabled, just as MySQL 8.0+ already does.
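
To illustrate that limitation with a hypothetical fragment (the guard is my own, based on the @@global.performance_schema variable mentioned above), server-side code currently needs to bail out when PS is disabled, since the p_s.replication* views return no rows in that case:

    -- Inside a stored routine: fail fast if performance_schema is off.
    IF @@global.performance_schema = 0 THEN
      SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'performance_schema is disabled';
    END IF;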


dlenski commented Feb 20, 2024

@andrelkin wrote:

As a possible alternative not mentioned: extending the SSS (Show-Slave-Status) output with more fields, combined with https://mariadb.com/kb/en/extended-show/.

"Extended show" allows filtering the results of a SHOW xyz command, by adding WHERE/LIKE.

But it does not provide any way to capture the results of a SHOW xyz command. You can't do SHOW xyz INTO var1, var2, or otherwise use SHOW xyz as part of a SELECT query.

It therefore does not address the key problem that I'm trying to deal with here.
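
To make the distinction concrete, an illustrative example using SHOW STATUS (a command that extended show does support):

    -- Filtering works:
    SHOW GLOBAL STATUS WHERE Variable_name LIKE 'Slave%';

    -- ...but capturing still does not; each of these is a syntax error:
    SELECT Variable_name FROM (SHOW GLOBAL STATUS);
    SET @v = (SHOW GLOBAL STATUS WHERE Variable_name = 'Slave_running');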

Though I'm the one suggesting it, I don't really like it, because the SSS as the base of such a "view" is too heavy in the computational sense.
But should the SSS be made lighter, so as not to acquire up to four mutexes, this could be a viable approach.

Hmmm… this feels like it would be at best a small micro-optimization, given that querying replication status is unlikely to be an operation performed frequently in a fast-path. It's more of a "management" operation.

it looks to me that replication_applier_status_by_coordinator is a better home for LOG_FILE and LOG_FILE_POS, with the meaning of the end of the buffered event queue (and then it would be consistent to also similarly reflect the start of the queue, that is, the next event to be scheduled for execution).

Are you saying that I should move these columns (LOG_FILE, LOG_FILE_POS, UNTIL_LOG_FILE, UNTIL_LOG_FILE_POS) so that they're in replication_applier_status_by_coordinator instead of replication_applier_status?

For completeness I'd prefer the starting and ending GTIDs too.

Okay.

When you say "starting and ending GTIDs" do you mean the current GTID and the until-GTID (if applicable), or something else?

Current GTID from sql/slave.cc (lines 3348 to 3355 at commit eeba940):

    // Using_Gtid
    protocol->store_string_or_null(mi->using_gtid_astext(mi->using_gtid),
                                   &my_charset_bin);
    // Gtid_IO_Pos
    {
      mi->gtid_current_pos.to_string(&tmp);
      protocol->store(tmp.ptr(), tmp.length(), &my_charset_bin);
    }

Until-GTID from sql/slave.cc (lines 2477 to 2511 at commit eeba940):

    if (mi->rli.until_condition == Relay_log_info::UNTIL_GTID)
    {
      query_str.length(0);
      query_str.append(STRING_WITH_LEN("SET @slave_until_gtid='"),
                       system_charset_info);
      if (mi->rli.until_gtid_pos.append_to_string(&query_str))
      {
        err_code= ER_OUTOFMEMORY;
        errmsg= "The slave I/O thread stops because a fatal out-of-memory "
                "error is encountered when it tries to compute @slave_until_gtid.";
        sprintf(err_buff, "%s Error: Out of memory", errmsg);
        goto err;
      }
      query_str.append(STRING_WITH_LEN("'"), system_charset_info);
      rc= mysql_real_query(mysql, query_str.ptr(), query_str.length());
      if (unlikely(rc))
      {
        err_code= mysql_errno(mysql);
        if (is_network_error(err_code))
        {
          mi->report(ERROR_LEVEL, err_code, NULL,
                     "Setting @slave_until_gtid failed with error: %s",
                     mysql_error(mysql));
          goto network_err;
        }
        else
        {
          /* Fatal error */
          errmsg= "The slave I/O thread stops because a fatal error is "
                  "encountered when it tries to set @slave_until_gtid.";
          sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
          goto err;
        }
      }
    }
As to replication_applier_status, it is better suited to keeping track of the last executed event's binlog coordinates. Other PS replication tables do not carry that (only the last executed GTIDs are present in replication_applier_status_by_worker).

I don't understand what you mean here.

I think it would be very good to add a replication test to your branch that would also compare the fields added to the PS table with their SSS values.

Absolutely. I will add tests verifying that SSS matches the results from these views once the location(s) of the columns are nailed down. 😅
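
A sketch of what such a test could look like (a hypothetical MTR-style fragment; query_get_value is the standard mysqltest helper, but the exact columns compared will depend on where they finally land):

    --let $sss_file= query_get_value(SHOW SLAVE STATUS, Master_Log_File, 1)
    --let $ps_file= `SELECT LOG_FILE FROM performance_schema.replication_applier_status`
    if ($sss_file != $ps_file)
    {
      --die LOG_FILE in P_S does not match Master_Log_File in SHOW SLAVE STATUS
    }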

@andrelkin commented:

Just a brief reply at this point, also to address a question.

@andrelkin wrote:

Though I'm the one suggesting it, I don't really like it, because the SSS as the base of such a "view" is too heavy in the computational sense.
But should the SSS be made lighter, so as not to acquire up to four mutexes, this could be a viable approach.

Hmmm… this feels like it would be at best a small micro-optimization, given that querying replication status is unlikely to be an operation performed frequently in a fast-path. It's more of a "management" operation.

An unfortunate choice of 'heavy' on my side. The mutexes may not be hot at all, but as they are shared with the slave threads, it
makes overall concurrent execution more vulnerable than necessary, e.g. deadlock-prone (such bugs have existed).

it looks to me that replication_applier_status_by_coordinator is a better home for LOG_FILE and LOG_FILE_POS, with the meaning of the end of the buffered event queue (and then it would be consistent to also similarly reflect the start of the queue, that is, the next event to be scheduled for execution).

Are you saying that I should move these columns (LOG_FILE, LOG_FILE_POS, UNTIL_LOG_FILE, UNTIL_LOG_FILE_POS) so that they're in replication_applier_status_by_coordinator instead of replication_applier_status?

Indeed. I was thinking to generalize the notion of coordinator, to contrast it with the applier role, disregarding the parallel mode, or rather regarding legacy sequential execution as parallel execution with zero workers (^ @bnestere).
The coordinator then should have knowledge about buffered events, whereas the applier view is about the executed ones,
that is, SSS::Relay_Master_Log_File:Exec_Master_Log_Pos (file:pos) (SSS := the Show-Slave-Status "view", as above).

In your case, the above two pairs describe the "current" and the ultimate ends of the (relay-log) buffer, respectively.
How much does this concept make sense in your opinion?

For completeness I'd prefer the starting and ending GTIDs too.

Okay.

When you say "starting and ending GTIDs" do you mean the current GTID and the until-GTID (if applicable), or something else?

SSS::Gtid_IO_Pos, and a GTID equivalent (which does not exist in SSS) of Until_Log_File:Until_Log_Pos.


As to replication_applier_status, it is better suited to keeping track of the last executed event's binlog coordinates. Other PS replication tables do not carry that (only the last executed GTIDs are present in replication_applier_status_by_worker).

I don't understand what you mean here.

I started explaining this above. The coordinator table's concerns may/should include the head and end of the buffered queue.
Look at replication_applier_status_by_coordinator.LAST_SEEN_TRANSACTION, which describes the end exclusively (the event being sent into execution by a worker).
The applier's concerns may include the latter (letting the two views intersect at this field is not bad, I think) as well as the last executed Relay_Master_Log_File:Exec_Master_Log_Pos / @@global.gtid_slave_pos or replication_applier_status_by_worker.LAST_SEEN_TRANSACTION.
Currently replication_applier_status shows neither of the first two.

I think it would be very good to add a replication test to your branch that would also compare the fields added to the PS table with their SSS values.

Absolutely. I will add tests verifying that SSS matches the results from these views once the location(s) of the columns are nailed down. 😅

Thanks for being ready for that!
