Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDEV-29050: Fixing 'MDEV-29050: [ERROR] InnoDB: Operating system error number 2 in a file operation.' #2183

Open
wants to merge 4 commits into
base: 10.4
Choose a base branch
from

Conversation

thejaka-kanewala
Copy link

[x] *The Jira issue number for this PR is: MDEV-29050: [ERROR] InnoDB: Operating system error number 2 in a file operation.

Description

The Problem

When restoring a sparse backup (a backup executed with --tables or --tables-file) we execute
'mariabackup --prepare --export ...'. For certain restores we see a number of errors like the
following:
[ERROR] InnoDB: Cannot open datafile for read-only: './db/t1.ibd' OS error: 71

All the errors are relevant to tables that were not specified in the sparse backup.
For example, if we have two tables t1 and t2 and if we do,
'mariabackup --backup --tables=t2 ....' and to restore if we execute
'mariabackup --prepare --export ...', then, we get an error relevant to t1.
Also, note that the datafiles for unspecified (in the sparse backup) tables are not present in the backup.

Expected Behaviour

'mariabackup --prepare --export ...' should be executed without errors for sparse backups.

The Cause

During 'mariabackup --prepare --export', the recovery code inside
innodb goes through undo log records and attempts to resurrect
transactions. However, for backups, we do not need to recover transactions. For backups
it is sufficient to work with redo logs and data files. This logic is already implemented
for SRV_OPERATION_RESTORE but not implemented for SRV_OPERATION_RESTORE_EXPORT. Hence, SRV_OPERATION_RESTORE_EXPORT
attempts to recover pending transactions.

The Fix

Apply same logic to SRV_OPERATION_RESTORE_EXPORT as for SRV_OPERATION_RESTORE.

How can this PR be tested?

A mtr test case is added.

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch
  • [ x] This is a bug fix and the PR is based against the earliest branch in which the bug can be reproduced -- 10.4

You might consider answering some questions like:

  1. Does this affect the on-disk format used by MariaDB? -- No.
  2. Does this change any behavior experienced by a user
    who upgrades from a version prior to this patch? -- No.
  3. Would a user be able to start MariaDB on a datadir
    created prior to your fix? -- Yes.

…for read-only: './db/t1.ibd' OS error: 71'.

The Problem
===========
When restoring a sparse backup (a backup executed with --tables or --tables-file) we execute
'mariabackup --prepare --export ...'. For certain restores we see a number of errors like the
following:
[ERROR] InnoDB: Cannot open datafile for read-only: './db/t1.ibd' OS error: 71

All the errors are relevant to tables that were not specified in the sparse backup.
For example, if we have two tables t1 and t2 and if we do,
'mariabackup --backup --tables=t2 ....' and to restore if we execute
'mariabackup --prepare --export ...', then, we get an error relevant to t1.
Also, note that the datafiles for unspecified (in the sparse backup) tables are not present in the backup.

Expected Behaviour
==================
'mariabackup --prepare --export ...' should be executed without errors for sparse backups.

The Cause
=========
During 'mariabackup --prepare --export', the recovery code inside
innodb goes through undo log records and attempts to resurrect
transactions. However, for backups, we do not need to recover transactions. For backups
it is sufficient to work with redo logs and data files. This logic is already implemented
for SRV_OPERATION_RESTORE but not implemented for SRV_OPERATION_RESTORE_EXPORT. Hence, SRV_OPERATION_RESTORE_EXPORT
attempts to recover pending transactions.

The Fix
=======
Apply same logic to SRV_OPERATION_RESTORE_EXPORT as for SRV_OPERATION_RESTORE.
Copy link
Member

@grooverdan grooverdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. This seems applicable to 10.3, our earliest maintained version also right?


ALTER TABLE t1 IMPORT TABLESPACE;
echo "# Select all should include only 10";
SELECT * FROM t1;
Copy link
Member

@grooverdan grooverdan Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for test case repeatability, put ORDER BY a on statements like this to ensure the output result order doesn't change.
Edit: obviously more applicable to statements that return more than 1 result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -491,7 +491,8 @@ static dberr_t trx_rseg_mem_restore(trx_rseg_t *rseg, trx_id_t &max_trx_id,
}
}

if (srv_operation == SRV_OPERATION_RESTORE) {
if ((srv_operation == SRV_OPERATION_RESTORE) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the correction be added to is_mariabackup_restore_or_export() (storage/innobase/include/srv0srv.h) and for this, and the other case, be changed to use this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% certain whether we should make the change to is_mariabackup_restore_or_export(). Because is_mariabackup_restore_or_export checks whether server operation is SRV_OPERATION_RESTORE or SRV_OPERATION_RESTORE_ROLLBACK_XA or SRV_OPERATION_RESTORE_EXPORT (See [1]). I am not 100% certain whether we should return when server action is SRV_OPERATION_RESTORE_ROLLBACK_XA also.

[1]
inline bool is_mariabackup_restore_or_export()
{
return is_mariabackup_restore()
|| srv_operation == SRV_OPERATION_RESTORE_EXPORT;
}

inline bool is_mariabackup_restore()
{
/* To rollback XA's trx_sys must be initialized, the rest is the same
as regular backup restore, that is why we join this two operations in
the most cases. */
return srv_operation == SRV_OPERATION_RESTORE
|| srv_operation == SRV_OPERATION_RESTORE_ROLLBACK_XA;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mariabackup --prepare --export is supposed to start up more or less like a normal server. It must restore all undo logs, so that any incomplete transactions in the to-be-exported tables can be rolled back. The suggested changes to trx_rseg_mem_restore() and trx_lists_init_at_db_start() would seem to break that logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dr-m Thank you for the comment. This question maybe due to lack of my understanding; so please help me to understand --

  1. When are we exactly writing transaction data to the redo log ? is it upon transaction commit or can it happen before transaction commit also ? -- I do understand if the transaction is quite large and involves many index changes then we need to write to the redo log prior to commit to save memory. Also, does this has any connection to the transaction isolation level and autocommit statements ?
  2. If we are writing transaction data to the redo log prior to transaction commit, then is there a separate redo log event to identify the transaction commit (Something like MLOG_COMMIT; couldnt find such MLOG event in 10.2 codebase) ?

Assuming we write to the redo log prior to transaction commit, I think I agree with your comment and let me think about an alternative fix keeping what you said in my head.

Thanks
Thejaka

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry for missing your replies. You may be on the right track, but the code has changed a lot, mainly in the 10.6 branch. I would like to see also a 10.6 version of this. You’d better avoid any unnecessary white space changes to ease merging or cherry-picking.

  1. Transaction data is written to undo log pages (rollback segment header pages and undo log pages). Starting with MariaDB Server 10.3, the TRX_SYS page is a read-only directory pointing to the rollback segment header pages. Any persistent InnoDB data pages are covered by a write-ahead log (redo log, ib_logfile0).
  2. Undo log records are indeed written before any data modifications. On transaction commit, only the transaction state in the undo log header will be updated.

You may find this old presentation (slides and video) useful.

…o the test case to make sure we get consistent results all the time.
…nges

	made to a table from an uncommited transaction. During “—export”,
	we need to rollback uncommitted transactions from the table.
	In this commit, revised code to address that issue.

	The revised fix has the following:
	1. Revert the code changes made in the previous submission
	2. Added code to exclude tables that are not in the data directory
	   when processing undo log records during recovery and in RESTORE_EXPORT mode.
	   The basic idea is that if a transaction includes multiple tables and
	   if we export only a subset of those tables, then during recovery we apply
	   undo log changes only to the exported tables. We ignore undo log records
 	   relevant to non-existing tables.
Comment on lines +641 to +650
// find the table name for the given table id ...
mutex_enter(&dict_sys.mutex);
dict_find_table_name_from_id(table_id, &table_name);
mutex_exit(&dict_sys.mutex);

// if table name is not null and if the table is exported to the backup
// then add it to the table list ...
if (table_name != NULL) {
// given table name, find the ibd file path ...
char *ibd_file_path = fil_make_filepath(NULL, table_name, IBD, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if the data file was excluded from a backup, the table metadata will be available in SYS_TABLES and other dictionary tables.

This code seems to be duplicating some existing logic. I do not think that this is a good idea. It would be better to tweak dict_load_table_one() or dict_load_table_low() or related code so that it does the right thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants