-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-29050: Fixing 'MDEV-29050: [ERROR] InnoDB: Operating system error number 2 in a file operation.' #2183
base: 10.4
Are you sure you want to change the base?
MDEV-29050: Fixing 'MDEV-29050: [ERROR] InnoDB: Operating system error number 2 in a file operation.' #2183
Conversation
…for read-only: './db/t1.ibd' OS error: 71'. The Problem =========== When restoring a sparse backup (a backup executed with --tables or --tables-file) we execute 'mariabackup --prepare --export ...'. For certain restores we see a number of errors like the following: [ERROR] InnoDB: Cannot open datafile for read-only: './db/t1.ibd' OS error: 71 All the errors are relevant to tables that were not specified in the sparse backup. For example, if we have two tables t1 and t2 and if we do, 'mariabackup --backup --tables=t2 ....' and to restore if we execute 'mariabackup --prepare --export ...', then, we get an error relevant to t1. Also, note that the datafiles for unspecified (in the sparse backup) tables are not present in the backup. Expected Behaviour ================== 'mariabackup --prepare --export ...' should be executed without errors for sparse backups. The Cause ========= During 'mariabackup --prepare --export', the recovery code inside innodb goes through undo log records and attempts to resurrect transactions. However, for backups, we do not need to recover transactions. For backups it is sufficient to work with redo logs and data files. This logic is already implemented for SRV_OPERATION_RESTORE but not implemented for SRV_OPERATION_RESTORE_EXPORT. Hence, SRV_OPERATION_RESTORE_EXPORT attempts to recover pending transactions. The Fix ======= Apply same logic to SRV_OPERATION_RESTORE_EXPORT as for SRV_OPERATION_RESTORE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. This seems applicable to 10.3, our earliest maintained version also right?
|
||
ALTER TABLE t1 IMPORT TABLESPACE; | ||
echo "# Select all should include only 10"; | ||
SELECT * FROM t1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for test case repeatability, put ORDER BY a
on statements like this to ensure the output result order doesn't change.
Edit: obviously more applicable to statements that return more than 1 result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
storage/innobase/trx/trx0rseg.cc
Outdated
@@ -491,7 +491,8 @@ static dberr_t trx_rseg_mem_restore(trx_rseg_t *rseg, trx_id_t &max_trx_id, | |||
} | |||
} | |||
|
|||
if (srv_operation == SRV_OPERATION_RESTORE) { | |||
if ((srv_operation == SRV_OPERATION_RESTORE) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the correction be added to is_mariabackup_restore_or_export()
(storage/innobase/include/srv0srv.h) and for this, and the other case, be changed to use this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% certain whether we should make the change to is_mariabackup_restore_or_export(). Because is_mariabackup_restore_or_export checks whether server operation is SRV_OPERATION_RESTORE or SRV_OPERATION_RESTORE_ROLLBACK_XA or SRV_OPERATION_RESTORE_EXPORT (See [1]). I am not 100% certain whether we should return when server action is SRV_OPERATION_RESTORE_ROLLBACK_XA also.
[1]
inline bool is_mariabackup_restore_or_export()
{
return is_mariabackup_restore()
|| srv_operation == SRV_OPERATION_RESTORE_EXPORT;
}
inline bool is_mariabackup_restore()
{
/* To rollback XA's trx_sys must be initialized, the rest is the same
as regular backup restore, that is why we join this two operations in
the most cases. */
return srv_operation == SRV_OPERATION_RESTORE
|| srv_operation == SRV_OPERATION_RESTORE_ROLLBACK_XA;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mariabackup --prepare --export
is supposed to start up more or less like a normal server. It must restore all undo logs, so that any incomplete transactions in the to-be-exported tables can be rolled back. The suggested changes to trx_rseg_mem_restore()
and trx_lists_init_at_db_start()
would seem to break that logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dr-m Thank you for the comment. This question maybe due to lack of my understanding; so please help me to understand --
- When are we exactly writing transaction data to the redo log ? is it upon transaction commit or can it happen before transaction commit also ? -- I do understand if the transaction is quite large and involves many index changes then we need to write to the redo log prior to commit to save memory. Also, does this has any connection to the transaction isolation level and autocommit statements ?
- If we are writing transaction data to the redo log prior to transaction commit, then is there a separate redo log event to identify the transaction commit (Something like MLOG_COMMIT; couldnt find such MLOG event in 10.2 codebase) ?
Assuming we write to the redo log prior to transaction commit, I think I agree with your comment and let me think about an alternative fix keeping what you said in my head.
Thanks
Thejaka
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry for missing your replies. You may be on the right track, but the code has changed a lot, mainly in the 10.6 branch. I would like to see also a 10.6 version of this. You’d better avoid any unnecessary white space changes to ease merging or cherry-picking.
- Transaction data is written to undo log pages (rollback segment header pages and undo log pages). Starting with MariaDB Server 10.3, the
TRX_SYS
page is a read-only directory pointing to the rollback segment header pages. Any persistent InnoDB data pages are covered by a write-ahead log (redo log,ib_logfile0
). - Undo log records are indeed written before any data modifications. On transaction commit, only the transaction state in the undo log header will be updated.
You may find this old presentation (slides and video) useful.
…o the test case to make sure we get consistent results all the time.
…nges made to a table from an uncommited transaction. During “—export”, we need to rollback uncommitted transactions from the table. In this commit, revised code to address that issue. The revised fix has the following: 1. Revert the code changes made in the previous submission 2. Added code to exclude tables that are not in the data directory when processing undo log records during recovery and in RESTORE_EXPORT mode. The basic idea is that if a transaction includes multiple tables and if we export only a subset of those tables, then during recovery we apply undo log changes only to the exported tables. We ignore undo log records relevant to non-existing tables.
// find the table name for the given table id ... | ||
mutex_enter(&dict_sys.mutex); | ||
dict_find_table_name_from_id(table_id, &table_name); | ||
mutex_exit(&dict_sys.mutex); | ||
|
||
// if table name is not null and if the table is exported to the backup | ||
// then add it to the table list ... | ||
if (table_name != NULL) { | ||
// given table name, find the ibd file path ... | ||
char *ibd_file_path = fil_make_filepath(NULL, table_name, IBD, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if the data file was excluded from a backup, the table metadata will be available in SYS_TABLES
and other dictionary tables.
This code seems to be duplicating some existing logic. I do not think that this is a good idea. It would be better to tweak dict_load_table_one()
or dict_load_table_low()
or related code so that it does the right thing.
[x] *The Jira issue number for this PR is: MDEV-29050: [ERROR] InnoDB: Operating system error number 2 in a file operation.
Description
The Problem
When restoring a sparse backup (a backup executed with --tables or --tables-file) we execute
'mariabackup --prepare --export ...'. For certain restores we see a number of errors like the
following:
[ERROR] InnoDB: Cannot open datafile for read-only: './db/t1.ibd' OS error: 71
All the errors are relevant to tables that were not specified in the sparse backup.
For example, if we have two tables t1 and t2 and if we do,
'mariabackup --backup --tables=t2 ....' and to restore if we execute
'mariabackup --prepare --export ...', then, we get an error relevant to t1.
Also, note that the datafiles for unspecified (in the sparse backup) tables are not present in the backup.
Expected Behaviour
'mariabackup --prepare --export ...' should be executed without errors for sparse backups.
The Cause
During 'mariabackup --prepare --export', the recovery code inside
innodb goes through undo log records and attempts to resurrect
transactions. However, for backups, we do not need to recover transactions. For backups
it is sufficient to work with redo logs and data files. This logic is already implemented
for SRV_OPERATION_RESTORE but not implemented for SRV_OPERATION_RESTORE_EXPORT. Hence, SRV_OPERATION_RESTORE_EXPORT
attempts to recover pending transactions.
The Fix
Apply same logic to SRV_OPERATION_RESTORE_EXPORT as for SRV_OPERATION_RESTORE.
How can this PR be tested?
A mtr test case is added.
Basing the PR against the correct MariaDB version
You might consider answering some questions like:
who upgrades from a version prior to this patch? -- No.
created prior to your fix? -- Yes.