MDEV-9905 Redolog for NVDIMM in MariaDB #1689

Open: wants to merge 1 commit into base 10.5

msunshinelxl

1) Add feature: use the PMDK interface instead of the POSIX interface to
read and write the redo log.

2) Add build and run-time parameters: PMDK is used when the server is
built with '-DBUILD_WITH_PMDK=1' and run with 'innodb_use_pmdk=on'.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


xinlongli seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@msunshinelxl
Author

The patch is submitted according to MCA terms.

@an3l added the license-mca (Contributed under the MCA) label on Oct 28, 2020
Contributor

@dr-m dr-m left a comment

We got something similar in daaa881 and 3daef52 already. But, as far as I understand, we do not currently build packages that would include the libpmem dependency.

My intention is to implement deeper integration of InnoDB redo log writing with persistent memory, ideally bypassing the library and manually implementing the interface. Basically, when available, I would like to mmap(MAP_SYNC) the "ib_logfile0" to log_sys.buf and directly write the data there (performing clflush of 4 cache lines at a time to write it in 256-byte blocks, which I hear is the native block size).
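
A minimal sketch of that idea in C, under stated assumptions: Linux, an x86-64 processor for the flush instruction, and the 256-byte block size mentioned above. The helper names `map_log`, `flush_block` and `log_write_block` are hypothetical and are not InnoDB functions. The mapping falls back to a plain `MAP_SHARED` when `MAP_SYNC` is rejected, so the sketch also runs on a file system without DAX, where it gives no persistence guarantee.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_clflush, _mm_sfence */
#endif

enum { CACHE_LINE = 64, PMEM_BLOCK = 256 }; /* 256 bytes = 4 cache lines */

/* Map the log file for direct writes. MAP_SYNC only succeeds on a
   DAX-capable file system; otherwise fall back to an ordinary shared
   mapping. *is_sync reports which kind of mapping was obtained. */
static void *map_log(int fd, size_t size, int *is_sync)
{
  void *p = MAP_FAILED;
  *is_sync = 0;
#if defined(MAP_SYNC) && defined(MAP_SHARED_VALIDATE)
  p = mmap(NULL, size, PROT_READ | PROT_WRITE,
           MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
  if (p != MAP_FAILED)
    *is_sync = 1;
#endif
  if (p == MAP_FAILED)
    p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Flush one 256-byte block, one cache line at a time, then fence. */
static void flush_block(const void *block)
{
  assert((uintptr_t) block % PMEM_BLOCK == 0);
#if defined(__x86_64__)
  const char *p = (const char *) block;
  for (int i = 0; i < PMEM_BLOCK / CACHE_LINE; i++)
    _mm_clflush(p + i * CACHE_LINE);
  _mm_sfence();
#else
  (void) block; /* no flush instruction wired up in this sketch */
#endif
}

/* Copy one block into the mapped log and flush it. */
static void log_write_block(char *map, size_t offset, const void *src)
{
  assert(offset % PMEM_BLOCK == 0);
  memcpy(map + offset, src, PMEM_BLOCK);
  flush_block(map + offset);
}
```

A caller would open "ib_logfile0", pass the descriptor and file size to `map_log`, and then call `log_write_block` with offsets that are multiples of 256.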

I think that I would have to resume the MDEV-14425 implementation before I can move forward with that.

@dr-m
Contributor

dr-m commented Mar 18, 2021

libpmem and HAVE_PMEM will be enabled by default on MariaDB Server 10.6 on the platforms that support it, since 7c5c6fa and 418381b. It is unclear how this USE_PMDK would improve the implementation that we already have in 10.5.

@an3l added this to the 10.5 milestone on Mar 25, 2021
@grooverdan added the need feedback (Can the contributor please address the questions asked.) label on Mar 26, 2021

@guoanwu guoanwu left a comment

About the read path: when copying PMEM data into a memory buffer, a plain memcpy is sufficient. Do not use pmem_memcpy_nodrain there, since it flushes the cache line data, which is not necessary for a read.
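
As a rough illustration of the read path described here, the sketch below copies log bytes out of a mapped region with a plain `memcpy()` and performs no flush. The helper name `log_read` and its parameters are hypothetical; the patch's actual read function is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy len bytes starting at offset out of the mapped log into a
   volatile buffer. Nothing is made durable here, so no cache line
   flush or drain is issued. */
static void log_read(void *dst, const char *map, size_t map_size,
                     size_t offset, size_t len)
{
  assert(offset <= map_size && len <= map_size - offset);
  memcpy(dst, map + offset, len);
}
```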

@dr-m
Contributor

dr-m commented Jul 30, 2021

About the read path: when copying PMEM data into a memory buffer, a plain memcpy is sufficient. Do not use pmem_memcpy_nodrain there, since it flushes the cache line data, which is not necessary for a read.

Thank you for the comment. That is in line with my understanding. When we read the log during recovery, there should be no concurrent writes. When mariabackup --backup is reading the log file, it might use regular file system calls. I do not think it currently invokes any PMEM interface. The regression test suite completed fine when I ran it on our PMEM device several months ago.

I intend to change the InnoDB redo log block format in MariaDB Server 10.7, and I have been thinking of making log_sys.buf point directly to the MAP_SYNC file when the PMEM interface is in use. For conventional block device storage, log_sys.buf would point to volatile RAM and we would issue system calls to write it to the file. Ideally, we would use memcpy() or similar functions to populate the data in log_sys.buf. Only the ‘make durable’ code path would differ: it would either invoke system calls, or something like the clflush or clflushopt or clwb instructions, depending on what the processor supports.
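
The split described above could look roughly like the following C sketch. It is only an illustration under assumptions: `struct log_buf` is a hypothetical stand-in for log_sys, the helper names are invented, and the persistent memory branch uses only `clflush` on x86-64 rather than selecting `clflushopt` or `clwb` at run time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_clflush, _mm_sfence */
#endif

/* Hypothetical stand-in for log_sys; not InnoDB's actual structure. */
struct log_buf {
  char *buf;   /* volatile RAM, or the MAP_SYNC mapping of the log file */
  size_t size; /* size of buf in bytes */
  int fd;      /* log file descriptor, or -1 when buf is the mapped file */
};

/* Block-device flavour: volatile buffer plus a file to write it to. */
static int log_open_block(struct log_buf *log, const char *path,
                          char *ram, size_t size)
{
  log->buf = ram;
  log->size = size;
  log->fd = open(path, O_RDWR | O_CREAT, 0600);
  return log->fd >= 0 ? 0 : -1;
}

/* Common path: populating the buffer is the same for both storage types. */
static void log_append(struct log_buf *log, size_t offset,
                       const void *rec, size_t len)
{
  assert(offset <= log->size && len <= log->size - offset);
  memcpy(log->buf + offset, rec, len);
}

/* Only this step differs. Returns 0 on success, -1 on failure. */
static int log_make_durable(struct log_buf *log, size_t offset, size_t len)
{
  if (log->fd >= 0) {
    /* Block device: system calls write the range and sync the file. */
    ssize_t n = pwrite(log->fd, log->buf + offset, len, (off_t) offset);
    if (n != (ssize_t) len)
      return -1;
    return fdatasync(log->fd);
  }
#if defined(__x86_64__)
  /* Persistent memory: flush every cache line that overlaps the range. */
  uintptr_t p = (uintptr_t) (log->buf + offset) & ~(uintptr_t) 63;
  uintptr_t end = (uintptr_t) (log->buf + offset + len);
  for (; p < end; p += 64)
    _mm_clflush((const void *) p);
  _mm_sfence();
  return 0;
#else
  return -1; /* no flush instruction wired up for this architecture */
#endif
}
```

With this shape, callers always use `log_append()`, and the storage type only matters inside `log_make_durable()`.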

@guoanwu Our current implementation is only invoking pmem_memcpy_persist(), but I would like to move away from that (use normal memcpy() and the like, followed by an explicit durability instruction). I see that pmem_deep_persist() and related functions are marked experimental. What would you advise?
