
L2ARC metadata refresh by zpool scrub #16416

Open
zfsuser opened this issue Aug 4, 2024 · 5 comments
Labels
Type: Feature Feature request or new feature

Comments

@zfsuser

zfsuser commented Aug 4, 2024

Describe the feature you would like to see added to OpenZFS

Option to allow zpool scrub to "refresh" L2ARC stored pool metadata.

How will this feature improve OpenZFS?

Metadata (and data) stored in L2ARC (but not in ARC) is typically lost from the L2ARC over time by being overwritten, depending on the relative L2ARC size and churn rate. This is a problem when trying to keep the complete pool metadata in L2ARC, especially if (a limited amount of MFU) data shall also be stored there (e.g. using #16343).

If a zpool scrub forwarded the pool metadata it reads to the ARC, the l2arc_feed_thread should be able to store most of the missing pool metadata in the L2ARC. This would also make it possible to deliberately trigger caching of the (complete) pool metadata in the L2ARC.

It would be a workaround, but a cyclic scrub should then be able to keep most of the pool metadata in the L2ARC.
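The interaction described above can be illustrated with a toy model (illustration only; the names and the ring-buffer simplification are assumptions, not OpenZFS APIs): the L2ARC overwrites its oldest entries under churn, and a scrub that forwards metadata reads to the ARC lets the feed thread re-cache the lost blocks.

```python
from collections import deque

L2ARC_CAPACITY = 8

def feed_l2arc(l2arc, blocks):
    """Toy stand-in for l2arc_feed_thread: append blocks not yet cached."""
    for b in blocks:
        if b not in l2arc:
            l2arc.append(b)

# Model the L2ARC as a ring buffer: oldest entries are overwritten.
l2arc = deque(maxlen=L2ARC_CAPACITY)

pool_metadata = [f"meta{i}" for i in range(4)]
feed_l2arc(l2arc, pool_metadata)           # metadata cached initially

churn = [f"data{i}" for i in range(8)]     # heavy data churn rotates the ring
feed_l2arc(l2arc, churn)
assert not any(m in l2arc for m in pool_metadata)  # metadata has been lost

# Proposed behaviour: a scrub re-reads all pool metadata into ARC,
# from where the feed thread writes it back into the L2ARC.
feed_l2arc(l2arc, pool_metadata)
assert all(m in l2arc for m in pool_metadata)
```

A scheduled scrub would repeat the last step periodically, restoring whatever metadata the churn has rotated out.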

Additional context

Component: SCRUB
Component: ARC/L2ARC

A tunable would be needed to (de)activate the new feature, defaulting to the old behaviour.

Further information (added 2024-08-07):

This feature is intended for pools:

  • Where the hosting server/NAS/SAN hardware does not provide enough interfaces/slots to add sufficient special devices to ensure metadata redundancy.
  • Where the complete pool metadata and a useful amount of the ARC MFU data shall be stored in L2ARC, and where the pool data, pool configuration and L2ARC capacity allow it.
    Remark: At least for zvols, the user/admin should be able to estimate the required L2ARC capacity; a zvol with 64 KiB volblocksize typically requires < 0.2% of the (used) pool capacity for metadata.
  • Where the L2ARC resides on a low read-latency device (NVMe), while the pool data resides on standard read-latency (SATA) drives.
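The < 0.2% estimate in the remark above can be sanity-checked with a rough calculation (assumptions: 128-byte block pointers and one L1 indirect entry per data block; deeper indirect levels add only a small fraction of the L1 level and are ignored here):

```python
# Back-of-the-envelope check of the "< 0.2%" metadata estimate.
volblocksize = 64 * 1024   # 64 KiB zvol data blocks
blkptr_size  = 128         # bytes per block pointer (blkptr_t, assumed)

# One L1 block pointer per data block dominates the metadata overhead.
l1_overhead = blkptr_size / volblocksize
print(f"{l1_overhead:.3%}")   # → 0.195%
assert l1_overhead < 0.002    # consistent with the < 0.2% figure
```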

The idea of this feature is that the pool metadata read during a scrub is not only used for the scrub itself, but also made available to the ARC for storage in the L2ARC. A scheduled/cyclic scrub would therefore lead to a cyclic, on-the-fly regeneration of the pool metadata in the L2ARC.

This feature would also enable the user to get the pool metadata into the L2ARC without having to e.g. delete the pool and send/receive it back from a backup pool.

@zfsuser zfsuser added the Type: Feature Feature request or new feature label Aug 4, 2024
@amotin
Member

amotin commented Aug 4, 2024

First of all, this has nothing to do with scrub. The read-back would have to happen similarly to the writing done by l2arc_feed_thread, matching its speed. But since the logic of L2ARC is to write blocks that are soon to be evicted from ARC, an attempt to re-read an old block will require even more ARC evictions, which will require even more L2ARC writes; that is a dead end.

The critical point we would need to decide is which blocks we consider important enough to read back and rewrite even though they have not been accessed for a while, since otherwise they could be rewritten into L2ARC in the normal way some time later. That information is not stored in the persistent L2ARC metadata, so it is lost on reboot; in ARC, though, we have a counter of how many times each block in L2ARC was read since it was written there. But how would we compare stats from blocks that have already been in L2ARC for a while with blocks that are only about to be written there for the first time?

We still want L2ARC to store the most often used blocks even when the workload is changing. For example, users may always want L2ARC to cover the blocks stored over the last week and almost never access the older ones. If we just kept the old blocks, we would penalize the new ones. Effectively we would need MRU/MFU logic like that used for ARC, except that at L2ARC capacities we cannot afford the memory required for ghost states, since it would double the ARC usage by L2ARC headers, which is already a problem.
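To make the header-memory concern concrete, a rough back-of-the-envelope calculation (the per-block header size used here is an assumption for illustration, not the exact struct size, which varies by version and build):

```python
# Rough illustration of ARC memory consumed by L2ARC headers.
l2arc_size    = 1 << 40      # 1 TiB L2ARC device
avg_blocksize = 64 * 1024    # 64 KiB average cached block
hdr_size      = 88           # bytes of ARC memory per L2ARC block (assumed)

n_blocks  = l2arc_size // avg_blocksize
hdr_bytes = n_blocks * hdr_size

# Ghost states would require tracking a second set of headers per block,
# roughly doubling this cost.
print(f"headers: {hdr_bytes / 2**30:.2f} GiB, "
      f"with ghost state: {2 * hdr_bytes / 2**30:.2f} GiB")
```

Even at these assumed numbers, a terabyte-class L2ARC already costs on the order of gigabytes of RAM for headers alone, which is why doubling it for ghost states is considered unaffordable.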

@rincebrain
Contributor

Well, we do have the SSD sitting there... is there any reason we couldn't track most-frequently-hit/most-recently-hit data in a tiny section at the end, for data? (Metadata we probably just want to cache, period, if the drive is large enough...)

@amotin
Member

amotin commented Aug 6, 2024

We cannot keep all the old metadata, if for no other reason than that the current persistent L2ARC implementation often resurrects, during replay, blocks that were deleted earlier, until the L2ARC has completely rotated. This means the L2ARC would gradually fill up with obsolete metadata blocks.

@rincebrain
Contributor

I mean, sure, we can't keep all of it. I just meant that, philosophically, we wouldn't necessarily want to track MRU/MFU for eviction of metadata in the same way, because metadata doesn't have the same "if we didn't use it recently/often, we don't care" properties.

@zfsuser
Author

zfsuser commented Aug 7, 2024

The original feature description was missing context and was therefore misleading. I have added additional information to (hopefully) clarify the intention of this feature.
