LVM cache cannot be removed if cache volume is lost #35

Brain2000 opened this issue Jun 21, 2020 · 6 comments

Brain2000 commented Jun 21, 2020

If an LVM logical volume is backed by a cache volume, and that cache volume disappears or becomes corrupt, the cache cannot be removed.

Example, assuming an existing logical volume called lvmgroup/disk:

[root]# modprobe brd rd_nr=2 rd_size=4194304 max_part=1
[root]# pvcreate /dev/ram0
[root]# vgextend lvmgroup /dev/ram0
[root]# lvcreate -l 100%FREE -n cache lvmgroup /dev/ram0
[root]# lvconvert --type cache --cachepool cache --cachemode writeback --cachesettings 'autocommit_time=1000' lvmgroup/disk

After lvmgroup/disk has been converted to a cached LV, reboot or force the ramdisk offline, for example:

[root]# lvchange -a n lvmgroup
[root]# rmmod brd
[root]# lvchange -a y lvmgroup

Now try to remove the cache:

[root]# lvconvert --uncache lvmgroup/disk --force
  WARNING: Device for PV rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3 not found or rejected by a filter.
  Couldn't find device with uuid rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3.
  WARNING: Cache pool data logical volume lvmgroup/cache_cdata is missing.
  WARNING: Uncaching of partially missing writethrough cache volume lvmgroup/disk might destroy your data.
Do you really want to uncache lvmgroup/disk with missing LVs? [y/n]: y
  device-mapper: reload ioctl on  (250:4) failed: Invalid argument
  Failed to active cache locally lvmgroup/disk.

It's possible to forcibly re-add the ramdisk using the same UUID and then try again:

[root]# pvcreate --norestore --uuid=rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3 /dev/ram0 -ff
Really INITIALIZE physical volume "/dev/ram0" of volume group "lvmgroup" [y/n]? y
  WARNING: Forcing physical volume creation on /dev/ram0 of volume group "lvmgroup"
  Physical volume "/dev/ram0" successfully created.
[root]# lvconvert --uncache lvmgroup/disk --force
  device-mapper: reload ioctl on  (250:3) failed: Invalid argument
  Failed to active cache locally lvmgroup/disk.

At this point, the only fix I found is to take the /etc/lvm/backup/lvmgroup file, modify it to remove the cache entries, rename disk_corig back to disk, add the "VISIBLE" flag back, and then run vgcfgrestore -f on the modified file.
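
A rough sketch of that manual workaround, assuming the backup file still describes the pre-failure layout (the copied file name and the exact edits are illustrative and depend on the backup contents):

[root]# cp /etc/lvm/backup/lvmgroup /root/lvmgroup.edited
# edit /root/lvmgroup.edited by hand:
#   - delete the cache pool LV and the cache segment that references it
#   - rename the "disk_corig" LV back to "disk"
#   - add the "VISIBLE" flag back to its status list
[root]# vgcfgrestore -f /root/lvmgroup.edited lvmgroup
[root]# lvchange -a y lvmgroup/disk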

Brain2000 commented Jun 21, 2020

dmesg helped out here; the --cachesettings option does not seem to work properly:

[10079.110609] device-mapper: cache: bad config value for autocommit_time: 1000
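
For what it's worth, autocommit_time is a tunable of the dm-writecache target rather than dm-cache, which is why the cache target rejects it. A minimal sketch of a setting the dm-cache setup does accept, plus the writecache variant that does know autocommit_time (the values and the writecache conversion are illustrative; --type writecache needs a newer LVM release and a plain cache LV rather than a cache pool):

[root]# lvchange --cachesettings 'migration_threshold=8192' lvmgroup/disk
# alternatively, use dm-writecache if autocommit_time is the point:
[root]# lvconvert --type writecache --cachevol cache --cachesettings 'autocommit_time=1000' lvmgroup/disk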

@grantwest

I ran into a similar issue. In my case the cache drive was a USB SSD that was unplugged while the system was running, and now when I try to uncache I get this:

lvconvert --uncache extvg/extlv
WARNING: Couldn't find device with uuid Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY.
WARNING: VG extvg is missing PV Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY (last written to [unknown]).
WARNING: Couldn't find device with uuid Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY.
Command on LV extvg/extlv does not accept LV type linear.
Command not permitted on LV extvg/extlv.

I am on Ubuntu 22.04 running lvm:

LVM version:     2.03.11(2) (2021-01-08)
Library version: 1.02.175 (2021-01-08)
Driver version:  4.45.0
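
For reference, the sub-LV layout that lvconvert is complaining about ("LV type linear") can be inspected by listing hidden LVs; a minimal sketch using the names above:

lvs -a -o lv_name,lv_attr,segtype,devices extvg

If only a linear segment remains, the cache sub-LVs may already be gone and the stale VG metadata is likely what needs repairing.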

@hellkaim

Bumping this up.
My cache SSD for a RAID5 array throws write/read errors and I need to replace it.
It has been stuck at "Flushing 21 blocks ..." for 24h now.

The good thing is that it was a writethrough cache, so I hope my data is OK.

Is there any way to remove the cache from an LV in that situation?
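
For reference, the number of dirty blocks still waiting to be written back can be watched with the cache reporting fields of lvs; a minimal sketch (vg/lv stands for the cached LV):

lvs -o lv_name,cache_mode,cache_policy,cache_total_blocks,cache_dirty_blocks vg/lv
# or at the device-mapper level:
dmsetup status vg-lv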

@zkabelac

Is the SSD in trouble (i.e. does it fail with read errors)? It's quite interesting that you've managed to get a dirty cache in writethrough mode. Recovery in this case might be non-trivial, as the kernel target is a bit 'dumb' and cannot skip problematic parts of a device.

So, to get out of this situation:

You can activate the cache origin and the cache data and metadata LVs in 'component activation' mode - this brings up all the devices separately in read-only mode (just activate every 'sub-LV' of your cached LV individually with 'lvchange -ay ...').

Then you run 'dmsetup table' and grab the table line for your original cached device.
Then 'dmsetup reload vgname-lvname --table "xxxxx"' your device.
Then 'dmsetup resume vgname-lvname' - to make the device writable.

Once this is done you can use the 'cache_writeback' tool from the device-mapper persistent data tools package.
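
A minimal sketch of that step, assuming the component LVs have been activated as described above (the device paths are illustrative, and option names may differ between versions of the tools - check 'cache_writeback --help'):

cache_writeback --metadata-device /dev/mapper/vgname-cachepool_cmeta \
                --fast-device     /dev/mapper/vgname-cachepool_cdata \
                --origin-device   /dev/mapper/vgname-lvname_corig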

Once you manage to rescue the maximum number of blocks, you deactivate everything, and then you can forcibly remove/detach the caching device from your cached LV with 'lvremove --force vgname/cachepoolname'.
This should give you some prompts but should let you uncache your LV (unless you are using some very old version of lvm that does not support this).

It's a bit of an awkward solution for this case; it should be enhanced on the kernel side as well as on the user-space side.

In case you run into any trouble with the advice in this message - it's always better to ask before doing irreversible damage.

@hellkaim

hellkaim commented Oct 26, 2023

Let's check the sequence:
# dmsetup table
vg1-lv1: 0 11719622656 cache 253:1 253:0 253:2 1024 2 metadata2 writethrough smq 2 migration_threshold 8192
vg1-lv1_cache_cpool_cdata: 0 838451200 linear 8:161 65535
vg1-lv1_cache_cpool_cmeta: 0 98304 linear 8:161 838746111
vg1-lv1_corig: 0 11719622656 linear 9:0 2048

So to do that:
lvchange -an vg1/lv1
lvchange -an vg1

and then:
lvchange -ay vg1/lv1_corig
lvchange -ay vg1/lv1_cache_cpool_cdata
lvchange -ay vg1/lv1_cache_cpool_cmeta

I am not sure I understood that correctly:
dmsetup reload vg1-lv1 --table "0 11719622656 linear 9:0 2048" <<- this is from vg1-lv1_corig - do we need to take the table of the corig volume or of the underlying main data volume?

If this is OK, then I do:
dmsetup resume vg1-lv1
cache_writeback /dev/mapper/vg1-lv1
lvchange -an vg1/lv1_corig
lvchange -an vg1/lv1_cache_cpool_cdata
lvchange -an vg1/lv1_cache_cpool_cmeta
lvremove --force vg1/lv1_cache_cpool

Correct?

@hellkaim

Ok, let's say I was lucky:
# lvchange -an vg1
# sync
# lvconvert --uncache vg1/lv1
Do you really want to remove and DISCARD logical volume vg1/lv1_cache_cpool? [y/n]: y
Logical volume "lv1_cache_cpool" successfully removed.
Logical volume vg1/lv1 is not cached.
Thanks for pushing me towards lvchange -an vg1.
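
For completeness, that the LV is really uncached now can be confirmed by checking its segment type; a minimal sketch:

# segtype should now show linear (or striped) rather than cache
lvs -o lv_name,segtype,devices vg1/lv1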
