LVM cache cannot be removed if cache volume is lost #35

Brain2000 opened this issue Jun 21, 2020 · 6 comments

Brain2000 commented Jun 21, 2020

If an LVM logical volume is backed by a cache volume, and that cache volume disappears or becomes corrupt, the cache cannot be removed.

Example, assuming an existing logical volume called lvmgroup/disk:

[root]# modprobe brd rd_nr=2 rd_size=4194304 max_part=1
[root]# pvcreate /dev/ram0
[root]# vgextend lvmgroup /dev/ram0
[root]# lvcreate -l 100%FREE -n cache lvmgroup /dev/ram0
[root]# lvconvert --type cache --cachepool cache --cachemode writeback --cachesettings 'autocommit_time=1000' lvmgroup/disk

After lvmgroup/disk has been converted to a cached LV, reboot or force the ramdisk offline, for example:

[root]# lvchange -a n lvmgroup
[root]# rmmod brd
[root]# lvchange -a y lvmgroup

Now try to remove the cache:

[root]# lvconvert --uncache lvmgroup/disk --force
  WARNING: Device for PV rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3 not found or rejected by a filter.
  Couldn't find device with uuid rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3.
  WARNING: Cache pool data logical volume lvmgroup/cache_cdata is missing.
  WARNING: Uncaching of partially missing writethrough cache volume lvmgroup/disk might destroy your data.
Do you really want to uncache lvmgroup/disk with missing LVs? [y/n]: y
  device-mapper: reload ioctl on  (250:4) failed: Invalid argument
  Failed to active cache locally lvmgroup/disk.

It's possible to forcibly re-add the ramdisk using the same UUID and then try again:

[root]# pvcreate --norestore --uuid=rd6lF7-xH6h-Vtes-eMGU-bnTh-60z5-A2ruC3 /dev/ram0 -ff
Really INITIALIZE physical volume "/dev/ram0" of volume group "lvmgroup" [y/n]? y
  WARNING: Forcing physical volume creation on /dev/ram0 of volume group "lvmgroup"
  Physical volume "/dev/ram0" successfully created.
[root]# lvconvert --uncache lvmgroup/disk --force
  device-mapper: reload ioctl on  (250:3) failed: Invalid argument
  Failed to active cache locally lvmgroup/disk.

At this point, the only fix I found is to take the /etc/lvm/backup/lvmgroup file, modify it to remove the cache entries, rename disk_corig back to disk, add the "VISIBLE" flag back, and then run vgcfgrestore -f on the modified file.
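
A rough sketch of that manual workaround, assuming the backup file still describes the pre-failure layout (the copied file name and the exact edits are illustrative and depend on the backup contents):

[root]# cp /etc/lvm/backup/lvmgroup /root/lvmgroup.edited
# edit /root/lvmgroup.edited by hand:
#   - delete the cache pool LV and the cache segment that references it
#   - rename the "disk_corig" LV back to "disk"
#   - add the "VISIBLE" flag back to its status list
[root]# vgcfgrestore -f /root/lvmgroup.edited lvmgroup
[root]# lvchange -a y lvmgroup/disk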

Brain2000 commented Jun 21, 2020

dmesg helped out here; the --cachesettings option does not seem to work properly:

[10079.110609] device-mapper: cache: bad config value for autocommit_time: 1000
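
For what it's worth, autocommit_time is a tunable of the dm-writecache target rather than dm-cache, which is why the cache target rejects it. A minimal sketch of a setting the dm-cache setup does accept, plus the writecache variant that does know autocommit_time (the values and the writecache conversion are illustrative; --type writecache needs a newer LVM release and a plain cache LV rather than a cache pool):

[root]# lvchange --cachesettings 'migration_threshold=8192' lvmgroup/disk
# alternatively, use dm-writecache if autocommit_time is the point:
[root]# lvconvert --type writecache --cachevol cache --cachesettings 'autocommit_time=1000' lvmgroup/disk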

@grantwest

I ran into a similar issue. In my case the cache drive was a USB SSD that was unplugged while the system was running, and now when I try to uncache I get this:

lvconvert --uncache extvg/extlv
WARNING: Couldn't find device with uuid Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY.
WARNING: VG extvg is missing PV Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY (last written to [unknown]).
WARNING: Couldn't find device with uuid Ehojxo-vSV5-bpPB-n86t-Ec8k-Yk7T-xKzMTY.
Command on LV extvg/extlv does not accept LV type linear.
Command not permitted on LV extvg/extlv.

I am on Ubuntu 22.04 running lvm:

LVM version:     2.03.11(2) (2021-01-08)
Library version: 1.02.175 (2021-01-08)
Driver version:  4.45.0
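
For reference, the sub-LV layout that lvconvert is complaining about ("LV type linear") can be inspected by listing hidden LVs; a minimal sketch using the names above:

lvs -a -o lv_name,lv_attr,segtype,devices extvg

If only a linear segment remains, the cache sub-LVs may already be gone and the stale VG metadata is likely what needs repairing.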

@hellkaim

Bumping this up.
My cache SSD for a RAID5 array throws write/read errors and I need to replace it.
It has been stuck at "Flushing 21 blocks ..." for 24h now.

The good thing is that it was a writethrough cache, so I hope my data is OK.

Is there any way to remove the cache from an LV in that situation?
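
For reference, the number of dirty blocks still waiting to be written back can be watched with the cache reporting fields of lvs; a minimal sketch (vg/lv stands for the cached LV):

lvs -o lv_name,cache_mode,cache_policy,cache_total_blocks,cache_dirty_blocks vg/lv
# or at the device-mapper level:
dmsetup status vg-lv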

@zkabelac

Is the SSD in trouble (i.e. does it fail with read errors)? It's quite interesting that you've managed to get a dirty cache in writethrough mode. Recovery in this case might be non-trivial, as the kernel target is a bit 'dumb' and cannot skip problematic parts of a device.

So, to get out of this situation:

You can activate the cache origin and the cache data and metadata LVs in 'component activation' mode - this brings up all the devices separately in read-only mode (just activate every 'sub-LV' of your cached LV individually with 'lvchange -ay ...').

Then you run 'dmsetup table' and grab the table line for your original cached device.
Then 'dmsetup reload vgname-lvname --table "xxxxx"' your device.
Then 'dmsetup resume vgname-lvname' - to make the device writable.

Once this is done you can use the 'cache_writeback' tool from the device-mapper persistent data tools package.
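
A minimal sketch of that step, assuming the component LVs have been activated as described above (the device paths are illustrative, and option names may differ between versions of the tools - check 'cache_writeback --help'):

cache_writeback --metadata-device /dev/mapper/vgname-cachepool_cmeta \
                --fast-device     /dev/mapper/vgname-cachepool_cdata \
                --origin-device   /dev/mapper/vgname-lvname_corig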

Once you manage to rescue the maximum number of blocks, you deactivate everything, and then you can forcibly remove/detach the caching device from your cached LV with 'lvremove --force vgname/cachepoolname'.
This should give you some prompts but should let you uncache your LV (unless you are using some very old version of lvm that does not support this).

It's a bit of an awkward solution for this case; it should be enhanced on the kernel side as well as on the user-space side.

In case you run into any trouble with the advice in this message - it's always better to ask before doing irreversible damage.

@hellkaim

hellkaim commented Oct 26, 2023

Let's check the sequence:
# dmsetup table
vg1-lv1: 0 11719622656 cache 253:1 253:0 253:2 1024 2 metadata2 writethrough smq 2 migration_threshold 8192
vg1-lv1_cache_cpool_cdata: 0 838451200 linear 8:161 65535
vg1-lv1_cache_cpool_cmeta: 0 98304 linear 8:161 838746111
vg1-lv1_corig: 0 11719622656 linear 9:0 2048

So to do that:
lvchange -an vg1/lv1
lvchange -an vg1

and then:
lvchange -ay vg1/lv1_corig
lvchange -ay vg1/lv1_cache_cpool_cdata
lvchange -ay vg1/lv1_cache_cpool_cmeta

I am not sure I understood that correctly:
dmsetup reload vg1-lv1 --table "0 11719622656 linear 9:0 2048" <<- this is from vg1-lv1_corig - do we need to take the table of the corig volume or of the underlying main data volume?

If this is OK, then I do:
dmsetup resume vg1-lv1
cache_writeback /dev/mapper/vg1-lv1
lvchange -an vg1/lv1_corig
lvchange -an vg1/lv1_cache_cpool_cdata
lvchange -an vg1/lv1_cache_cpool_cmeta
lvremove --force vg1/lv1_cache_cpool

Correct?

@hellkaim

Ok, let's say I was lucky:
# lvchange -an vg1
# sync
# lvconvert --uncache vg1/lv1
Do you really want to remove and DISCARD logical volume vg1/lv1_cache_cpool? [y/n]: y
Logical volume "lv1_cache_cpool" successfully removed.
Logical volume vg1/lv1 is not cached.
Thanks for pushing me towards lvchange -an vg1.
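
For completeness, that the LV is really uncached now can be confirmed by checking its segment type; a minimal sketch:

# segtype should now show linear (or striped) rather than cache
lvs -o lv_name,segtype,devices vg1/lv1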
