pvscan: LVM2-2.02.186 didn't activate some LVs after bootup #24

Closed
X1aomu opened this issue Aug 31, 2019 · 22 comments

Comments

@X1aomu

X1aomu commented Aug 31, 2019

I experienced inactive LVs after bootup when using lvm2-2.02.186.

log for lvm2-2.02.185
Aug 31 07:10:01 archlinux systemd[1]: Starting LVM2 PV scan on device 259:5...
Aug 31 07:10:01 archlinux lvm[255]:   Couldn't find device with uuid kerXXT-Qve2-yj88-8Qwj-MpYl-0cdc-VRyzd6.
Aug 31 07:10:01 archlinux lvm[255]:   WARNING: Device for PV kerXXT-Qve2-yj88-8Qwj-MpYl-0cdc-VRyzd6 not found or rejected by a filter.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/debian-root. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/linux-home. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/archlinux-var. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/debian-root. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/linux-home. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/archlinux-var. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/debian-root. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/linux-home. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/archlinux-var. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/debian-root. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/linux-home. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/archlinux-var. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/debian-root. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/linux-home. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing refresh of partial LV vg0/archlinux-var. Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   vg0: refresh before autoactivation failed.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing activation of partial LV vg0/debian-root.  Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing activation of partial LV vg0/linux-home.  Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   Refusing activation of partial LV vg0/archlinux-var.  Use '--activationmode partial' to override.
Aug 31 07:10:01 archlinux lvm[255]:   1 logical volume(s) in volume group "vg0" now active
Aug 31 07:10:01 archlinux lvm[255]:   vg0: autoactivation failed.
Aug 31 07:10:01 archlinux systemd[1]: lvm2-pvscan@259:5.service: Main process exited, code=exited, status=5/NOTINSTALLED
Aug 31 07:10:01 archlinux systemd[1]: lvm2-pvscan@259:5.service: Failed with result 'exit-code'.
Aug 31 07:10:01 archlinux systemd[1]: Failed to start LVM2 PV scan on device 259:5.
Aug 31 07:10:01 archlinux systemd[1]: lvm2-pvscan@259:5.service: Consumed 16ms CPU time.
Aug 31 07:10:03 hpArch systemd[1]: Starting LVM2 PV scan on device 259:5...
Aug 31 07:10:03 hpArch lvm[386]:   WARNING: lvmetad is being updated, retrying (setup) for 10 more seconds.
Aug 31 07:10:06 hpArch lvm[386]:   4 logical volume(s) in volume group "vg0" now active
Aug 31 07:10:06 hpArch systemd[1]: Started LVM2 PV scan on device 259:5.
Aug 31 07:12:42 hpArch systemd[1]: Stopping LVM2 PV scan on device 259:5...
Aug 31 07:12:42 hpArch systemd[1]: lvm2-pvscan@259:5.service: Succeeded.
Aug 31 07:12:42 hpArch systemd[1]: Stopped LVM2 PV scan on device 259:5.
Aug 31 07:12:42 hpArch systemd[1]: lvm2-pvscan@259:5.service: Consumed 19ms CPU time.
-- Reboot --
log for lvm2-2.02.186
Aug 31 07:13:04 archlinux systemd[1]: Starting LVM2 PV scan on device 259:5...
Aug 31 07:13:04 archlinux lvm[233]:   Couldn't find device with uuid kerXXT-Qve2-yj88-8Qwj-MpYl-0cdc-VRyzd6.
Aug 31 07:13:04 archlinux lvm[233]:   pvscan[233] activating all directly (lvmetad token) 259:5
Aug 31 07:13:04 archlinux lvm[233]:   WARNING: Device for PV kerXXT-Qve2-yj88-8Qwj-MpYl-0cdc-VRyzd6 not found or rejected by a filter.
Aug 31 07:13:04 archlinux lvm[233]:   pvscan[233] VG vg0 run autoactivation.
Aug 31 07:13:04 archlinux lvm[233]:   Refusing activation of partial LV vg0/debian-root.  Use '--activationmode partial' to override.
Aug 31 07:13:04 archlinux lvm[233]:   Refusing activation of partial LV vg0/linux-home.  Use '--activationmode partial' to override.
Aug 31 07:13:04 archlinux lvm[233]:   Refusing activation of partial LV vg0/archlinux-var.  Use '--activationmode partial' to override.
Aug 31 07:13:04 archlinux lvm[233]:   1 logical volume(s) in volume group "vg0" now active
Aug 31 07:13:04 archlinux lvm[233]:   vg0: autoactivation failed.
Aug 31 07:13:04 archlinux systemd[1]: lvm2-pvscan@259:5.service: Main process exited, code=exited, status=5/NOTINSTALLED
Aug 31 07:13:04 archlinux systemd[1]: lvm2-pvscan@259:5.service: Failed with result 'exit-code'.
Aug 31 07:13:04 archlinux systemd[1]: Failed to start LVM2 PV scan on device 259:5.
Aug 31 07:13:04 archlinux systemd[1]: lvm2-pvscan@259:5.service: Consumed 13ms CPU time.
Aug 31 07:13:05 hpArch systemd[1]: Starting LVM2 PV scan on device 259:5...
Aug 31 07:13:07 hpArch lvm[362]:   pvscan[362] VG vg0 skip autoactivation.
Aug 31 07:13:07 hpArch systemd[1]: Started LVM2 PV scan on device 259:5.

It seems to be caused by 8bcd482.

I guess that the change to avoid redundant activation accidentally causes autoactivation to be skipped after the initramfs has loaded, which leaves some of the LVs inactive. I still don't know, though, why not all of my LVs can be activated at once.

Some extra information is given in https://bbs.archlinux.org/viewtopic.php?id=248788
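
To check whether another machine shows the same pattern, the journal can be searched for the "skip autoactivation" message that appears in the 2.02.186 log above. The following is only a minimal sketch: the function name `skipped_vgs` is hypothetical, and the message format is assumed to match the log lines quoted in this report.

```shell
# Print the names of VGs for which pvscan logged "skip autoactivation".
# Reads journal text on stdin; the pattern is taken from the log above
# and is an assumption about the message format, not an lvm2 interface.
skipped_vgs() {
    sed -n 's/.*pvscan\[[0-9]*] VG \([^ ]*\) skip autoactivation\..*/\1/p' | sort -u
}

# Usage on a live system (assumes a systemd journal is available):
#   journalctl -b | skipped_vgs
```

If a VG is listed here while some of its LVs are still inactive, the boot probably followed the same sequence as in the second log.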

@m0ellemeister

I experienced the same behaviour. My system runs Arch Linux too.

On my system the volumes for /home and /var can't be mounted. When the system drops to the emergency shell, I'm able to activate and mount the corresponding LVs by hand. After mounting the missing volumes by hand, the system continues to boot once I issue the command:
systemctl default
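
The manual recovery described above could look roughly like the sketch below when run as root from the emergency shell. The helper names `run` and `recover_lvs` are hypothetical, the VG name `vg0` is borrowed from the original report, and the mount points are assumed to have entries in /etc/fstab.

```shell
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_lvs VG MOUNTPOINT...
recover_lvs() {
    vg="$1"; shift
    # Activate every LV in the VG that is not active yet.
    run vgchange -ay "$vg" || return 1
    # Mount the file systems that failed during boot (looked up in /etc/fstab).
    for mnt in "$@"; do
        run mount "$mnt" || return 1
    done
    # Resume the normal boot.
    run systemctl default
}

# Example: DRY_RUN=1 recover_lvs vg0 /home /var
```

Running it with `DRY_RUN=1` first shows the exact commands before anything is changed.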

@hifigraz

Same problem here

@GargleBlaster259

Same here; the volume which /home is residing on doesn't get activated on boot and the system drops into emergency mode. Reverting the commit mentioned by X1aomu does seem to fix it.

@stiefel40k

I had the same issue, and reverting helped. Interestingly, on a friend's laptop with a similar setup (LUKS + LVM) and up-to-date Arch, the issue was not present.

@sisyphus74

This bug does not occur on systems with LVM on LUKS on a single partition (tested).
I had the same problem:
Setup: LVM on LUKS (Arch Linux), with the LVM spanning two partitions or two SSDs.
After upgrading to
core/device-mapper 2.02.186-1 (base) - device-mapper-2.02.186-1-x86_64.pkg.tar.xz
core/lvm2 2.02.186-1 (base) - lvm2-2.02.186-1-x86_64.pkg.tar.xz
the system drops into the recovery shell after the password prompt.
Workaround: downgrading to
device-mapper-2.02.185-1-x86_64.pkg.tar.xz
lvm2-2.02.185-1-x86_64.pkg.tar.xz
helps, and boot is successful.
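
The downgrade can be done with `pacman -U` on the two package files named above. The sketch below only prints the command so it can be reviewed first. The function name `downgrade_cmd` is hypothetical, and it assumes the old packages are still present in the default pacman cache directory, /var/cache/pacman/pkg.

```shell
# Print the pacman command that installs lvm2 and device-mapper at a given
# version from a local package directory.
# Assumption: the files still exist in the cache (default path below).
downgrade_cmd() {
    ver="$1"
    cache="${2:-/var/cache/pacman/pkg}"
    echo "pacman -U $cache/device-mapper-$ver-x86_64.pkg.tar.xz $cache/lvm2-$ver-x86_64.pkg.tar.xz"
}

# Example: downgrade_cmd 2.02.185-1   (review the output, then run it as root)
```

To keep a later system upgrade from pulling the broken version back in, one option is an `IgnorePkg = lvm2 device-mapper` line in /etc/pacman.conf until a fixed package is available.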

@X1aomu
Author

X1aomu commented Sep 3, 2019

My setup is LVM with two disks and without LUKS, so LUKS may be irrelevant to this issue. The trigger could be multiple PVs/LVs on different partitions/disks.

@m0ellemeister

My setup is LVM with two disks and without LUKS, so LUKS may be irrelevant to this issue. The trigger could be multiple PVs/LVs on different partitions/disks.

I agree with that. I don't use LUKS, just LVM with one VG containing several LVs. One of the LVs is a cached LV, and the cached volume resides on a software RAID 5.

@teigland
Contributor

teigland commented Sep 3, 2019

There is a problem with the commit mentioned above ("pvscan: avoid redundant activation") that I'm working on fixing. We can't have pvscan creating the new temp files for incomplete VGs. This is a problem because pvscan is mistakenly attempting to activate incomplete VGs during initialization, but that should be a separate fix (it's an older problem).

@teigland
Contributor

teigland commented Sep 3, 2019

Activating incomplete VGs was the main problem to fix. There is a patch for that in RH bug 1748430 that can be tested.

@teigland
Contributor

teigland commented Sep 4, 2019

I think this fix should help:
https://sourceware.org/git/?p=lvm2.git;a=commit;h=6b12930860a993624d6325aec2e9c561f4412aa9

Could someone test this (using the stable-2.02 lvm branch) to confirm?

@X1aomu
Author

X1aomu commented Sep 5, 2019

@teigland Thanks for your hard work. The fix has been tested on my machine and it works fine. I hope it can be tested on more affected machines and that a new release comes out!

@dedean16

dedean16 commented Sep 8, 2019

Same issue here (I'm on Manjaro). A VG with multiple LVs on multiple disks. One LV is mirrored and doesn't get activated. The others are activated. Now using use_lvmetad = 0 in the config (/etc/lvm/lvm.conf) as a temporary fix.
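
Before relying on this temporary fix, it may help to confirm what the config file actually sets. The sketch below reads the value straight from the file. The function name `lvmetad_setting` is hypothetical, and the parsing assumes a plain `use_lvmetad = N` assignment on a line of its own, as in a typical lvm.conf.

```shell
# Print the value of use_lvmetad from an lvm.conf-style file.
# Commented-out lines are ignored; if the setting appears more than once,
# the last assignment is printed. Prints nothing if the setting is absent.
lvmetad_setting() {
    conf="${1:-/etc/lvm/lvm.conf}"
    sed -n 's/^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$conf" | tail -n 1
}

# Example: lvmetad_setting /etc/lvm/lvm.conf
```

Because the initramfs may carry its own copy of the configuration, regenerating it after the change is likely needed for the setting to take effect during early boot.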

@PotcFdk

PotcFdk commented Sep 8, 2019

@teigland That fix seems to work for me as well.
It was actually cherry-picked into the lvm2 package in the Arch Linux repos and is now published.

I had the bad luck of trying out lvmcache for the first time ever, which required adding another crypto device as a PV into my existing VG, which ended up triggering this bug. I initially resolved it by downgrading lvm after finding out about this issue right here, but now I've updated to the fixed version and can boot my system without issues.

@sisyphus74

@PotcFdk , thanks. For me (Arch, LV over two encrypted partitions) it works now!

@m0ellemeister

@PotcFdk & @teigland I now have the lvm2 package in version 2.02.186-2 installed, too. I can confirm that this package fixes the problem. My system boots up normally, and all logical volumes get activated at boot.

@anayrat

anayrat commented Sep 9, 2019

Hello,
FYI, this fix made things worse on my Arch system. My setup looks like this:

  • two encrypted PVs with the same password
  • on top of them, I created one VG
  • inside this VG I created two thin pools, one on each PV
  • then I created several LVs in the thin pools.
 lsblk  -a
NAME                   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                      8:0    0   1.8T  0 disk  
└─sda1                   8:1    0   1.8T  0 part  
  └─cryptdisk2         254:8    0   1.8T  0 crypt 
    ├─vg-pool2_tmeta   254:9    0     1G  0 lvm   
    │ └─vg-pool2-tpool 254:11   0     1T  0 lvm   
    │   ├─vg-pool2     254:12   0     1T  0 lvm   
    │   └─vg-storage   254:13   0   500G  0 lvm   /storage
    └─vg-pool2_tdata   254:10   0     1T  0 lvm   
      └─vg-pool2-tpool 254:11   0     1T  0 lvm   
        ├─vg-pool2     254:12   0     1T  0 lvm   
        └─vg-storage   254:13   0   500G  0 lvm   /storage
sdb                      8:16   0 238.5G  0 disk  
├─sdb1                   8:17   0     2M  0 part  
├─sdb2                   8:18   0 237.9G  0 part  
│ └─cryptdisk1         254:0    0 237.9G  0 crypt 
│   ├─vg-pool1_tmeta   254:1    0     1G  0 lvm   
│   │ └─vg-pool1-tpool 254:3    0   120G  0 lvm   
│   │   ├─vg-pool1     254:4    0   120G  0 lvm   
│   │   ├─vg-root      254:5    0    50G  0 lvm   /
│   │   └─vg-home      254:6    0    60G  0 lvm   /home
│   ├─vg-pool1_tdata   254:2    0   120G  0 lvm   
│   │ └─vg-pool1-tpool 254:3    0   120G  0 lvm   
│   │   ├─vg-pool1     254:4    0   120G  0 lvm   
│   │   ├─vg-root      254:5    0    50G  0 lvm   /
│   │   └─vg-home      254:6    0    60G  0 lvm   /home
│   └─vg-swap          254:7    0     1G  0 lvm   [SWAP]
└─sdb3                   8:19   0   598M  0 part  /boot

During boot the system gets stuck in pvscan. Here is my understanding:

  • during the boot, only one pv is decrypted : cryptdisk1 (it is normal, it used to decrypt the second one when I access the volume thanks to sd-encrypt if I am right)
  • pvscan detects that two PVs should exist, but it can only find the first one
  • vg is activated
  • the system can't activate the thin pool (I don't know why). It seems to be because only the first PV is decrypted.
  • as the thin pool is not activated, the system can't activate the LVs that depend on it

I was able to manually activate the thin pool and then the LVs, and chroot into my system to downgrade the package to 2.02.185.

The main difference between my setup and those of others who reported that the fix resolved the issue is that I am using thin pools.

I hope this helps to solve the issue.

@PotcFdk

PotcFdk commented Sep 9, 2019

it used to decrypt the second one when I access the volume thanks to sd-encrypt if I am right

That's how it (sd-encrypt) is supposed to work, at least.

pvscan detects that two PVs should exist, but it can only find the first one

Yes, for the first PV at least, so it tries to activate an incomplete VG, which leads us to your second step:

vg is activated

I assume it can't be activated due to the missing PV.
Instead, at least that's how it works in my setup, it then proceeds to decrypt and detect the second PV and then retries activating the VG, this time successfully.[1] The rest continues as it should.

So when you run 2.02.186-2 and try to boot, wait for the lvm2-pvscan unit to time out and then enter the emergency shell (by pressing return IIRC) and run

  • journalctl -xb to figure out why it failed (I presume it's because it was waiting for something)
  • lvm lvdisplay / lvm pvdisplay to see which PVs have been found (and, more importantly, which ones, if any, are missing)
  • whatever commands are relevant to your setup to figure out why the PV that's missing is in fact missing. Is it because the block device that contains it had not been decrypted? Is it visible but did it fail to get detected as a PV for some reason?

I think this would help to understand what's going on. Unless I'm misunderstanding your setup...
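
As a starting point for the steps above, the journal can be filtered for the "Couldn't find device with uuid" message that appears in the logs at the top of this issue. This is only a sketch: the function name `missing_pv_uuids` is hypothetical, and the message format is assumed to match those logs.

```shell
# List the UUIDs of PVs that LVM reported as missing.
# Reads journal text on stdin; the pattern is based on the log lines
# quoted earlier in this thread.
missing_pv_uuids() {
    sed -n "s/.*Couldn't find device with uuid \([A-Za-z0-9-]*\)\..*/\1/p" | sort -u
}

# Usage from the emergency shell:
#   journalctl -xb | missing_pv_uuids
#   lvm pvdisplay      # compare against the PVs that were actually found
```

A UUID that shows up here but not in the `lvm pvdisplay` output points at the block device that was not available (for example, not yet decrypted) when pvscan ran.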

Annotations:
[1] This is the part that this issue here is about: On the broken LVM version, when the second PV appears after being decrypted, LVM does not attempt to activate the VG again. Instead, it skips the VG because it thinks that this had already happened when the first PV appeared (where it, in reality, had failed).

@0x9fff00

The patch isn't working for me either. Here is my setup:

  • /boot on /dev/sda1
  • cryptssd0 volume on /dev/sda2 encrypted using a password
  • crypthdd0 volume on /dev/sdb1 encrypted using a file on root filesystem
  • A VG vg0 with two PVs cryptssd0 and crypthdd0
  • LV lvssd0 on PV cryptssd0
  • LV lvhdd0 on PV crypthdd0 with a cache on PV cryptssd0
  • Root filesystem on LV lvssd0
  • /data on LV lvhdd0

On 2.02.186-1 I get the following errors after entering the password for cryptssd0:

[ TIME ] Timed out waiting for device /dev/disk/by-uuid/[lvhdd0 UUID].
[DEPEND] Dependency failed for /data.
[DEPEND] Dependency failed for Local File Systems.
[...]
You are in emergency mode. After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" or "exit"
to boot into default mode.
Give root password for maintenance
(or press Control-D to continue):

On 2.02.186-2:

ERROR: device '/dev/mapper/vg0-lvssd0' not found. Skipping fsck.
mount: /new_root: no filesystem type specified.
You are now being dropped into an emergency shell.
sh: can't access tty: job control turned off
[rootfs ]#

@oniko
Contributor

oniko commented Sep 19, 2019

* during the boot, only one pv is decrypted : cryptdisk1 (it is normal, it used to decrypt the second one when I access the volume thanks to sd-encrypt if I am right)

I think the proper solution is to also add a second rd.luks... parameter to the kernel command line so that both LUKS devices get unlocked during boot. Or is there a reason not to do it this way? I don't know Arch Linux internals well enough, so maybe there's some limitation, but I don't think this approach (partially activating the VG and adding the missing PVs later) is good practice. On the one hand it could work OK, but at the same time I see many scenarios where this can fail terribly and, in the worst case, corrupt your filesystem beyond use.

@anayrat

anayrat commented Oct 6, 2019

* during the boot, only one pv is decrypted : cryptdisk1 (it is normal, it used to decrypt the second one when I access the volume thanks to sd-encrypt if I am right)

I think the proper solution is to also add a second rd.luks... parameter to the kernel command line so that both LUKS devices get unlocked during boot. Or is there a reason not to do it this way? I don't know Arch Linux internals well enough, so maybe there's some limitation, but I don't think this approach (partially activating the VG and adding the missing PVs later) is good practice. On the one hand it could work OK, but at the same time I see many scenarios where this can fail terribly and, in the worst case, corrupt your filesystem beyond use.

Sorry for the late answer. You are right, my issue was that I only listed the first device on the kernel command line. I added another parameter for cryptdisk2 and it solved my issue.
Thanks!
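
The final command line is not shown in this thread, so the following is only a sketch of how one might verify that every LUKS device backing the VG is named at boot. It assumes the sd-encrypt setup uses the `rd.luks.name=` or `rd.luks.uuid=` parameters read by systemd's cryptsetup generator. The function name `count_rd_luks` and the UUID placeholders are hypothetical.

```shell
# Count the LUKS devices named on a kernel command line via
# rd.luks.name= or rd.luks.uuid= parameters.
count_rd_luks() {
    cmdline="$1"
    n=0
    # Rely on word splitting to walk the space-separated parameters.
    for arg in $cmdline; do
        case "$arg" in
            rd.luks.name=*|rd.luks.uuid=*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Usage on the running system:
#   count_rd_luks "$(cat /proc/cmdline)"
# With two encrypted PVs, as in the setup above, a command line of the form
#   rd.luks.name=<UUID-1>=cryptdisk1 rd.luks.name=<UUID-2>=cryptdisk2
# would be expected to yield 2.
```

A count lower than the number of encrypted PVs in the VG suggests that pvscan will see an incomplete VG during early boot.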

@zkabelac

Looks like a stale issue. We have had many major releases since then, and recently we also considerably improved the pvscan logic for parsing.
So if there is still a problem, please open a new issue against the 'upstream' release of lvm2; there is not much hope of fixing a 3-year-old branch of the lvm2 code base.

@jowilkes

The issue seems to have disappeared in one of the previous releases.
Cannot say for certain, though, since I also have a number of previous workarounds in place (maybe the issue is not solved, just shifted back).
