lvm fails to activate volume groups on boot if global_filter is set #104

Closed
frukto opened this issue Jan 12, 2023 · 20 comments

@frukto

frukto commented Jan 12, 2023

We have a Debian bookworm with lvm 2.03.16 [1] and the system fails to activate volume groups on boot if a global lvm filter is set.

The lvm device is /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3 aka /dev/sda3 and we set

global_filter = [ "a|pci-0000:04.*|", "r|.*|" ]

however the volume group is not activated on boot. We did some digging, and it seems related to the udev import command:

IMPORT{program}="(LVM_EXEC)/lvm pvscan --cache --listvg --checkcomplete --vgonline --autoactivation event --udevoutput --journal=output $env{DEVNAME}"

Running the import command manually [2] (in a busybox rescue shell) gives LVM_VG_NAME_COMPLETE='test-bookworm-vg' and the rest of the commands in the udev rules actually activate the volume group, exactly as documented in lvmautoactivation(7).

But for some reason this does not happen automatically. Removing the global_filter resolves the issue. Changing the filter to global_filter = [ "a|sd.*|", "r|.*|" ] does not resolve the issue. It looks like the udev rules are never run with $env{DEVNAME} containing something suitable.

[1]

lvm version 
  LVM version:     2.03.16(2) (2022-05-18)
  Library version: 1.02.185 (2022-05-18)
  Driver version:  4.47.0
  Configuration:   ./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/x86_64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/x86_64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline

[2] /sbin/lvm pvscan --cache --listvg --checkcomplete --autoactivation event --udevoutput --journal=output /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3. This is the original command, but with --vgonline removed and a specific device name supplied.

@teigland
Contributor

udev-created symlinks in the filter is a special case for pvscan, and it makes an imperfect attempt to identify when the filter contains symlink patterns. One name pattern it looks for to recognize symlinks is "/dev/disk/", so I expect it would work if you changed the filter entry to: "a|/dev/disk/by-path/pci-0000:04.*|"

@frukto
Author

frukto commented Jan 13, 2023

Yes, after some more digging I saw that pvscan is called by udev with real device names only, not with symlinks. This renders filters against symlink names useless, but as you said, there is a mechanism that should associate all the symlinks with the devices if the filter uses absolute path names.

However, if I set global_filter = [ "a|/dev/disk/by-path/pci-0000:04.*|", "r|.*|" ] it still does not work. lvm does not activate the volume group. The system drops me into a busybox shell and the symlinks in /dev/disk/by-path are already there. They are created in 60-....rules and lvm is handled in 69-...rules
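
The filter behaviour described above can be sketched in a few lines of Python. This is only a simplified model, not lvm's implementation: it assumes that each entry is `a` or `r` followed by a delimited regex, that matching is unanchored, that the first matching pattern decides for a given name, and that a device is accepted if any of its known names hits an accept pattern. Python's `re` module stands in for lvm's own regex engine, and the names `parse_filter` and `device_accepted` are hypothetical.

```python
import re

def parse_filter(entries):
    """Turn entries such as 'a|regex|' into (is_accept, compiled_regex) pairs."""
    rules = []
    for entry in entries:
        # First char is the action, second is the delimiter, which must also end the entry.
        if len(entry) < 3 or entry[0] not in ("a", "r") or entry[-1] != entry[1]:
            raise ValueError(f"malformed filter entry: {entry!r}")
        rules.append((entry[0] == "a", re.compile(entry[2:-1])))
    return rules

def device_accepted(names, rules):
    """Simplified model: accept if any known name hits an 'a' pattern first."""
    rejected = False
    for name in names:
        for is_accept, regex in rules:
            if regex.search(name):      # unanchored match (assumption)
                if is_accept:
                    return True
                rejected = True
                break                   # first matching pattern decides for this name
    return not rejected                 # nothing matched at all: accept by default

by_path = "/dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3"
for entries in (["a|pci-0000:04.*|", "r|.*|"],
                ["a|/dev/disk/by-path/pci-0000:04.*|", "r|.*|"]):
    rules = parse_filter(entries)
    print(entries[0],
          device_accepted(["/dev/sda3"], rules),            # only the devnode is known
          device_accepted(["/dev/sda3", by_path], rules))   # symlink is known too
```

Under these assumptions, both filters reject the device when /dev/sda3 is the only name known for it and accept it once the by-path symlink is known as well, which is why it matters which names pvscan can see.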

@teigland
Contributor

Yes, that's the mechanism that will cause pvscan to read all symlinks in /dev which will let the filter work. Does the journal contain any info related to the pvscan? Another potential issue we've seen in this area is that /run must be tmpfs, not persistent across boots (or if it is then the distro needs to include some steps to clear everything early in the boot.)
If none of that helps, you could collect debugging from the pvscan run in the rule by enabling debugging in the lvm.conf log section: level = 7 and file = "/tmp/lvmdebug.txt"

@frukto
Author

frukto commented Jan 16, 2023

I created lvmdebug_early.txt and it shows pvscan is gathering all the symlinks but apparently some are missing. pvscan finds:

  • /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0
  • /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part1
  • /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part2

but not /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3 which is /dev/sda3 and contains the vg. Therefore the filter rejects the disk.

In the busybox shell all symlinks are present. I also noticed that on some reboots even fewer symlinks are detected:

  • /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0
  • /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part1

@frukto
Author

frukto commented Jan 16, 2023

I thought this might be some kind of race condition and did some further digging. Inserting a delay in the udev IMPORT rule which calls pvscan did not help.

Next, I tried putting the vg on sda2 instead of sda3. Now I get the same behavior, but the scan stops even earlier:

with vg on sda2:

13:03:52.589993 pvscan[473] pvscan.c:948  finding all devices for filter symlinks.
13:03:52.589999 pvscan[473] device/dev-cache.c:1178  Creating list of system devices.
13:03:52.590112 pvscan[473] device/dev-cache.c:736  Found dev 8:0 /dev/block/8:0 - new.
13:03:52.590579 pvscan[473] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-diskseq/1 - new alias.
13:03:52.590626 pvscan[473] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-id/ata-Hitachi_HUA723030ALA640_MK0351YVGR4S0A - new alias.
13:03:52.590641 pvscan[473] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-id/wwn-0x5000cca234ca11f7 - new alias.
13:03:52.590665 pvscan[473] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0 - new alias.
13:03:52.590685 pvscan[473] device/dev-cache.c:1143  /dev/fd: Symbolic link to directory
13:03:52.590717 pvscan[473] device/dev-cache.c:1148  /dev/pts: Different filesystem in directory
13:03:52.590727 pvscan[473] device/dev-cache.c:761  Found dev 8:0 /dev/sda - new alias.
13:03:52.590733 pvscan[473] device/dev-cache.c:736  Found dev 8:1 /dev/sda1 - new.
13:03:52.590739 pvscan[473] device/dev-cache.c:736  Found dev 8:2 /dev/sda2 - new.
13:03:52.590744 pvscan[473] device/dev-cache.c:736  Found dev 8:3 /dev/sda3 - new.
13:03:52.590844 pvscan[473] device/dev-cache.c:727  Found dev 8:2 /dev/sda2 - exists. 
13:03:52.590857 pvscan[473] pvscan.c:1539  pvscan_cache_args: filter devs nodata
13:03:52.590889 pvscan[473] filters/filter-regex.c:196  /dev/sda2: Skipping (regex)

with vg on sda3:

08:31:28.606675 pvscan[489] pvscan.c:948  finding all devices for filter symlinks.
08:31:28.606680 pvscan[489] device/dev-cache.c:1178  Creating list of system devices.
08:31:28.606779 pvscan[489] device/dev-cache.c:736  Found dev 8:0 /dev/block/8:0 - new.
08:31:28.606795 pvscan[489] device/dev-cache.c:736  Found dev 8:1 /dev/block/8:1 - new.
08:31:28.607199 pvscan[489] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-diskseq/1 - new alias.
08:31:28.607230 pvscan[489] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-id/ata-Hitachi_HUA723030ALA640_MK0351YVGR4S0A - new alias.
08:31:28.607246 pvscan[489] device/dev-cache.c:761  Found dev 8:1 /dev/disk/by-id/ata-Hitachi_HUA723030ALA640_MK0351YVGR4S0A-part1 - new alias.
08:31:28.607253 pvscan[489] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-id/wwn-0x5000cca234ca11f7 - new alias.
08:31:28.607266 pvscan[489] device/dev-cache.c:761  Found dev 8:1 /dev/disk/by-id/wwn-0x5000cca234ca11f7-part1 - new alias.
08:31:28.607291 pvscan[489] device/dev-cache.c:761  Found dev 8:1 /dev/disk/by-partuuid/11b04fc3-7c37-4166-8c34-8a0bb539cdec - new alias.
08:31:28.607316 pvscan[489] device/dev-cache.c:761  Found dev 8:0 /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0 - new alias.
08:31:28.607330 pvscan[489] device/dev-cache.c:761  Found dev 8:1 /dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part1 - new alias.
08:31:28.607350 pvscan[489] device/dev-cache.c:1143  /dev/fd: Symbolic link to directory
08:31:28.607384 pvscan[489] device/dev-cache.c:1148  /dev/pts: Different filesystem in directory
08:31:28.607394 pvscan[489] device/dev-cache.c:761  Found dev 8:0 /dev/sda - new alias.
08:31:28.607401 pvscan[489] device/dev-cache.c:761  Found dev 8:1 /dev/sda1 - new alias.
08:31:28.607407 pvscan[489] device/dev-cache.c:736  Found dev 8:2 /dev/sda2 - new.
08:31:28.607413 pvscan[489] device/dev-cache.c:736  Found dev 8:3 /dev/sda3 - new.
08:31:28.607516 pvscan[489] device/dev-cache.c:727  Found dev 8:3 /dev/sda3 - exists. 
08:31:28.607523 pvscan[489] pvscan.c:1539  pvscan_cache_args: filter devs nodata
08:31:28.607555 pvscan[489] filters/filter-regex.c:196  /dev/sda3: Skipping (regex)

It looks like the discovery loop stops when the first path pointing back to the current device is found, see _insert_dev

@teigland
Contributor

When pvscan detects that symlinks are needed, it first calls dev_cache_scan() which reads everything under /dev and saves all the names it finds in the dev_cache.c structures (including /dev/sda3.) After this, pvscan calls setup_dev_in_dev_cache() which adds the device name in the command arg (e.g. /dev/sda3) to dev_cache.c structures. That device was already found and added by dev_cache_scan(), so you see the "- exists" message.

I suspect what you're seeing is a lack of coordination between udev rules creating symlinks and the udev rule running pvscan. Enabling udev debugging (set udev_log=debug in /etc/udev/udev.conf) would probably show this. I wonder if certain symlinks are consistently ready when pvscan runs (maybe try using the wwn symlink).

Problems like this were one of the reasons that the devices file was added to lvm a couple years ago, which would be the best solution. Set use_devicesfile=1 in lvm.conf, get rid of the filter in lvm.conf, run "lvmdevices --adddev /dev/sda3", and everything should then work fine. lvm keeps track of which devices you want to use based on their wwid in /etc/lvm/devices/system.devices. See the lvmdevices man page for more.

@prajnoha
Member

prajnoha commented Jan 17, 2023

> I suspect what you're seeing is a lack of coordination between udev rules creating symlinks and the udev rule running pvscan. Enabling udev debugging (set udev_log=debug in /etc/udev/udev.conf) would probably show this. I wonder if certain symlinks are consistently ready when pvscan runs (maybe try using the wwn symlink).

Well, the issue might be the udev rule we use itself:

IMPORT{program}="/usr/sbin/lvm pvscan --cache --listvg --checkcomplete --vgonline --autoactivation event --udevoutput --journal=output $env{DEVNAME}"

If we use an IMPORT rule, it is executed in situ, right at the time the rule is evaluated. As such, we can't access anything other than the device node itself (the /dev/<kernel_name>). That is because all the symlinks to this node are created only after all the udev rules are evaluated. To get things executed after the symlinks are ready, we'd need to use a RUN rule. But that one can't return anything back to further udev rule evaluation, of course, because no other rules are executed anymore at that time.

So we have 2 contradictory requirements here:

  1. we need pvscan to return information back to udev for further processing (exporting the LVM_VG_NAME_COMPLETE variable, which is checked right after the pvscan)

  2. we need to see all the symlinks at the time of pvscan (which we can't, because we're in the middle of udev rule processing and subsequent udev rules may add more symlinks for current device node)

The 1) was made to break pvscan in two - the scan itself and then the activation.

The 2) is important to see all the symlinks so we can use filters containing these symlinks.

To fix this, we'd need pvscan to be executed after all udev rules are evaluated (using a RUN rule instead of IMPORT). But then we'd need to work out how to run the activation as the next step based on the result of the pvscan, that is, how to pass the information between the two.

Alternatively, we could just move the pvscan+activation udev rule from 69-dm-lvm.rules to the end of rule execution (using a name like 99-dm-lvm.rules) and then check the DEVLINKS variable inside pvscan (which contains all the symlinks that udevd is instructed to create after all the rules are executed) instead of scanning /dev itself. However, this is not 100% bullet-proof, because someone might still add a 99-zzz.rules file that is ordered after 99-dm-lvm.rules and creates further symlinks (but that chance is very small).

@frukto
Author

frukto commented Jan 17, 2023

I don't know much about udev, but my understanding is something like this:

  1. a new device shows up
  2. all udev rules are applied in order
    1. 60-.. creates the symlinks in /dev/disk
    2. 69-.. calls pvscan

So in this particular case the /dev/disk/ symlinks should be already present, unless there are races in udev. Maybe it is necessary to insert some kind of udevadm settle before starting the pvscan.

@prajnoha But in principle you are right, there might be other user defined udev rules which create more symlinks. But this might be rather a documentation issue.

@prajnoha
Member

> 1. a new device shows up
> 2. all udev rules are applied in order
>    1. 60-.. creates the symlinks in /dev/disk
>    2. 69-.. calls pvscan
>
> So in this particular case the /dev/disk/ symlinks should be already present, unless there are races in udev. Maybe it is necessary to insert some kind of udevadm settle before starting the pvscan.

...unfortunately, udev doesn't create symlinks right away. The symlinks are queued until all the udev rules have been processed, and only after that are they actually created.

The only rule which can see the symlinks is the RUN rule, because all the RUN rules are queued and then executed after the symlink handling code.

But yes, that makes sense... because further rules may overwrite previous rules, including symlinks and so we need to evaluate all the rules first.
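
The ordering described above can be illustrated with a toy model in Python. It is not udev code: it assumes a single first-time event for one device and models only the three behaviours mentioned in this thread (SYMLINK assignments are queued, IMPORT programs run immediately during rule evaluation, RUN programs run after the queued symlinks have been created). The function name, the rule tuples, and the program labels are hypothetical.

```python
def process_event(devnode, rules):
    """Return, per program label, the device names that program could see."""
    on_disk = [devnode]      # names that exist under /dev right now
    queued_links = []        # symlinks requested by rules, not yet created
    queued_runs = []         # RUN programs, deferred until after rule evaluation
    observed = {}
    for kind, arg in rules:  # rules are evaluated in file order
        if kind == "SYMLINK":
            queued_links.append(arg)
        elif kind == "IMPORT":
            observed[arg] = list(on_disk)   # runs now: queued links are not there yet
        elif kind == "RUN":
            queued_runs.append(arg)
        else:
            raise ValueError(f"unknown rule kind: {kind!r}")
    on_disk.extend(queued_links)            # symlinks are created after all rules
    for label in queued_runs:
        observed[label] = list(on_disk)     # RUN programs see the finished symlinks
    return observed

link = "/dev/disk/by-path/pci-0000:04:00.0-sas-phy0-lun-0-part3"
seen = process_event("/dev/sda3", [
    ("SYMLINK", link),            # stands in for the 60-.. rules
    ("IMPORT", "pvscan-import"),  # stands in for the 69-.. IMPORT rule
    ("RUN", "pvscan-run"),        # what a RUN-based rule would see instead
])
for label, names in seen.items():
    print(label, names)
```

In this model the IMPORT program sees only /dev/sda3 even though the symlink rule was evaluated earlier, while the RUN program sees both names.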

> @prajnoha But in principle you are right, there might be other user defined udev rules which create more symlinks. But this might be rather a documentation issue.

Sure, I don't think we'd ever hit the issue and even if yes, it's very easy to check and see. Actually, I'm starting to like the idea of taking the symlink list directly from the udev's DEVLINKS variable instead of scanning the /dev. But it depends on whether we get anything more than just device names (symlinks) from that scan. We should be scanning only the single device in pvscan (for which the basic devnode is enough), the symlinks would be read just for the filter matching.

@prajnoha
Member

> So in this particular case the /dev/disk/ symlinks should be already present, unless there are races in udev. Maybe it is necessary to insert some kind of udevadm settle before starting the pvscan.

(That would hang - we can't call anything like that from inside udev rules. The settle waits for all udev processing to finish and if we placed that settle inside a udev rule, it would be waiting for itself...)

@teigland
Contributor

I'm reluctant to jump through too many hoops to make filter symlinks work since we have the real, long term solution available (system.devices); there should be no reason not to use it. That said, I'm also curious about possible ways to make filter symlinks work if we can find a non-disruptive way to do it (i.e. that doesn't penalize us in some other way.) One idea is if you use /dev/disk/by-id/wwn- in the filter, then we could probably use lvm's knowledge of wwids, and match the value after "wwn-" against the wwid of the pvscan device arg. This would be a special case enabled only when pvscan sees "wwn-" in the filter, and it would not be the normal use of the filter (other commands would continue using the filter normally, it's just pvscan commands run from the udev rule that would have this special workaround.)

@prajnoha
Member

> That said, I'm also curious about possible ways to make filter symlinks work if we can find a non-disruptive way to do it (i.e. that doesn't penalize us in some other way.)

Exactly. To me, checking the DEVLINKS variable seems the easiest way to get the symlink info: there's no other scanning, besides scanning the contents of the device itself for LVM metadata, of course. There are no further library calls and no /dev scanning, just reading the environment variable which is always passed to the executed program in udev anyway.

This is an example of what the DEVLINKS variable looks like:

DEVLINKS="/dev/disk/by-id/nvme-nvme.8086-6e766d652d31-51454d55204e564d65204374726c-00000001-part1 /dev/disk/by-id/lvm-pv-uuid-wAPABH-fXxL-Bxdy-pja6-PgSb-zUkH-0EAL3j /dev/disk/by-path/pci-0000:00:02.0-nvme-1-part1 /dev/disk/by-partuuid/1f6cd6d7-97a8-864e-9e1e-e47095436465 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_nvme-1-part1"

So I think we'd have all the info directly accessible for matching with filters. Also, the DEVLINKS variable is important for udev, so it must always be set on each event and always contain the full list of symlinks (...if not, udevd would remove any existing symlinks not mentioned in the DEVLINKS variable for the current event).

We'd just need to move the pvscan rule so it is executed later (giving it a name like 99-...rules) and read the symlinks from that environment variable instead of scanning /dev ourselves.

Renaming the rule file and moving it for later execution shouldn't matter, because no other udev rules than our own consume the info returned from pvscan. It just seems the most straightforward way to me, far better than scanning the /dev ourselves, I think.
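
As a rough illustration of this proposal (not the code in lvm), the sketch below builds the list of names for one device from the udev environment and checks them against a single accept pattern. It assumes DEVLINKS is a space-separated list, as in the example above. The function name `names_from_udev_env`, the DEVNAME value, and the accept regex are hypothetical and chosen only to go with that example.

```python
import re

def names_from_udev_env(env):
    """Return the kernel devnode followed by every symlink listed in DEVLINKS."""
    names = [env["DEVNAME"]]
    names.extend(env.get("DEVLINKS", "").split())  # space-separated list of symlinks
    return names

example_env = {
    "DEVNAME": "/dev/nvme0n1p1",  # hypothetical devnode for the example DEVLINKS
    "DEVLINKS": (
        "/dev/disk/by-id/nvme-nvme.8086-6e766d652d31-51454d55204e564d65204374726c-00000001-part1 "
        "/dev/disk/by-id/lvm-pv-uuid-wAPABH-fXxL-Bxdy-pja6-PgSb-zUkH-0EAL3j "
        "/dev/disk/by-path/pci-0000:00:02.0-nvme-1-part1 "
        "/dev/disk/by-partuuid/1f6cd6d7-97a8-864e-9e1e-e47095436465 "
        "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_nvme-1-part1"
    ),
}

names = names_from_udev_env(example_env)
accept = re.compile(r"/dev/disk/by-path/pci-0000:00.*")  # hypothetical filter pattern
matched = [name for name in names if accept.search(name)]
print(len(names), matched)
```

A program started from a udev rule would pass `os.environ` instead of `example_env`; no directory scan is needed to obtain the names.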

Just a proposal, I'm not saying there's not any issue I haven't noticed yet, but it's something we could consider at least...

@teigland
Contributor

Yes, that sounds pretty good, I'll give it a try.

@teigland
Contributor

Here's a devel branch with some patches that work for me
https://sourceware.org/git/?p=lvm2.git;a=shortlog;h=refs/heads/dev-dct-pvscan-devlinks-2

I haven't tried moving 69-dm-lvm.rules yet. Is that move (e.g. to 99-) needed so that DEVLINKS will contain all the necessary names?

@prajnoha
Member

Yes, I'd move/rename those 69-dm-lvm.rules so they're executed later, just in case some other rules add more DEVLINKS (including custom rules added by users directly, which they freely can by placing it in /etc/udev/rules.d).

@frukto
Author

frukto commented Jan 30, 2023

@teigland: I just built your branch and it solves the problem for us (w/o moving 69-dm-lvm.rules). Thanks for your work.

Probably the man page should also mention that it needs absolute paths in the filter expression to trigger the symlink logic - or did you change that as well?

Also, the approach with the devices file works for us. We will go that way at least until your changes have landed in Debian. However, I have the feeling that the devices file is a bit more obscure than device filtering.

@teigland
Contributor

> Yes, I'd move/rename those 69-dm-lvm.rules so they're executed later, just in case some other rules add more DEVLINKS (including custom rules added by users directly, which they freely can by placing it in /etc/udev/rules.d).

OK, it sounds like renaming the rule will add extra protection for custom rules. At the same time, there could be custom rules that rely on lvm starting at 69. How about addressing these special cases via documentation? In a man page we'd explain that only specific links (wwn-, pci-, etc) are known to work in the lvm filter, and if a user wants to add a udev rule with custom links, then they need to move the lvm rule to a number larger than their own if they want to use their links in the filter.

@teigland
Contributor

> @teigland: I just built your branch and it solves the problem for us (w/o moving 69-dm-lvm.rules). Thanks for your work.
> Probably the man page should also mention that it needs absolute paths in the filter expression to trigger the symlink logic - or did you change that as well?

Thanks for verifying. Yes, we'll want to include something in a man page about this issue. The last patch added recognition of "pci-" in filter entries, so you don't need to include the /dev/disk prefix for lvm to know they are udev symlinks.

@prajnoha
Member

> OK, it sounds like renaming the rule will add extra protection for custom rules. At the same time, there could be custom rules that rely on lvm starting at 69. How about addressing these special cases via documentation? In a man page we'd explain that only specific links (wwn-, pci-, etc) are known to work in the lvm filter, and if a user wants to add a udev rule with custom links, then they need to move the lvm rule to a number larger than their own if they want to use their links in the filter.

OK, we can do it that way too. Though, I think, we don't produce anything in that rule that would be consumed by other rules, besides the SYSTEMD_READY variable which is read by systemd itself (and maybe 99-systemd.rules).

The LVM rule (69-...) is provided in the lvm2 package and placed in /lib/udev/rules.d - as such, users can't move that (...well, they can, but on package upgrades, this would cause issues). So we should just mention that any custom rule, which adds more symlinks (and which are used in the lvm filter), should be placed before the 69-dm-lvm.rules.

If this, by any chance, would cause a problem, we could still move the rule ourselves then...

@teigland
Contributor

teigland commented Feb 1, 2023

@teigland teigland closed this as completed Feb 1, 2023