
filter out zero-sized devices by looking at "size" attribute in sysfs to avoid "no medium found" errors #13

Closed
miquecg opened this issue Feb 6, 2019 · 7 comments



miquecg commented Feb 6, 2019

Not sure if this is an issue or expected behaviour, so let's ask.

➜ # ~  pvscan --cache
  /dev/sdd: open failed
  /dev/sde: open failed
  /dev/sdf: open failed
  /dev/sdg: open failed
➜ # ~  pvscan
  /dev/sdd: open failed
  /dev/sde: open failed
  /dev/sdf: open failed
  /dev/sdg: open failed
  PV /dev/mapper/ssd         VG ssd             lvm2 [<118,74 GiB / <108,74 GiB free]
  PV /dev/sdb1               VG storage         lvm2 [<931,51 GiB / 0    free]
  PV /dev/mapper/crypt_hdd   VG hdd             lvm2 [<465,76 GiB / <73,76 GiB free]
  Total: 3 [1,48 TiB] / in use: 3 [1,48 TiB] / in no VG: 0 [0   ]

If I run pvscan without --cache and the use_lvmetad option is enabled in the configuration, I'd expect the second command in the example not to scan the devices again, but that looks like what's happening.

Also, though not strictly related: is it normal that the lvm2-pvscan@.service unit in systemd runs without the cache at boot? For some reason I started to see ... open failed messages during the boot process.

This is lvm2 version 2.02.183 on Arch Linux.

Thanks!

teigland (Contributor) commented Feb 6, 2019

The errors are produced when device nodes exist for devices that don't exist on the system. It's udev's job to create and remove device nodes at the right time, so maybe there's something odd happening with udev.

To check if a command is actually reading a device you would need to run it with -vvvv and look at the debugging. (The open error could occur when lvm tries to open the device to check its size, for example, not necessarily to read it.)

The lvm2-pvscan service should be running pvscan --cache

prajnoha (Member) commented Feb 7, 2019

If I run pvscan without --cache and the use_lvmetad option is enabled in the configuration, I'd expect the second command in the example not to scan the devices again, but that looks like what's happening.

The open here probably comes from the device filters inside LVM. If using lvmetad, we shouldn't need to execute certain filters anymore: devices which are already cached in lvmetad passed these filters, but I need to check the code and recall... It's possible something got changed there recently or there's a filter we still need to execute for some reason (...and hence open the device to scan for certain signatures on disk to apply the filter based on that).

Also, though not strictly related: is it normal that the lvm2-pvscan@.service unit in systemd runs without the cache at boot? For some reason I started to see ... open failed messages during the boot process.

Normally, pvscan --cache <major>:<minor> scans only the <major>:<minor> device for LVM metadata. However, if this is the very first LVM command to see lvmetad empty, it initiates a full scan so it can fill lvmetad with metadata from any devices that were already present on the system before lvmetad actually started (that usually happens at boot, or after lvmetad has been restarted). And usually, the very first LVM command to see lvmetad empty is the pvscan --cache <major>:<minor> command executed from within lvm2-pvscan@<major>:<minor>.service.

prajnoha (Member) commented Feb 7, 2019

The errors are produced when device nodes exist for devices that don't exist on the system. It's udev's job to create and remove device nodes at the right time, so maybe there's something odd happening with udev.

Actually, if it's /dev/<kernel_device_name>, and /dev/sd* are kernel device names, the nodes are created directly by the kernel through devtmpfs - so the node is created under /dev as soon as there's a kernel representation (and removed when the device is removed from the kernel). So the failure to open the device must have a different reason... I was thinking of permissions, but that would be logged explicitly as a permission problem, so it's probably some other problem with the device.

Is there anything logged in dmesg/systemd journal about those devices?

prajnoha (Member) commented Feb 8, 2019

If I run pvscan without --cache and the use_lvmetad option is enabled in the configuration, I'd expect the second command in the example not to scan the devices again, but that looks like what's happening.

The open here probably comes from the device filters inside LVM. If using lvmetad, we shouldn't need to execute certain filters anymore: devices which are already cached in lvmetad passed these filters, but I need to check the code and recall...

...so, yes, these open calls come from device filtering inside LVM and it's expected here for pvscan (as well as pvs -a and probably others where we need to process all devices).

There's some more info about this in this commit:
e0ce728

Back to the report:

  • the open calls are expected here for pvscan even after we have filled lvmetad with the results from a previous pvscan --cache call

  • the failed open calls - let's check the dmesg/systemd journal to see whether it has logged any problem with those devices (this is then probably not an LVM issue, but an issue with disk access and availability)

miquecg (Author) commented Feb 9, 2019

  • the failed open calls - let's check the dmesg/systemd journal to see whether it has logged any problem with those devices (this is then probably not an LVM issue, but an issue with disk access and availability)

You're right @prajnoha.

I ran pvscan -vvvv as @teigland suggested and found these lines:

#device/dev-cache.c:1212          Creating list of system devices.
#device/dev-cache.c:723           Found dev 8:48 /dev/sdd - new.
#device/dev-cache.c:763           Found dev 8:48 /dev/disk/by-id/usb-Generic-_Compact_Flash_20070818000000000-0:0 - new alias.
#device/dev-cache.c:763           Found dev 8:48 /dev/disk/by-path/pci-0000:00:1a.0-usb-0:1.3:1.1-scsi-0:0:0:0 - new alias.
#device/dev-cache.c:723           Found dev 8:64 /dev/sde - new.
#device/dev-cache.c:763           Found dev 8:64 /dev/disk/by-id/usb-Generic-_SM_xD-Picture_20070818000000000-0:1 - new alias.
#device/dev-cache.c:763           Found dev 8:64 /dev/disk/by-path/pci-0000:00:1a.0-usb-0:1.3:1.1-scsi-0:0:0:1 - new alias.
#device/dev-cache.c:723           Found dev 8:80 /dev/sdf - new.
#device/dev-cache.c:763           Found dev 8:80 /dev/disk/by-id/usb-Generic-_SD_MMC_20070818000000000-0:2 - new alias.
#device/dev-cache.c:763           Found dev 8:80 /dev/disk/by-path/pci-0000:00:1a.0-usb-0:1.3:1.1-scsi-0:0:0:2 - new alias.
#device/dev-cache.c:723           Found dev 8:96 /dev/sdg - new.
#device/dev-cache.c:763           Found dev 8:96 /dev/disk/by-id/usb-Generic-_MS_MS-Pro_20070818000000000-0:3 - new alias.
#device/dev-cache.c:763           Found dev 8:96 /dev/disk/by-path/pci-0000:00:1a.0-usb-0:1.3:1.1-scsi-0:0:0:3 - new alias.

This is caused by a multi-card reader I have plugged in. If I try to read from any empty slot I get "no medium found" errors.

Do you want me to share the complete output of the command?

prajnoha (Member) commented

Yes, that's it - the card... If you remove the card, there's still a device representation present on the system, and we end up with a "no medium found" error (...the same would happen for CD/DVD/BD drives after removing the media). So this is all expected then.

I got a bit misled by the error message, which was reported without the reason:

   /dev/sdd: open failed

I'd expect that to be:

  /dev/sdd: open failed: No medium found

Anyway, we know now...

We do have an internal filter that filters out devices which are not large enough to hold a PV. The thing is, for this filter to work we need to open the device and run the BLKGETSIZE64 ioctl. Of course, if the device is not backed by any medium, we can't open it at all - so that's the error reported then.
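
Just to illustrate what that check boils down to - a simplified sketch, not the actual LVM helper (the function name and the minimum-size constant are made up here):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/* Hypothetical threshold for illustration; LVM takes the real minimum from its config. */
#define MIN_PV_SIZE_BYTES (2048 * 1024)

/* Returns 1 if the device is large enough to hold a PV, 0 if not or on error. */
static int check_dev_min_size(const char *path)
{
	uint64_t size = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0) {
		/* With no medium in the slot, this open() is what fails (ENOMEDIUM). */
		perror(path);
		return 0;
	}

	if (ioctl(fd, BLKGETSIZE64, &size) < 0) {
		perror(path);
		(void) close(fd);
		return 0;
	}

	(void) close(fd);
	return size >= MIN_PV_SIZE_BYTES;
}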

We could possibly improve this by looking at the size attribute in sysfs first, e.g. on my system where sdd is the removed card:

  # cat /sys/block/sdd/size
  0

And we could make this part of the internal "usable" LVM filter; the relevant parts of the code are here (...actually, we already do this when external_device_info_source="udev" is set in lvm.conf):

static int _passes_usable_filter(struct cmd_context *cmd, struct dev_filter *f, struct device *dev)
{
	struct filter_data *data = f->private;
	filter_mode_t mode = data->mode;
	int skip_lvs = data->skip_lvs;
	struct dev_usable_check_params ucp = {0};
	int r = 1;

	/* further checks are done on dm devices only */
	if (dm_is_dm_major(MAJOR(dev->dev))) {
		switch (mode) {
		case FILTER_MODE_NO_LVMETAD:
			ucp.check_empty = 1;
			ucp.check_blocked = 1;
			ucp.check_suspended = ignore_suspended_devices();
			ucp.check_error_target = 1;
			ucp.check_reserved = 1;
			ucp.check_lv = skip_lvs;
			break;
		case FILTER_MODE_PRE_LVMETAD:
			ucp.check_empty = 1;
			ucp.check_blocked = 1;
			ucp.check_suspended = 0;
			ucp.check_error_target = 1;
			ucp.check_reserved = 1;
			ucp.check_lv = skip_lvs;
			break;
		case FILTER_MODE_POST_LVMETAD:
			ucp.check_empty = 0;
			ucp.check_blocked = 1;
			ucp.check_suspended = ignore_suspended_devices();
			ucp.check_error_target = 0;
			ucp.check_reserved = 0;
			ucp.check_lv = skip_lvs;
			break;
		}

		if (!(r = device_is_usable(dev, ucp)))
			log_debug_devs("%s: Skipping unusable device.", dev_name(dev));
	}

	if (r) {
		/* check if the device is not too small to hold a PV */
		switch (mode) {
		case FILTER_MODE_NO_LVMETAD:
			/* fall through */
		case FILTER_MODE_PRE_LVMETAD:
			r = _check_pv_min_size(dev);
			break;
		case FILTER_MODE_POST_LVMETAD:
			/* nothing to do here */
			break;
		}
	}

	return r;
}

static int _check_pv_min_size(struct device *dev)
{
	if (dev->ext.src == DEV_EXT_NONE)
		return _native_check_pv_min_size(dev);

	if (dev->ext.src == DEV_EXT_UDEV)
		return _udev_check_pv_min_size(dev);

	log_error(INTERNAL_ERROR "Missing hook for PV min size check "
		  "using external device info source %s", dev_ext_name(dev));

	return 0;
}

I'll think about adding this...
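
Roughly, the idea would be something like this - just a sketch to capture the intent, not actual LVM code (the helper name and the path handling are made up):

#include <limits.h>
#include <stdio.h>

/*
 * Read /sys/block/<name>/size, which reports the device size in 512-byte
 * sectors, without opening the device node itself. Returns 1 if the device
 * is zero-sized, 0 if it has a non-zero size, -1 if the attribute could not
 * be read. Real code would also need to handle partitions, e.g. via
 * /sys/class/block/<name>/size.
 */
static int sysfs_dev_is_zero_sized(const char *kernel_name)
{
	char path[PATH_MAX];
	unsigned long long sectors = 0;
	FILE *fp;

	snprintf(path, sizeof(path), "/sys/block/%s/size", kernel_name);

	if (!(fp = fopen(path, "r")))
		return -1;

	if (fscanf(fp, "%llu", &sectors) != 1) {
		fclose(fp);
		return -1;
	}

	fclose(fp);
	return sectors == 0 ? 1 : 0;
}

The usable filter could then skip such a device right away and only fall back to opening it and running BLKGETSIZE64 when the sysfs attribute isn't available.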

prajnoha changed the title from "pvscan not reading lvmetad cache ??" to "filter out zero-sized devices by looking at 'size' attribute in sysfs to avoid 'no medium found' errors" Feb 11, 2019
miquecg (Author) commented Feb 11, 2019

Cool. Many thanks for helping me understand this. It's not a big deal, but it was a bit misleading because I totally forgot about that card reader, hehe.
