'vgchange -ay' freezes after reboot if we have move_pv #136

Closed
lignumqt opened this issue Dec 19, 2023 · 4 comments

@lignumqt

lignumqt commented Dec 19, 2023

Distribution Name - Ubuntu
Distribution Version - 20.04
Kernel Version - 6.1.45, 6.1.68
Architecture - x86_64
lvm version - v2_03_11, v2_03_22

Hello!
I ran into a problem: if I start pvmove -b and then reboot, vgchange -ay hangs while activating the volumes. I do not see this problem on kernel versions 4.19 and 6.6.

I attached logs from strace -r -f vgchange -ay <vg_name> and vgchange -vvvv -ay <vg_name>, taken on different kernels:

strace-4.19.107.log
strace-6.1.log
vgchange-4.19.107.log
vgchange-6.1.log

Unfortunately, I can't use the 6.6 kernel right now.
If you need anything else, just tell me.

@lignumqt
Author

Here are the last 10 lines of the log files from the run where the hang occurred:

strace-6.1.log:

 0.000045 ioctl(3, DM_TABLE_LOAD, {version=4.0.0, data_size=2048, data_start=312, dev=makedev(0xfd, 0x5), target_count=1, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_SKIP_BDGET_FLAG, ...} => {version=4.47.0, data_size=305, data_start=312, dev=makedev(0xfd, 0x5), name="test-vol_rmeta_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFKMHcDj5mzcREvvohxihSxP6VBSxaCHz8", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_INACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG}) = 0
     0.000334 ioctl(3, DM_DEV_SUSPEND, {version=4.0.0, data_size=2048, dev=makedev(0xfd, 0x5), event_nr=5111808, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_SKIP_BDGET_FLAG} => {version=4.47.0, data_size=305, dev=makedev(0xfd, 0x5), name="test-vol_rmeta_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFKMHcDj5mzcREvvohxihSxP6VBSxaCHz8", target_count=1, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_ACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG|DM_UEVENT_GENERATED_FLAG}) = 0
     0.000063 ioctl(3, DM_DEV_CREATE, {version=4.0.0, data_size=16384, name="test-vol_rimage_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbI5AcvMtBoESzdLnPfAg3LwDLJldmRIu", flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG} => {version=4.47.0, data_size=305, dev=makedev(0xfd, 0x6), name="test-vol_rimage_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbI5AcvMtBoESzdLnPfAg3LwDLJldmRIu", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG}) = 0
     0.000205 ioctl(3, DM_TABLE_STATUS, {version=4.0.0, data_size=16384, data_start=312, dev=makedev(0xfd, 0x6), flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_STATUS_TABLE_FLAG} => {version=4.47.0, data_size=305, data_start=312, dev=makedev(0xfd, 0x6), name="test-vol_rimage_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbI5AcvMtBoESzdLnPfAg3LwDLJldmRIu", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_STATUS_TABLE_FLAG}) = 0
     0.000053 ioctl(3, DM_TABLE_LOAD, {version=4.0.0, data_size=2048, data_start=312, dev=makedev(0xfd, 0x6), target_count=1, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_SKIP_BDGET_FLAG, ...} => {version=4.47.0, data_size=305, data_start=312, dev=makedev(0xfd, 0x6), name="test-vol_rimage_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbI5AcvMtBoESzdLnPfAg3LwDLJldmRIu", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_INACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG}) = 0
     0.000298 ioctl(3, DM_DEV_SUSPEND, {version=4.0.0, data_size=2048, dev=makedev(0xfd, 0x6), event_nr=5111808, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_SKIP_BDGET_FLAG} => {version=4.47.0, data_size=305, dev=makedev(0xfd, 0x6), name="test-vol_rimage_2", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbI5AcvMtBoESzdLnPfAg3LwDLJldmRIu", target_count=1, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_ACTIVE_PRESENT_FLAG|DM_SKIP_BDGET_FLAG|DM_UEVENT_GENERATED_FLAG}) = 0
     0.000092 ioctl(3, DM_DEV_CREATE, {version=4.0.0, data_size=16384, name="test-vol", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbhcI9JYbMTUpIFQjmAIdHRwtESfAwtMX", flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG} => {version=4.47.0, data_size=305, dev=makedev(0xfd, 0x7), name="test-vol", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbhcI9JYbMTUpIFQjmAIdHRwtESfAwtMX", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_SKIP_BDGET_FLAG}) = 0
     0.000291 ioctl(3, DM_LIST_VERSIONS, {version=4.1.0, data_size=2048, data_start=312, flags=DM_EXISTS_FLAG} => {version=4.47.0, data_size=493, data_start=312, flags=DM_EXISTS_FLAG, ...}) = 0
     0.000061 ioctl(3, DM_TABLE_STATUS, {version=4.0.0, data_size=16384, data_start=312, dev=makedev(0xfd, 0x7), flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_STATUS_TABLE_FLAG} => {version=4.47.0, data_size=305, data_start=312, dev=makedev(0xfd, 0x7), name="test-vol", uuid="LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbhcI9JYbMTUpIFQjmAIdHRwtESfAwtMX", target_count=0, open_count=0, event_nr=0, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_STATUS_TABLE_FLAG}) = 0
     0.000056 ioctl(3, DM_TABLE_LOAD, {version=4.0.0, data_size=2048, data_start=312, dev=makedev(0xfd, 0x7), target_count=1, flags=DM_EXISTS_FLAG|DM_PERSISTENT_DEV_FLAG|DM_SKIP_BDGET_FLAG, ...}

vgchange-6.1.log:

09:48:49.060253 vgchange[10425] device_mapper/libdm-common.c:1496        test-vol_rimage_2: Stacking NODE_READ_AHEAD 256 (flags=1)
09:48:49.060263 vgchange[10425] device_mapper/libdm-deptree.c:2211    Creating test-vol
09:48:49.060275 vgchange[10425] device_mapper/ioctl/libdm-iface.c:2097        dm create test-vol LVM-mA1fD9YJDGzAPDaa94MyIPtue7u6UxNFbhcI9JYbMTUpIFQjmAIdHRwtESfAwtMX [ noopencount flush ]   [16384] (*1)
09:48:49.060523 vgchange[10425] device_mapper/libdm-deptree.c:3231    Loading table for test-vol (253:7).
09:48:49.060540 vgchange[10425] device_mapper/libdm-deptree.c:2495      Getting target version for raid
09:48:49.060550 vgchange[10425] device_mapper/ioctl/libdm-iface.c:2097        dm versions   [ opencount flush ]   [2048] (*1)
09:48:49.060568 vgchange[10425] device_mapper/libdm-deptree.c:2512      Found raid target v1.15.1.
09:48:49.060582 vgchange[10425] device_mapper/libdm-deptree.c:3173        Adding target to (253:7): 0 1258291200 raid raid5_ls 3 128 region_size 4096 3 253:1 253:2 253:3 253:4 253:5 253:6
09:48:49.060593 vgchange[10425] device_mapper/ioctl/libdm-iface.c:2097        dm table   (253:7) [ opencount flush ]   [16384] (*1)
09:48:49.060611 vgchange[10425] device_mapper/ioctl/libdm-iface.c:2097        dm reload   (253:7) [ noopencount flush ]   [2048] (*1)
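
Both excerpts stop at the same point: the final DM_TABLE_LOAD ioctl for test-vol (253:7) shows no return value in the strace output, and the last vgchange line is the matching dm reload. For a longer strace log, a minimal Python sketch like the one below can pick out calls that never returned. The function name unfinished_calls and the regular expression are illustrative assumptions, not part of strace or LVM.

```python
import re

# A completed strace line ends with ") = <return value>", optionally followed
# by an errno name and description, e.g. ") = -1 ENOENT (No such file ...)".
COMPLETED = re.compile(r"\)\s+=\s+(-?\d+|0x[0-9a-fA-F]+|\?)(\s.*)?$")


def unfinished_calls(lines):
    """Return syscall lines that show no return value.

    Assumption: one syscall per line, as in the excerpt above. With
    'strace -f' and several processes, '<unfinished ...>' lines that are
    resumed later would also be reported and need manual review.
    """
    pending = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith(("+++", "---")):
            continue  # skip blank lines, exit notices, and signal records
        if not COMPLETED.search(text):
            pending.append(text)
    return pending


if __name__ == "__main__":
    # Shortened lines modelled on the strace-6.1.log excerpt.
    sample = [
        "0.000291 ioctl(3, DM_LIST_VERSIONS, {version=4.1.0, ...} => {version=4.47.0, ...}) = 0",
        "0.000056 ioctl(3, DM_TABLE_LOAD, {version=4.0.0, dev=makedev(0xfd, 0x7), target_count=1, ...}",
    ]
    for call in unfinished_calls(sample):
        print(call)
```

To scan a real file, pass the open file object directly, for example unfinished_calls(open("strace-6.1.log")).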

@zkabelac
Contributor

This more or less looks like a problem with dm-raid/mdraid target - that has failed somehow and got blocked. See related thread on LKML where some dm-raid/mdraid kernel bugs are currently being investigated.

@lignumqt
Author

> This more or less looks like a problem with dm-raid/mdraid target - that has failed somehow and got blocked. See related thread on LKML where some dm-raid/mdraid kernel bugs are currently being investigated.

Thanks, are you talking about https://lkml.org/?

@lignumqt
Author

lignumqt commented Feb 7, 2024

I've found the kernel patch that fixes this problem:

From 3eb96946f0be6bf447cbdf219aba22bc42672f92 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Wed, 24 May 2023 08:05:38 +0200
Subject: [PATCH] block: make bio_check_eod work for zero sized devices

Since the dawn of time bio_check_eod has a check for a non-zero size of
the device.  This doesn't really make any sense as we never want to send
I/O to a device that's been set to zero size, or never moved out of that.

I am a bit surprised we haven't caught this for a long time, but the
removal of the extra validation inside of zram caused syzbot to trip
over this issue recently.  I've added a Fixes tag for that commit, but
the issue really goes back way before git history.

Fixes: 9fe95babc742 ("zram: remove valid_io_request")
Reported-by: [email protected]
Signed-off-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
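
The commit message says the old end-of-device check only ran when the device size was non-zero. A small user-space Python model of that difference is sketched below, as one reading of the commit message. The function names and parameters are hypothetical, and this is not the kernel code itself.

```python
def beyond_eod_old(sector, nr_sectors, dev_sectors):
    """Model of the check as described before the patch.

    The range test is skipped entirely when dev_sectors is zero, so I/O
    aimed at a zero-sized device is never rejected here.
    """
    return bool(
        nr_sectors
        and dev_sectors
        and (nr_sectors > dev_sectors or sector > dev_sectors - nr_sectors)
    )


def beyond_eod_new(sector, nr_sectors, dev_sectors):
    """Model of the check with the non-zero size condition dropped."""
    return bool(
        nr_sectors
        and (nr_sectors > dev_sectors or sector > dev_sectors - nr_sectors)
    )


if __name__ == "__main__":
    # (start sector, length in sectors, device size in sectors)
    cases = [(0, 8, 0), (0, 8, 16), (16, 8, 16), (0, 0, 0)]
    for case in cases:
        print(case, beyond_eod_old(*case), beyond_eod_new(*case))
```

The two models differ only for a zero-sized device with a non-empty request: the old one lets it through and the new one rejects it. The thread does not explain how this change relates to the dm-raid activation hang.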

@lignumqt lignumqt closed this as completed Feb 7, 2024