sysstat should use stable disk identifiers in stats_disk #195

Open
msekletar opened this issue Nov 20, 2018 · 19 comments

Comments

@msekletar
Contributor

Currently, the disk activity statistics use major:minor to associate disk activity with the block device. There are two major drawbacks with this approach:

  • major:minor numbers are not consistent across reboots, so sar may associate a given disk's activity with a different block device after a reboot

  • major:minor numbers are resolved locally, so the data files are hardly portable. In other words, displaying them with sar on a different machine is close to pointless

I am opening this to start the discussion. Unfortunately, I am not quite sure what other identifiers we could use.
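To make the first drawback concrete, here is a minimal sketch that compares which disk sits behind each major:minor pair on two boots. The serial strings and the two boot snapshots are hypothetical illustration data, not taken from sysstat or from a real machine.

```python
from typing import Dict, List, Tuple

DevNum = Tuple[int, int]


def moved_devices(before: Dict[DevNum, str], after: Dict[DevNum, str]) -> List[DevNum]:
    """Return the (major, minor) keys whose underlying disk differs between two boots."""
    common = before.keys() & after.keys()
    return sorted(k for k in common if before[k] != after[k])


# Hypothetical data: the same two disks were enumerated in a different order.
BOOT_1 = {(8, 0): "SERIAL-A", (8, 16): "SERIAL-B"}
BOOT_2 = {(8, 0): "SERIAL-B", (8, 16): "SERIAL-A"}

print(moved_devices(BOOT_1, BOOT_2))  # [(8, 0), (8, 16)]
```

Any history keyed only on (major, minor) would silently attribute the activity of one disk to the other in this situation.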

@msekletar msekletar changed the title sysstat should use stable disk identifiers in stats_disk sysstat should use stable disk identifiers in stats_disk Nov 20, 2018
@hunter86bg

I just experienced that, and now we cannot identify the disks before and after the reboot.
In my opinion, the best approach is to use the "wwid", but it is not always available in VMs (in a VM environment it has to be explicitly enabled).

So maybe sar should try to use the wwid and fall back to major:minor only if there is no wwid information (for example, VMware machines without disk.enableUUID = "TRUE").
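The fallback described above could look roughly like this. It is only a sketch: the function name and the "dev-major:minor" label format are assumptions made for illustration, and how the WWID is actually obtained is left out.

```python
from typing import Optional


def disk_identifier(wwid: Optional[str], major: int, minor: int) -> str:
    """Prefer the WWID; fall back to a major:minor label when none is known."""
    cleaned = (wwid or "").strip()
    if cleaned:
        return cleaned
    # Hypothetical fallback label for devices that expose no WWID.
    return f"dev-{major}:{minor}"


print(disk_identifier("3600a0980abcdef", 8, 16))
print(disk_identifier(None, 253, 8))
```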

@sysstat
Owner

sysstat commented Mar 9, 2019

So this is probably what I will use in a future sysstat version. I will investigate the point when I have a bit more time.
Thanks.

@hunter86bg

It will require some planning.
It seems that LVs and MD (software RAID) devices can be identified as 'VG/LV' and 'mdX' respectively, as they are formed from the same members (PVs, MD RAID members).
For a multipath device like '/dev/mapper/3600xxxxxxxxxa0', a new naming convention will be needed. Maybe something like 3600xxxxxxxxxa0-mpath, as multipath friendly names are also not persistent unless defined by wwid and alias.
For the block devices that are members of a multipath device we will still have the issue, but we have multipath as an aggregation layer (depending on round-robin, multibus, etc.). So for mpath members something like 3600xxxxxxxxxa0-major:minor is still sufficient.
Of course, if a block device has no wwid (virtualization not configured for that), then we can still keep dev-major:minor.

@lberra

lberra commented Aug 23, 2019

I believe you could simply leave the job to udev.
Currently sar -d -j PATH reports stable identifiers; the same logic could be replicated in sadc.

@hunter86bg

hunter86bg commented Aug 23, 2019

This doesn't work... The major:minor numbers in the sar data are mapped to the current udev devices, but after a reboot the old sda is not necessarily the new sda, and 253:8 before the reboot is not necessarily 253:8 after it.

@lberra

lberra commented Aug 23, 2019

I will try to rephrase.
I am well aware that sadc stores the major:minor numbers of the current devices in the data file and that name resolution is "currently" done at runtime.

What I was proposing is to move the logic that sar uses at runtime into sadc, so that a string is stored instead of, or in addition to, major:minor.

IMHO the advantages of this approach over creating a different logic, as you seem to propose above, are:

  • most of the code is already there: get_persistent_name_from_pretty() in common.c
  • the names are managed with udev, the naming scheme can be configured by the system administrator if so desired, and there is no need to plan for every single device type in sysstat
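As a rough illustration of "leaving the job to udev", the sketch below builds a kernel-name to persistent-name map by reading the symlinks in a directory such as /dev/disk/by-path. It is not the code of get_persistent_name_from_pretty() from common.c; the function name and the rule of keeping the first link in sorted order are assumptions.

```python
import os
from typing import Dict


def names_by_device(link_dir: str) -> Dict[str, str]:
    """Map kernel device names (e.g. 'sda') to a symlink name found in link_dir."""
    mapping: Dict[str, str] = {}
    for entry in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, entry)
        if not os.path.islink(path):
            continue  # udev only creates symlinks here; ignore anything else
        target = os.path.basename(os.readlink(path))
        # Several links may point at one device; keep the first in sorted order.
        mapping.setdefault(target, entry)
    return mapping


if __name__ == "__main__" and os.path.isdir("/dev/disk/by-path"):
    for dev, name in names_by_device("/dev/disk/by-path").items():
        print(dev, "->", name)
```

Whatever naming scheme the administrator configures in udev shows up in that directory, so a collector built this way would not need per-device-type logic.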

@sysstat
Owner

sysstat commented Aug 24, 2019

So it seems that using sar -d -j XXX meets our need and could be our solution to get a stable identifier across reboots.
BTW we should rather use sar -d -j UUID than sar -d -j PATH as PATH can change for a given disk: If you plug an external USB drive into a port then into another one, you will get two different paths for that same disk whereas its UUID won't change. The drawback with UUID is that it doesn't exist for whole devices (e.g. sda), only for partitions (sda1, sda2...) What should be used instead for whole devices then?

Now do we need sadc to actually save the UUID in the binary datafile? Yes if files are to be displayed with sar on another machine. Yet is it worth the cost? Saving UUID would be rather expensive as it would need to be done for every device and every collected sample... What do you think?

@hunter86bg

I agree with your comment: sar -d -j UUID is a better approach, but do we have logic to get the wwid? A WWID is supposed to be unique everywhere, and for me it is the best approach.

About the binary file containing the uuid/wwid: we don't need to probe it every time. Maybe sar can listen for udev changes and then update its own cache file?

@sysstat
Owner

sysstat commented Aug 24, 2019

Using WWID is OK if I can retrieve it easily for any kind of device, which does not appear to be the case (well, maybe I'm missing something...)
Here is a sample output from /dev/disk/by-id on my machine with an internal drive (sda), an external one (sdf) and a USB key (sdg):

lrwxrwxrwx. 1 root root  9 Aug 24 14:12 ata-Hitachi_HDS723020BLA642_MN1240F33J1XND -> ../../sda
lrwxrwxrwx. 1 root root  9 Aug 24 14:20 ata-ST9500325AS_5VERM441 -> ../../sdf
lrwxrwxrwx. 1 root root  9 Aug 24 14:21 usb-_USB_DISK_3.0_07072A310A212C69-0:0 -> ../../sdg
lrwxrwxrwx. 1 root root  9 Aug 24 14:20 wwn-0x5000c500495366d9 -> ../../sdf
lrwxrwxrwx. 1 root root  9 Aug 24 14:12 wwn-0x5000cca369f193ac -> ../../sda

I need a clear and reliable way to get the WWID from such an output.
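One possible way to pull the WWN out of such a listing is to match only the wwn-0x... links of whole disks. This is a sketch under that assumption, with a hypothetical function name; devices without a wwn- link (the USB key here) simply do not appear in the result.

```python
import re
from typing import Dict

# Whole-disk links only: "wwn-0x<hex> -> ../../<device>"
WWN_LINK = re.compile(r"\bwwn-(0x[0-9a-fA-F]+) -> \.\./\.\./(\S+)")

LISTING = """\
lrwxrwxrwx. 1 root root  9 Aug 24 14:12 ata-Hitachi_HDS723020BLA642_MN1240F33J1XND -> ../../sda
lrwxrwxrwx. 1 root root  9 Aug 24 14:20 ata-ST9500325AS_5VERM441 -> ../../sdf
lrwxrwxrwx. 1 root root  9 Aug 24 14:21 usb-_USB_DISK_3.0_07072A310A212C69-0:0 -> ../../sdg
lrwxrwxrwx. 1 root root  9 Aug 24 14:20 wwn-0x5000c500495366d9 -> ../../sdf
lrwxrwxrwx. 1 root root  9 Aug 24 14:12 wwn-0x5000cca369f193ac -> ../../sda
"""


def wwn_by_device(listing: str) -> Dict[str, str]:
    """Return {device: wwn} for every whole-disk wwn- link in an `ls -l` listing."""
    result: Dict[str, str] = {}
    for line in listing.splitlines():
        match = WWN_LINK.search(line)
        if match:
            wwn, device = match.groups()
            result[device] = wwn
    return result


print(wwn_by_device(LISTING))
```

For this listing the result contains sda and sdf but not sdg, which is exactly the case that needs a fallback.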

Wrt binary file containing UUID/WWID, this is not a question of probing them every time, but really of space used to save them for every sample every time. I was just wondering whether using 16 bytes for every device on every sample to save the UUID/WWID was worth it, knowing that a stable identifier can already be displayed with sar -d -j XXX if you stay on the same machine... But space is probably no longer a problem nowadays :-)
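The space question is simple arithmetic. As a worked example, assuming 20 block devices and one sample every 10 minutes (144 samples per day, both numbers chosen only for illustration), 16 bytes per device per sample adds about 45 KiB per daily file:

```python
def extra_bytes_per_day(devices: int, samples_per_day: int, id_size: int = 16) -> int:
    """Extra bytes needed to store one fixed-size identifier per device per sample."""
    return devices * samples_per_day * id_size


# Assumed workload: 20 devices, one sample every 10 minutes.
total = extra_bytes_per_day(devices=20, samples_per_day=144)
print(total)         # 46080
print(total / 1024)  # 45.0 (KiB)
```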

@lberra

lberra commented Aug 24, 2019

> BTW we should rather use sar -d -j UUID than sar -d -j PATH as PATH can change for a given disk: If you plug an external USB drive into a port then into another one, you will get two different paths for that same disk whereas its UUID won't change. The drawback with UUID is that it doesn't exist for whole devices (e.g. sda), only for partitions (sda1, sda2...) What should be used instead for whole devices then?

That is in fact why I used PATH in my example. Besides, UUID only makes sense with sadc -S XDISK, and it never occurred to me to gather statistics on USB drives :)
But that was only an example; the current -j XXXX implementation has the advantage of letting the system administrator choose.

> Now do we need sadc to actually save the UUID in the binary datafile?
> Yes if files are to be displayed with sar on another machine.

The issue @hunter86bg and I are facing is not displaying sar files on another machine, but getting consistent device names across reboots. On a server system with many devices and a long uptime, disks can be added or removed. Besides, device-mapper and md always use dynamic major/minor numbers, so just adding a logical volume with LVM can result in device names changing after a reboot.

> Yet is it worth the cost? Saving UUID would be rather expensive as it would need to be done for every device and every collected sample... What do you think?

I'd take the cost, but I understand that people with different workloads might not, so I would just add the -j option to sadc and let the sysadmin choose by modifying the cron job or systemd unit.
I also understand that, in order to do this, you would need to change the data file format to add a text string for the identifier, and also store the -j setting in the file to handle appending.

@hunter86bg

Well, what about doing it this way?
sar keeps using the current major:minor, but also saves a map of major:minor -> wwid/uuid in another file and updates it whenever a device is added or removed.
That way, after a reboot, sar can read the old binary data plus the map file and present data based on wwid/uuid rather than on the currently assigned major:minor values.
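A side-car map of this kind could be sketched as below. The JSON layout, the function names and the "dev-major:minor" fallback label are all assumptions made for illustration; nothing here reflects an existing sysstat file format.

```python
import json
from typing import Dict, Tuple

DevNum = Tuple[int, int]


def save_map(path: str, mapping: Dict[DevNum, str]) -> None:
    """Write the (major, minor) -> identifier map as JSON with 'major:minor' keys."""
    raw = {f"{major}:{minor}": ident for (major, minor), ident in mapping.items()}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(raw, fh, indent=2, sort_keys=True)


def load_map(path: str) -> Dict[DevNum, str]:
    """Read the map back, turning 'major:minor' keys into integer tuples."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    result: Dict[DevNum, str] = {}
    for key, ident in raw.items():
        major, minor = key.split(":")
        result[(int(major), int(minor))] = ident
    return result


def label(mapping: Dict[DevNum, str], major: int, minor: int) -> str:
    """Resolve a stored major:minor to its recorded identifier, if there is one."""
    return mapping.get((major, minor), f"dev-{major}:{minor}")
```

The map would have to be saved alongside each data file, since it describes the numbering that was in effect when that file was written.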

@sysstat
Owner

sysstat commented Aug 25, 2019

OK, thanks for all those ideas. Two things are important regarding the binary datafile though: It should be self-sufficient, and its format must be compatible with older sar versions.

@msekletar
Contributor Author

Btw, I don't think that all block device types have WWIDs. IIRC loop devices don't have them. For those, we could possibly store information about the backing file (e.g. inode number, parent block device, path) which was attached to the device at the time when the data was gathered.

@lberra

lberra commented Aug 26, 2019

I am still not convinced wwid is the way to go.
How do you handle multipath devices, where all the underlying devices have the same wwid?

As food for thought, here is the content of /dev/disk/by-{id,path} from two different servers:
The first one has multipath devices coming from a NetApp via iSCSI, and the multipath device names have been customized as oraXXX.
The second has multipath devices from a 3PAR via Fibre Channel, and the multipath device names are the default mpathX.
(I just anonymized part of the output, but it should be clear.)

ISCSI /dev/disk/by-id:
lrwxrwxrwx. 1 root root 10 Aug 26 16:00 dm-name-oraarch01 -> ../../dm-7
lrwxrwxrwx. 1 root root 10 Aug 26 16:00 dm-name-oradata01 -> ../../dm-5
lrwxrwxrwx. 1 root root 10 Aug 26 15:55 dm-name-oradata02 -> ../../dm-8
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-name-vg00-lv04 -> ../../dm-3
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 dm-name-vg00-lv05 -> ../../dm-4
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-name-vg00-lv01 -> ../../dm-0
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-name-vg00-lv02 -> ../../dm-1
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-name-vg00-lv03 -> ../../dm-2
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 dm-uuid-LVM-nHsu7kq86GOxxuBd8n5euedF27TsdhK41E7AlGyH0sz4V5xL0eCPFZQBTJlEscs5 -> ../../dm-4
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-uuid-LVM-nHsu7kq86GOxxuBd8n5euedF27TsdhK4eEK151Sq6oaYwPofniaNj5wCpD77jbk8 -> ../../dm-0
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-uuid-LVM-nHsu7kq86GOxxuBd8n5euedF27TsdhK4FQcY27xasGjmUyfhYd55LwHjb2XrGNME -> ../../dm-3
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-uuid-LVM-nHsu7kq86GOxxuBd8n5euedF27TsdhK4Wy8cbwDA2JjsfB8wFB9rSLVCvmJm0Il2 -> ../../dm-1
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 dm-uuid-LVM-nHsu7kq86GOxxuBd8n5euedF27TsdhK4zX1APvBzlbmnCA6m4TozyEGSB0rVwrfm -> ../../dm-2
lrwxrwxrwx. 1 root root 10 Aug 26 16:00 dm-uuid-mpath-3600a0980XXXXXXXXXXXXXXXXXXXXXXXX -> ../../dm-5
lrwxrwxrwx. 1 root root 10 Aug 26 16:00 dm-uuid-mpath-3600a0980YYYYYYYYYYYYYYYYYYYYYYYY -> ../../dm-7
lrwxrwxrwx. 1 root root 10 Aug 26 15:55 dm-uuid-mpath-3600a0980ZZZZZZZZZZZZZZZZZZZZZZZZ -> ../../dm-8
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 lvm-pv-uuid-mTCtqa-BWOV-n6zT-Ebgu-dNId-2N3Y-0psUe8 -> ../../sda2
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 scsi-3600605b00a2bdf00242b28c10dcb1999 -> ../../sda
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 scsi-3600605b00a2bdf00242b28c10dcb1999-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 scsi-3600605b00a2bdf00242b28c10dcb1999-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 scsi-3600a0980XXXXXXXXXXXXXXXXXXXXXXXX -> ../../sdf
lrwxrwxrwx. 1 root root 9 May 27 23:19 scsi-3600a0980YYYYYYYYYYYYYYYYYYYYYYYY -> ../../sdd
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 scsi-3600a0980ZZZZZZZZZZZZZZZZZZZZZZZZ -> ../../sdg
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 wwn-0x600605b00a2bdf00242b28c10dcb1999 -> ../../sda
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 wwn-0x600605b00a2bdf00242b28c10dcb1999-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 wwn-0x600605b00a2bdf00242b28c10dcb1999-part2 -> ../../sda2
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 wwn-0x600a0980XXXXXXXXXXXXXXXXXXXXXXXX -> ../../sdf
lrwxrwxrwx. 1 root root 9 May 27 23:19 wwn-0x600a0980YYYYYYYYYYYYYYYYYYYYYYYY -> ../../sdd
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 wwn-0x600a0980ZZZZZZZZZZZZZZZZZZZZZZZZ -> ../../sdg

ISCSI /dev/disk/by-path:
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 ip-AA.BB.CC.128:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-11 -> ../../sdg
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 ip-AA.BB.CC.131:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-11 -> ../../sde
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 ip-AA.BB.CC.133:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-10 -> ../../sdf
lrwxrwxrwx. 1 root root 9 May 27 23:19 ip-AA.BB.CC.134:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-12 -> ../../sdc
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 ip-AA.BB.CC.135:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-10 -> ../../sdi
lrwxrwxrwx. 1 root root 9 May 27 23:19 ip-AA.BB.CC.137:3260-iscsi-iqn.1992-08.com.netapp:netapp0101-lun-12 -> ../../sdd
lrwxrwxrwx. 1 root root 9 Mar 25 15:16 pci-0000:03:00.0-scsi-0:2:0:0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Mar 25 15:16 pci-0000:03:00.0-scsi-0:2:0:0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 10 Mar 29 19:01 pci-0000:03:00.0-scsi-0:2:0:0-part2 -> ../../sda2

FC /dev/disk/by-id:
lrwxrwxrwx 1 root root 9 Dec 29 2018 ata-hp_DVD_A_DU8A5SH_427428900673 -> ../../sr0
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-mpathb -> ../../dm-1
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-mpathbp1 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-mpathd -> ../../dm-2
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg00-lv01 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg01-lv05 -> ../../dm-7
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg01-lv04 -> ../../dm-8
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg01-lv01 -> ../../dm-4
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg01-lv02 -> ../../dm-5
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg01-lv03 -> ../../dm-6
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-name-vg_sas_shared-lv_sas_shared -> ../../dm-9
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-c8QHFWgbGNeUzNiS8KG22pqD1OyS8cQxg2tvs030jp8Wc482d88d1dVgzhzVkLc6 -> ../../dm-9
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-TzJ1SylaoG4TlUpUG9UjXmArWDfTQsgVtF06kBDwsG9Ffm4Ef46D6tf3hvNK3IvB -> ../../dm-0
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-uw3IGDtJuta2Dul542oRA4TGibG4pvfm51i9MtY3JDAY9x8KhuCa1Sr3S9d0lgUy -> ../../dm-8
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-uw3IGDtJuta2Dul542oRA4TGibG4pvfmDfKDkGPUgblZwADju2VnxUvvRxW37xhK -> ../../dm-6
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-uw3IGDtJuta2Dul542oRA4TGibG4pvfmkHvWwhU8A7jR6nopT6gmo7py05CLoBcs -> ../../dm-7
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-uw3IGDtJuta2Dul542oRA4TGibG4pvfmQLpYnXJvV8sC2gMzsDrVoWVo1ujTE5Nb -> ../../dm-4
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-LVM-uw3IGDtJuta2Dul542oRA4TGibG4pvfmWDH4aOJZTgBpz8ZyNGhdlCrq2hZ9VhmJ -> ../../dm-5
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-mpath-360002ac0000000000000000500XXXXXX -> ../../dm-2
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-mpath-3600508b1001cd34c29cacc80cacb22a6 -> ../../dm-1
lrwxrwxrwx 1 root root 10 Dec 29 2018 dm-uuid-part1-mpath-3600508b1001cd34c29cacc80cacb22a6 -> ../../dm-3
lrwxrwxrwx 1 root root 10 Dec 29 2018 lvm-pv-uuid-1X6C8X-tGsQ-fvrs-2Vrl-zUWq-Xn0W-jRzUPv -> ../../dm-2
lrwxrwxrwx 1 root root 10 Dec 29 2018 lvm-pv-uuid-lNUbrK-pTTU-4dQ9-qTEl-oQep-mECz-aVeJlb -> ../../sda4
lrwxrwxrwx 1 root root 10 Dec 29 2018 lvm-pv-uuid-qzpjB5-G1UU-V3g3-WUjC-xzFc-KS7U-xF3a3R -> ../../dm-3
lrwxrwxrwx 1 root root 9 Dec 29 2018 scsi-360002ac0000000000000000500XXXXXX -> ../../sdj
lrwxrwxrwx 1 root root 9 Dec 29 2018 scsi-3600508b1001c04e67d207014e8c0e86a -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 29 2018 scsi-3600508b1001c04e67d207014e8c0e86a-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 29 2018 scsi-3600508b1001c04e67d207014e8c0e86a-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 29 2018 scsi-3600508b1001c04e67d207014e8c0e86a-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Dec 29 2018 scsi-3600508b1001c04e67d207014e8c0e86a-part4 -> ../../sda4
lrwxrwxrwx 1 root root 9 Dec 29 2018 scsi-3600508b1001cd34c29cacc80cacb22a6 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 29 2018 scsi-3600508b1001cd34c29cacc80cacb22a6-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Dec 29 2018 wwn-0x60002ac0000000000000000500XXXXXX -> ../../sdj
lrwxrwxrwx 1 root root 9 Dec 29 2018 wwn-0x600508b1001c04e67d207014e8c0e86a -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 29 2018 wwn-0x600508b1001c04e67d207014e8c0e86a-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 29 2018 wwn-0x600508b1001c04e67d207014e8c0e86a-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 29 2018 wwn-0x600508b1001c04e67d207014e8c0e86a-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Dec 29 2018 wwn-0x600508b1001c04e67d207014e8c0e86a-part4 -> ../../sda4
lrwxrwxrwx 1 root root 9 Dec 29 2018 wwn-0x600508b1001cd34c29cacc80cacb22a6 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 29 2018 wwn-0x600508b1001cd34c29cacc80cacb22a6-part1 -> ../../sdb1

FC /dev/disk/by-path:
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:00:1f.2-scsi-5:0:0:0 -> ../../sr0
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:0-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:0-part4 -> ../../sda4
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:1 -> ../../sdb
lrwxrwxrwx 1 root root 10 Dec 29 2018 pci-0000:03:00.0-scsi-0:0:0:1-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.0-fc-0x20120002acXXXXXX-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.0-fc-0x20120002acXXXXXX-lun-1 -> ../../sdd
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.0-fc-0x21120002acXXXXXX-lun-0 -> ../../sde
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.0-fc-0x21120002acXXXXXX-lun-1 -> ../../sdf
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.1-fc-0x20110002acXXXXXX-lun-0 -> ../../sdg
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.1-fc-0x20110002acXXXXXX-lun-1 -> ../../sdh
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.1-fc-0x21110002acXXXXXX-lun-0 -> ../../sdi
lrwxrwxrwx 1 root root 9 Dec 29 2018 pci-0000:05:00.1-fc-0x21110002acXXXXXX-lun-1 -> ../../sdj
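The multipath concern can be shown with a small grouping sketch: once path devices are keyed by WWID alone, every path of a LUN collapses onto one identifier. The device-to-WWID pairing below is an assumption inferred from the iSCSI listings (two paths per LUN), and the WWID strings are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_paths_by_wwid(path_wwids: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group path devices by the WWID they report."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device, wwid in path_wwids:
        groups[wwid].append(device)
    return {wwid: sorted(devices) for wwid, devices in groups.items()}


# Assumed pairing, with placeholder WWIDs standing in for the anonymized ones.
PATHS = [
    ("sdf", "WWID-X"), ("sdi", "WWID-X"),
    ("sdg", "WWID-Z"), ("sde", "WWID-Z"),
    ("sdc", "WWID-Y"), ("sdd", "WWID-Y"),
]

for wwid, devices in group_paths_by_wwid(PATHS).items():
    print(wwid, devices)
```

A WWID-only identifier therefore cannot tell sdf from sdi; some extra component is needed if per-path statistics matter.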

@hunter86bg

hunter86bg commented Aug 26, 2019

Mpath devices are a single LUN accessed over different paths, so in most cases they should be treated as one.
The multipath daemon uses an easy-to-predict path selector.
Still, if statistics for an mpath are needed, we can use a wwid-type-major:minor notation.
Something like:
3600a0980YYYYYYYYYYYYYYYYYYYYYYYY-mpath-253:8
Or
3600a0980YYYYYYYYYYYYYYYYYYYYYYYY-VGtest-LVtest

I know it's a complex problem, but we need a persistent way to recognize a block device before and after a reboot.

@sysstat
Owner

sysstat commented Aug 27, 2019

FYI I have started to update sadc to provide stable identifiers across reboots. My idea is to save WWN ids (also with an additional partition number) in the binary datafile and to display them when sar -j ID is entered. For devices without WWIDs, the fallback will be to use the (major,minor) numbers. My understanding is that such a solution should bring some improvement compared to current situation even though it won't be enough to cover all possible cases.
I'm sorry, but I don't know multipath devices, loop devices and all that stuff well enough to propose a better solution, as I only have my 8-year-old personal machine at hand for tests and coding :-/
Yet give me some clear specifications and I'll be glad to add the feature to sysstat.

@lberra

lberra commented Aug 29, 2019

> I'm sorry but I don't know multipath devices, IIRC loop ones and all that stuff to propose a better solution as I only have my 8 years old personal machine at hand for tests and coding :-/

I am willing to test your code on a number of machines and report the results.

> Yet give me some clear specifications and I'll be glad to add the feature to sysstat.

Leave the job to udev :-)

sysstat added a commit that referenced this issue Sep 2, 2019
This patch adds new fields to stats_disk structure to save a stable
identifier for each block device (see issue #195).
A stable identifier is a name that should not change across reboots for
the same physical device.
At the present time this stable identifier is the WWN (World Wide Name)
id that is read from /dev/disk/by-id if it exists for the device.
If it doesn't exist then we fall back on using the pretty name (sda,
sda1, etc.).
The stable identifier is always collected by sadc when disks statistics
are collected (sadc option "-S DISK | XDISK").
It can be printed by sar (or sadf) with the option "-j SID" (SID stands
for Stable IDentifier).

Signed-off-by: Sebastien GODARD <[email protected]>
@sysstat
Owner

sysstat commented Sep 2, 2019

Commit 22e3fb2 adds support for stable identifiers to sar and sadf.
For now we use WWN identifiers, which are saved by sadc in the binary datafile. We fall back on using the pretty name (e.g. sda, sdb2...) if the WWN doesn't exist.
Datafile format has been modified but remains compatible with old sysstat versions.
Stable identifiers can be displayed by sar using option "-j SID" (Stable IDentifier).
Please test and tell me if anything goes wrong.
Excerpt from my own tests:

$ ./sar -dh -j SID -f tests/data.tmp
13:20:29          tps     rkB/s     wkB/s     dkB/s   areq-sz    aqu-sz     await     %util DEV
13:20:39      1604.70     40.5M     10.4M      0.0k     32.5k     18.56     12.00     85.4% 0x5000cca369f193ac
13:20:39         1.32     54.8k      0.0k      0.0k     41.3k      0.08     60.92      0.1% 0x5000cca369f193ac-1
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-2
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-3
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-4
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-5
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-6
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-7
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-8
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-9
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-10
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-11
13:20:39         0.00      0.0k      0.0k      0.0k      0.0k      0.00      0.00      0.0% 0x5000cca369f193ac-12
13:20:39         1.22     54.3k      0.0k      0.0k     44.5k      0.08     62.51      0.1% 0x600605b00a2bdf00242b28c10dcb1999
13:20:39         2.91    114.9k      0.3k      0.0k     39.6k      0.07     24.75      0.2% sdg
[...]
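The naming visible in this excerpt (WWN for a whole disk, WWN plus "-N" for partition N, pretty name otherwise) can be approximated as follows. This is a sketch inferred from the output above, not the code from the commit; the function name and the dictionary of by-id links are assumptions.

```python
from typing import Dict


def stable_identifier(pretty_name: str, by_id_links: Dict[str, str]) -> str:
    """Return the WWN-based name for a device, or its pretty name if it has no WWN.

    by_id_links maps /dev/disk/by-id link names to kernel device names.
    """
    for link, target in by_id_links.items():
        if target != pretty_name or not link.startswith("wwn-"):
            continue
        wwn, sep, part = link[len("wwn-"):].partition("-part")
        return f"{wwn}-{part}" if sep else wwn
    return pretty_name  # fallback: no wwn- link points at this device


LINKS = {
    "wwn-0x5000cca369f193ac": "sda",
    "wwn-0x5000cca369f193ac-part1": "sda1",
    "usb-_USB_DISK_3.0_07072A310A212C69-0:0": "sdg",
}

for name in ("sda", "sda1", "sdg"):
    print(name, "->", stable_identifier(name, LINKS))
```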

@hunter86bg
Copy link

hunter86bg commented Feb 13, 2020

Can you change the "-1" to the actual path of the device, like "0x6000c296ab3905dc-[0:0:1:0]", which is visible in multipath, lsscsi and /proc/scsi/scsi?

 # lsscsi
[0:0:0:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sda
[0:0:1:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdb
[4:0:0:0]    cd/dvd  NECVMWar VMware SATA CD01 1.00  /dev/sr0

# /usr/lib/udev/scsi_id -g -u -x /dev/sdb | grep WWN
ID_WWN=0x6000c296ab3905dc
ID_WWN_VENDOR_EXTENSION=0x7a76a5c6bfab6b3a
ID_WWN_WITH_EXTENSION=0x6000c296ab3905dc7a76a5c6bfab6b3a

# multipath -ll
36000c296ab3905dc7a76a5c6bfab6b3a dm-3 VMware, ,VMware Virtual S
size=20G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  `- 0:0:1:0 sdb 8:16 active ready running

Recently we had issues where only one of the paths was causing trouble, and it was hard to find it based on historical data.
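A per-path identifier of the requested form could be assembled from lsscsi output roughly like this. The regular expression assumes the default column layout shown above, and the function names are hypothetical.

```python
import re
from typing import Dict

# "[H:C:T:L]  type  vendor model rev  /dev/node"
LSSCSI_LINE = re.compile(r"^\[(\d+:\d+:\d+:\d+)\]\s+\S+\s+.*?(/dev/\S+)\s*$")

LSSCSI_OUTPUT = """\
[0:0:0:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sda
[0:0:1:0]    disk    VMware,  VMware Virtual S 1.0   /dev/sdb
[4:0:0:0]    cd/dvd  NECVMWar VMware SATA CD01 1.00  /dev/sr0
"""


def hctl_by_device(lsscsi_output: str) -> Dict[str, str]:
    """Map each device node to its host:channel:target:lun address."""
    result: Dict[str, str] = {}
    for line in lsscsi_output.splitlines():
        match = LSSCSI_LINE.match(line.strip())
        if match:
            result[match.group(2)] = match.group(1)
    return result


def path_identifier(wwn: str, hctl: str) -> str:
    """Combine a WWN with the SCSI address of one path."""
    return f"{wwn}-[{hctl}]"


hctl = hctl_by_device(LSSCSI_OUTPUT)["/dev/sdb"]
print(path_identifier("0x6000c296ab3905dc", hctl))  # 0x6000c296ab3905dc-[0:0:1:0]
```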


4 participants