Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new libmount-mountfd api is incompatible with python libmount #1992

Open
asavah opened this issue Jan 4, 2023 · 15 comments
Open

new libmount-mountfd api is incompatible with python libmount #1992

asavah opened this issue Jan 4, 2023 · 15 comments
Labels
NEEDINFO Need more information from reporter

Comments

@asavah
Copy link

asavah commented Jan 4, 2023

Python libmount fails if libmount was built with mountfd api.

    ctx = libmount.Context()
    ctx.fstype="overlay"
    ctx.source="overlay"
    lower = ":".join("{:x}".format(x) for x in range(int(yamldata["maxidx"])))
    ctx.options="lowerdir={},upperdir={},workdir={}".format(lower, yamldata["upper"], yamldata["work"])
    ctx.target=yamldata["target"]
    ctx.mount()
Traceback (most recent call last):
  File "/home/asavah/kross/kbs/overmount.py", line 61, in <module>
    main()
  File "/home/asavah/kross/kbs/overmount.py", line 51, in main
    __do_mount(options.mount)
  File "/home/asavah/kross/kbs/overmount.py", line 27, in __do_mount
    ctx.mount()
TypeError: Invalid argument

Building util-linux with --disable-libmount-mountfd-support works as intended.

@karelzak
Copy link
Collaborator

karelzak commented Jan 5, 2023

Can you provide complete example?

This works for me:

# python
Python 3.10.9 (main, Dec  7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylibmount as mnt
>>> cxt = mnt.Context()
>>> cxt.fstype = "overlay"
>>> cxt.source = "overlay"
>>> cxt.target = "/mnt/merged"
>>> cxt.options = "lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work"
>>> cxt.mount()
<libmount.Context object at 0x7fe0dfcb10f0, restricted=False>

You can also try it with enabled libmount debug:

LIBMOUNT_DEBUG=all python
Python 3.10.9 (main, Dec  7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pylibmount as mnt
1450583: libmount:     INIT: library debug mask: 0xffffff
1450583: libmount:     INIT: library version: 2.38.882
1450583: libmount:     INIT:     feature: selinux
1450583: libmount:     INIT:     feature: smack
1450583: libmount:     INIT:     feature: btrfs
1450583: libmount:     INIT:     feature: namespaces
1450583: libmount:     INIT:     feature: idmapping
1450583: libmount:     INIT:     feature: fd-based-mount
1450583: libmount:     INIT:     feature: assert
1450583: libmount:     INIT:     feature: debug
Available "LIBMOUNT_DEBUG=<name>[,...]|<mask>" debug masks:
   all      [0xffffff] : info about all subsystems
   cache    [0x000004] : paths and tags cache
   cxt      [0x000200] : library context (handler)
   diff     [0x000400] : mountinfo changes tracking
   fs       [0x000040] : FS abstraction
   help     [0x000001] : this help
   hook     [0x008000] : hooks functionality
   locks    [0x000010] : mtab and utab locking
   loop     [0x002000] : loop devices routines
   options  [0x000008] : mount options parsing
   optlist  [0x010000] : mount options container
   tab      [0x000020] : fstab, mtab, mountinfo routines
   update   [0x000080] : mtab, utab updates
   utils    [0x000100] : misc library utils
   monitor  [0x000800] : mount tables monitor
   btrfs    [0x001000] : btrfs specific routines
   verity   [0x004000] : verity specific routines
>>> cxt = mnt.Context()
1450583: libmount:      CXT: [0x556d693f4dc0]: ----> allocate 
1450583: libmount:  OPTLIST: [0x556d693f6220]: alloc
1450583: libmount:  OPTLIST: [0x556d693f6220]: registr map 0x7f5fa1ffd780
1450583: libmount:  OPTLIST: [0x556d693f6220]: registr map 0x7f5fa1ffd580
1450583: libmount:  OPTLIST: [0x556d693f6220]: set 0x00000000
1450583: libmount:  OPTLIST: [0x556d693f6220]: append 0x00000000
>>> cxt.fstype = "overlay"
1450583: libmount:       FS: [0x556d693f4f80]: alloc
>>> cxt.source = "overlay"
>>> cxt.target = "/mnt/merged"
>>> cxt.options = "lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work"
1450583: libmount:  OPTLIST: [0x556d693f6220]: set lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work
1450583: libmount:  OPTLIST: [0x556d693f6220]:  added lowerdir
1450583: libmount:  OPTLIST: [0x556d693f6220]:  added upperdir
1450583: libmount:  OPTLIST: [0x556d693f6220]:  added workdir
>>> cxt.mount()
1450583: libmount:      CXT: [0x556d693f4dc0]: mount: preparing
1450583: libmount:      CXT: [0x556d693f4dc0]: use default optsmode
1450583: libmount:  OPTLIST: [0x556d693f6220]: return flags 0x00000000 [map=0x7f5fa1ffd780]
1450583: libmount:      CXT: [0x556d693f4dc0]: OPTSMODE (file-part): force=0, fstab=1, mtab=1
1450583: libmount:      CXT: [0x556d693f4dc0]: fstab not required -- skip
1450583: libmount:  OPTLIST: [0x556d693f6220]: merging
1450583: libmount:      CXT: [0x556d693f4dc0]: mount: evaluating permissions
1450583: libmount:  OPTLIST: [0x556d693f6220]: return flags 0x00000000 [map=0x7f5fa1ffd580]
1450583: libmount:      CXT: [0x556d693f4dc0]: perms: superuser
1450583: libmount:      CXT: [0x556d693f4dc0]: --> preparing options
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:prep-options
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __idmap hook
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __owner hook
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:prep-options [rc=0 status=1]
1450583: libmount:      CXT: [0x556d693f4dc0]: <-- preparing options done [rc=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: --> preparing source path
1450583: libmount:      CXT: [0x556d693f4dc0]: srcpath 'overlay'
1450583: libmount:    CACHE: [0x556d693f6c40]: alloc
1450583: libmount:      CXT: [0x556d693f4dc0]: REMOUNT/BIND/MOVE/pseudo FS source: overlay
1450583: libmount:      CXT: [0x556d693f4dc0]: --> preparing fstype
1450583: libmount:      CXT: [0x556d693f4dc0]: FS type: overlay [rc=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: --> preparing target path
1450583: libmount:    CACHE: [0x556d693f6c40]: canonicalize path /mnt/merged
1450583: libmount:    CACHE: [0x556d693f6c40]: add entry [ 1] (path): /mnt/merged: /mnt/merged
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:prep-target
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __mkdir hook
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __subdir hook
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:prep-target [rc=0 status=1]
1450583: libmount:      CXT: [0x556d693f4dc0]: final target '/mnt/merged' [rc=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: checking for helper
1450583: libmount:      CXT: [0x556d693f4dc0]: /sbin/mount.overlay       ... not found
1450583: libmount:      CXT: [0x556d693f4dc0]: /sbin/fs.d/mount.overlay  ... not found
1450583: libmount:      CXT: [0x556d693f4dc0]: /sbin/fs/mount.overlay    ... not found
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:prep
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __mount hook
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: prepare mount
1450583: libmount:  OPTLIST: [0x556d693f6220]: return flags 0x00000000 [map=0x7f5fa1ffd780]
1450583: libmount:  OPTLIST: [0x556d693f6220]: return attrs set=0x00000000, clr=0x00000000
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: initialize API fds
1450583: libmount:      CXT: [0x556d693f4dc0]:  alloc '__mount' data
1450583: libmount:     HOOK:  new FS 'overlay'
1450583: libmount:     HOOK:  fsopen(overlay)
1450583: libmount:      CXT: syscall 'fsopen' [succes]
1450583: libmount:      CXT: [0x556d693f4dc0]:  appending mount hook from __mount
1450583: libmount:      CXT: [0x556d693f4dc0]:  appending post-mount hook from __mount
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: prepare mount done [rc=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __legacy-mount hook
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:prep [rc=0 status=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: --> prepare update
1450583: libmount:      CXT: [0x556d693f4dc0]: utab path initialized to: /run/mount/utab
1450583: libmount:      CXT: [0x556d693f4dc0]: checking for writable tab files
1450583: libmount:    UTILS: utab: /run/mount/utab
1450583: libmount:    UTILS: try write /run/mount/utab dir: (null)
1450583: libmount:    UTILS:  access OK
1450583: libmount:   UPDATE: [0x556d693f6d50]: allocate
1450583: libmount:  OPTLIST: [0x556d693f6220]: return flags 0x00000000 [map=0x7f5fa1ffd780]
1450583: libmount:   UPDATE: [0x556d693f6d50]: resetting FS [target=(null), flags=0x00000000]
1450583: libmount:   UPDATE: [0x556d693f6d50]: FS template:
1450583: libmount:   UPDATE: 1450583: libmount:       FS: [0x556d693f4f80]: synced: vfs: 'rw' fs: 'lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work' user: '(null)', optstr: 'rw,lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work'
------ fs:
source: overlay
target: /mnt/merged
fstype: overlay
optstr: rw,lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work
VFS-optstr: rw
FS-opstr: lowerdir=/mnt/low,upperdir=/mnt/high,workdir=/mnt/work
1450583: libmount:   UPDATE: prepare utab entry
1450583: libmount:   UPDATE: utab entry unnecessary (no options)
1450583: libmount:      CXT: [0x556d693f4dc0]: mount: do mount
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:pre-mount
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:pre-mount [rc=0 status=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:mount
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __mount hook
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: create FS instance
1450583: libmount:      CXT: syscall 'fsconfig' [succes]
1450583: libmount:     HOOK: [0x7f5fa1ffd500]:  configure FS
1450583: libmount:     HOOK: [0x7f5fa1ffd500]:   fsconfig(name=lowerdir,value=/mnt/low)
1450583: libmount:      CXT: syscall 'fsconfig' [succes]
1450583: libmount:     HOOK: [0x7f5fa1ffd500]:   fsconfig(name=upperdir,value=/mnt/high)
1450583: libmount:      CXT: syscall 'fsconfig' [succes]
1450583: libmount:     HOOK: [0x7f5fa1ffd500]:   fsconfig(name=workdir,value=/mnt/work)
1450583: libmount:      CXT: syscall 'fsconfig' [succes]
1450583: libmount:      CXT: syscall 'fsconfig' [succes]
1450583: libmount:      CXT: syscall 'fsmount' [succes]
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: create FS done [rc=0, id=1690]
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:mount [rc=0 status=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:post-mount
1450583: libmount:      CXT: [0x556d693f4dc0]: calling __mount hook
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: move_mount(to=/mnt/merged)
1450583: libmount:      CXT: syscall 'move_mount' [succes]
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:post-mount [rc=0 status=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: mnt_context_do_mount() done [rc=0]
1450583: libmount:      CXT: [0x556d693f4dc0]: don't update: no update prepared
1450583: libmount:      CXT: [0x556d693f4dc0]: ---> stage:post
1450583: libmount:      CXT: [0x556d693f4dc0]: <--- stage:post [rc=0 status=0]
1450583: libmount:     HOOK: [0x7f5fa1ffd560]: deinit '__loopdev'
1450583: libmount:     HOOK: [0x7f5fa1ffd540]: deinit '__mkdir'
1450583: libmount:     HOOK: [0x7f5fa1ffd520]: deinit '__subdir'
1450583: libmount:     HOOK: [0x7f5fa1ffd500]: deinit '__mount'
1450583: libmount:      CXT: [0x556d693f4dc0]:  removing mount hook from __mount
1450583: libmount:      CXT: [0x556d693f4dc0]:  removing post-mount hook from __mount
1450583: libmount:      CXT: [0x556d693f4dc0]:  free '__mount' data
1450583: libmount:     HOOK: [0x7f5fa1ffd4e0]: deinit '__legacy-mount'
1450583: libmount:     HOOK: [0x7f5fa1ffd4c0]: deinit '__idmap'
1450583: libmount:     HOOK: [0x7f5fa1ffd4a0]: deinit '__owner'
1450583: libmount:      CXT: [0x556d693f4dc0]: mnt_context_mount() done [rc=0]
<libmount.Context object at 0x7f5fa20b10f0, restricted=False>
>>> 

@karelzak karelzak added the NEEDINFO Need more information from reporter label Jan 5, 2023
@asavah
Copy link
Author

asavah commented Mar 20, 2023

Sorry I haven't had time to investigate further and then I forgot about this ...
Currently as far as I can tell everything is working fine without using --disable-libmount-mountfd-support.

Sorry for the noise.

@asavah asavah closed this as completed Mar 20, 2023
@asavah
Copy link
Author

asavah commented Mar 20, 2023

Sorry I was too quick to judge again.
The bug occurs when the options string is pretty large, same huge options string works fine when util-linux is built with --disable-libmount-mountfd-support

asavah@kross-1910:~/kross/tmp/asusb450eg/src$ sudo python3
Python 3.11.2 (main, Mar 19 2023, 17:06:46) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libmount
>>> ctx = libmount.Context()
>>> ctx.fstype="overlay"
>>> ctx.source="overlay"
>>> ctx.target="/home/asavah/kross/build/asusb450eg/rootfs"
>>> ctx.options="rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw"
>>> ctx.mount()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Invalid argument

Do note that slightly shorter (goes up to :60) version does work:

ctx.options="rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw"

I don't know what kind of limit it is hitting here, maybe it's a kernel problem ...

@asavah asavah reopened this Mar 20, 2023
@asavah
Copy link
Author

asavah commented Mar 20, 2023

Same with debug enabled:

asavah@kross-1910:~/kross/tmp/asusb450eg/src$ sudo LIBMOUNT_DEBUG=all python
Python 3.11.2 (main, Mar 19 2023, 17:06:46) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libmount
2470478: libmount:     INIT: library debug mask: 0xffffff
2470478: libmount:     INIT: library version: 2.39.0
2470478: libmount:     INIT:     feature: btrfs
2470478: libmount:     INIT:     feature: namespaces
2470478: libmount:     INIT:     feature: idmapping
2470478: libmount:     INIT:     feature: fd-based-mount
2470478: libmount:     INIT:     feature: assert
2470478: libmount:     INIT:     feature: debug
Available "LIBMOUNT_DEBUG=<name>[,...]|<mask>" debug masks:
   all      [0xffffff] : info about all subsystems
   cache    [0x000004] : paths and tags cache
   cxt      [0x000200] : library context (handler)
   diff     [0x000400] : mountinfo changes tracking
   fs       [0x000040] : FS abstraction
   help     [0x000001] : this help
   hook     [0x008000] : hooks functionality
   locks    [0x000010] : mtab and utab locking
   loop     [0x002000] : loop devices routines
   options  [0x000008] : mount options parsing
   optlist  [0x010000] : mount options container
   tab      [0x000020] : fstab, mtab, mountinfo routines
   update   [0x000080] : mtab, utab updates
   utils    [0x000100] : misc library utils
   monitor  [0x000800] : mount tables monitor
   btrfs    [0x001000] : btrfs specific routines
   verity   [0x004000] : verity specific routines
>>> ctx = libmount.Context()
2470478: libmount:      CXT: [0x5602e8294fb0]: ----> allocate
2470478: libmount:  OPTLIST: [0x5602e8327c40]: alloc
2470478: libmount:  OPTLIST: [0x5602e8327c40]: registr map 0x7fbeb06bb2a0
2470478: libmount:  OPTLIST: [0x5602e8327c40]: registr map 0x7fbeb06bb0a0
2470478: libmount:  OPTLIST: [0x5602e8327c40]: set 0x00000000
2470478: libmount:  OPTLIST: [0x5602e8327c40]: append 0x00000000
>>> ctx.fstype="overlay"
2470478: libmount:       FS: [0x5602e82db360]: alloc
>>> ctx.source="overlay"
>>> ctx.target="/home/asavah/kross/build/asusb450eg/rootfs"
>>> ctx.options="rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw"
2470478: libmount:  OPTLIST: [0x5602e8327c40]: set rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw
2470478: libmount:  OPTLIST: [0x5602e8327c40]:  added rw [id=0x00000001 map=0x7fbeb06bb2a0]
2470478: libmount:  OPTLIST: [0x5602e8327c40]:  added lowerdir
2470478: libmount:  OPTLIST: [0x5602e8327c40]:  added upperdir
2470478: libmount:  OPTLIST: [0x5602e8327c40]:  added workdir
>>> ctx.mount()
2470478: libmount:      CXT: [0x5602e8294fb0]: mount: preparing
2470478: libmount:      CXT: [0x5602e8294fb0]: use default optsmode
2470478: libmount:  OPTLIST: [0x5602e8327c40]: return flags 0x00000000 [map=0x7fbeb06bb2a0]
2470478: libmount:      CXT: [0x5602e8294fb0]: OPTSMODE (file-part): force=0, fstab=1, mtab=1
2470478: libmount:      CXT: [0x5602e8294fb0]: fstab not required -- skip
2470478: libmount:  OPTLIST: [0x5602e8327c40]: merging
2470478: libmount:      CXT: [0x5602e8294fb0]: mount: evaluating permissions
2470478: libmount:  OPTLIST: [0x5602e8327c40]: return flags 0x00000000 [map=0x7fbeb06bb0a0]
2470478: libmount:      CXT: [0x5602e8294fb0]: perms: superuser
2470478: libmount:      CXT: [0x5602e8294fb0]: --> preparing options
2470478: libmount:      CXT: [0x5602e8294fb0]: ---> stage:prep-options
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __idmap [first]
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __owner [first]
2470478: libmount:      CXT: [0x5602e8294fb0]: <--- stage:prep-options [rc=0 status=1]
2470478: libmount:      CXT: [0x5602e8294fb0]: <-- preparing options done [rc=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: --> preparing source path
2470478: libmount:      CXT: [0x5602e8294fb0]: srcpath 'overlay'
2470478: libmount:    CACHE: [0x5602e82f8730]: alloc
2470478: libmount:      CXT: [0x5602e8294fb0]: REMOUNT/BIND/MOVE/pseudo FS source: overlay
2470478: libmount:      CXT: [0x5602e8294fb0]: --> preparing fstype
2470478: libmount:      CXT: [0x5602e8294fb0]: FS type: overlay [rc=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: --> preparing target path
2470478: libmount:    CACHE: [0x5602e82f8730]: canonicalize path /home/asavah/kross/build/asusb450eg/rootfs
2470478: libmount:    CACHE: [0x5602e82f8730]: add entry [ 1] (path): /home/asavah/kross/build/asusb450eg/rootfs: /home/asavah/kross/build/asusb450eg/rootfs
2470478: libmount:      CXT: [0x5602e8294fb0]: ---> stage:prep-target
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __mkdir [first]
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __subdir [first]
2470478: libmount:      CXT: [0x5602e8294fb0]: <--- stage:prep-target [rc=0 status=1]
2470478: libmount:      CXT: [0x5602e8294fb0]: final target '/home/asavah/kross/build/asusb450eg/rootfs' [rc=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: checking for helper
2470478: libmount:      CXT: [0x5602e8294fb0]: /home/asavah/kross/host/sbin/mount.overlay ... not found
2470478: libmount:      CXT: [0x5602e8294fb0]: /home/asavah/kross/host/bin/mount.overlay ... not found
2470478: libmount:      CXT: [0x5602e8294fb0]: /sbin/mount.overlay       ... not found
2470478: libmount:      CXT: [0x5602e8294fb0]: /usr/sbin/mount.overlay   ... not found
2470478: libmount:      CXT: [0x5602e8294fb0]: /bin/usr/bin/mount.overlay ... not found
2470478: libmount:      CXT: [0x5602e8294fb0]: ---> stage:prep
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __mount [first]
2470478: libmount:     HOOK: [0x7fbeb06bb540]: prepare mount
2470478: libmount:  OPTLIST: [0x5602e8327c40]: return flags 0x00000000 [map=0x7fbeb06bb2a0]
2470478: libmount:  OPTLIST: [0x5602e8327c40]:  clr: rw
2470478: libmount:  OPTLIST: [0x5602e8327c40]: return attrs set=0x00000000, clr=0x00000001
2470478: libmount:     HOOK: [0x7fbeb06bb540]: initialize API fds
2470478: libmount:      CXT: [0x5602e8294fb0]:  alloc '__mount' data
2470478: libmount:     HOOK:  new FS 'overlay'
2470478: libmount:     HOOK:  fsopen(overlay)
2470478: libmount:      CXT: syscall 'fsopen' [succes]
2470478: libmount:      CXT: [0x5602e8294fb0]:  appending mount hook from __mount
2470478: libmount:      CXT: [0x5602e8294fb0]:  appending mount hook from __mount
2470478: libmount:      CXT: [0x5602e8294fb0]:  appending post-mount hook from __mount
2470478: libmount:     HOOK: [0x7fbeb06bb540]: prepare mount done [rc=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __legacy-mount [first]
2470478: libmount:      CXT: [0x5602e8294fb0]: <--- stage:prep [rc=0 status=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: --> prepare update
2470478: libmount:      CXT: [0x5602e8294fb0]: utab path initialized to: /run/mount/utab
2470478: libmount:      CXT: [0x5602e8294fb0]: checking for writable tab files
2470478: libmount:    UTILS: utab: /run/mount/utab
2470478: libmount:    UTILS: try write /run/mount/utab dir: (null)
2470478: libmount:    UTILS:  access OK
2470478: libmount:   UPDATE: [0x5602e828d190]: allocate
2470478: libmount:  OPTLIST: [0x5602e8327c40]: return flags 0x00000000 [map=0x7fbeb06bb2a0]
2470478: libmount:   UPDATE: [0x5602e828d190]: resetting FS [target=(null), flags=0x00000000]
2470478: libmount:   UPDATE: [0x5602e828d190]: FS template:
2470478: libmount:   UPDATE: 2470478: libmount:       FS: [0x5602e82db360]: synced: vfs: 'rw' fs: 'lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw' user: '(null)', optstr: 'rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw'
------ fs:
source: overlay
target: /home/asavah/kross/build/asusb450eg/rootfs
fstype: overlay
optstr: rw,lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw
VFS-optstr: rw
FS-opstr: lowerdir=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61,upperdir=/home/asavah/kross/tmp/asusb450eg/upper,workdir=/tmp/kw
2470478: libmount:   UPDATE: prepare utab entry
2470478: libmount:   UPDATE: utab entry unnecessary (no options)
2470478: libmount:      CXT: [0x5602e8294fb0]: mount: do mount
2470478: libmount:      CXT: [0x5602e8294fb0]: ---> stage:pre-mount
2470478: libmount:      CXT: [0x5602e8294fb0]: <--- stage:pre-mount [rc=0 status=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: ---> stage:mount
2470478: libmount:      CXT: [0x5602e8294fb0]: calling __mount [active]
2470478: libmount:     HOOK: [0x7fbeb06bb540]: init FS
2470478: libmount:      CXT: syscall 'fsconfig' [succes]
2470478: libmount:     HOOK: [0x7fbeb06bb540]:  config FS
2470478: libmount:     HOOK: [0x7fbeb06bb540]:   fsconfig(name=rw,value=(null))
2470478: libmount:      CXT: syscall 'fsconfig' [succes]
2470478: libmount:     HOOK: [0x7fbeb06bb540]:   fsconfig(name=lowerdir,value=0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:30:31:32:33:34:35:36:37:38:39:3a:3b:3c:3d:3e:3f:40:41:42:43:44:45:46:47:48:49:4a:4b:4c:4d:4e:4f:50:51:52:53:54:55:56:57:58:59:5a:5b:5c:5d:5e:5f:60:61)
2470478: libmount:      CXT: syscall 'fsconfig' [Invalid argument]
2470478: libmount:     HOOK: [0x7fbeb06bb540]: create FS done [rc=-22, id=0]
2470478: libmount:      CXT: [0x5602e8294fb0]: <--- stage:mount [rc=-22 status=-22]
2470478: libmount:      CXT: [0x5602e8294fb0]: mnt_context_do_mount() done [rc=-22]
2470478: libmount:     HOOK: [0x7fbeb06bb600]: deinit '__loopdev'
2470478: libmount:     HOOK: [0x7fbeb06bb580]: deinit '__mkdir'
2470478: libmount:     HOOK: [0x7fbeb06bb5a0]: deinit '__subdir'
2470478: libmount:     HOOK: [0x7fbeb06bb540]: deinit '__mount'
2470478: libmount:      CXT: [0x5602e8294fb0]:  removing mount hook from __mount
2470478: libmount:      CXT: [0x5602e8294fb0]:  removing mount hook from __mount
2470478: libmount:      CXT: [0x5602e8294fb0]:  removing post-mount hook from __mount
2470478: libmount:      CXT: [0x5602e8294fb0]:  free '__mount' data
2470478: libmount:     HOOK: [0x7fbeb06bb560]: deinit '__legacy-mount'
2470478: libmount:     HOOK: [0x7fbeb06bb5e0]: deinit '__idmap'
2470478: libmount:     HOOK: [0x7fbeb06bb5c0]: deinit '__owner'
2470478: libmount:      CXT: [0x5602e8294fb0]: mnt_context_mount() done [rc=-22]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Invalid argument

@asavah
Copy link
Author

asavah commented Mar 21, 2023

Probably the limit is set here https://github.com/torvalds/linux/blob/master/fs/fsopen.c#L397
because "256 ought to be enough for anybody" (c)

@karelzak
Copy link
Collaborator

The limit for mount options in fsconfig() should be large enough to accept paths (overlay FS is a nice example), _POSIX_PATH_MAX is 256, but the usually used PATH_MAX is 4096.

This is something you should report to kernel developers. Maybe @brauner will know more.

@asavah
Copy link
Author

asavah commented Mar 21, 2023

Thx, what I posted is a simplified real use case, I know that there are limtations like PATH_MAX and ARG_MAX.
That's why I made the script in python which first creates a bunch of short symlinks (0:1:2:...) to the real much longer paths and the feeds the resulting bunch of symlinks as lower= to overlayfs. If I used real absolute paths that lower= would be insanely long.

This actually limits a lot how overlayfs can be used, for example if someone used a bunch of absolute paths eg lower=/home/username/project/data1:/home/username/project/data2:/home/username/project/dataN they would hit the limit real quick which makes usage of fsconfig api in mount undesirable at least for overlayfs.

I would rather not try to deal with kernel folks, the entry level to that club to even report a bug is too high for me.
I would appreciate if anyone familiar with kernel mailing lists reported this.

@brauner
Copy link
Contributor

brauner commented Mar 21, 2023

Thx, what I posted is a simplified real use case, I know that there are limtations like PATH_MAX and ARG_MAX. That's why I made the script in python which first creates a bunch of short symlinks (0:1:2:...) to the real much longer paths and the feeds the resulting bunch of symlinks as lower= to overlayfs. If I used real absolute paths that lower= would be insanely long.

This actually limits a lot how overlayfs can be used, for example if someone used a bunch of absolute paths eg lower=/home/username/project/data1:/home/username/project/data2:/home/username/project/dataN they would hit the limit real quick which makes usage of fsconfig api in mount undesirable at least for overlayfs.

I would rather not try to deal with kernel folks, the entry level to that club to even report a bug is too high for me. I would appreciate if anyone familiar with kernel mailing lists reported this.

I'll try to get that fixed. This seems like a legitimate problem. Might take until next week though.

@asavah
Copy link
Author

asavah commented Mar 21, 2023

Thx, what I posted is a simplified real use case, I know that there are limtations like PATH_MAX and ARG_MAX. That's why I made the script in python which first creates a bunch of short symlinks (0:1:2:...) to the real much longer paths and the feeds the resulting bunch of symlinks as lower= to overlayfs. If I used real absolute paths that lower= would be insanely long.
This actually limits a lot how overlayfs can be used, for example if someone used a bunch of absolute paths eg lower=/home/username/project/data1:/home/username/project/data2:/home/username/project/dataN they would hit the limit real quick which makes usage of fsconfig api in mount undesirable at least for overlayfs.
I would rather not try to deal with kernel folks, the entry level to that club to even report a bug is too high for me. I would appreciate if anyone familiar with kernel mailing lists reported this.

I'll try to get that fixed. This seems like a legitimate problem. Might take until next week though.

Thank you for taking a look into this, appreciate it.

@brauner
Copy link
Contributor

brauner commented Mar 28, 2023

So I looked into this today. The reason why we consider 256 bytes enough is that mount options can be specified differently than in the old mount api. So in the old mount api you have needed to specify all mount options as a single string and pass that to mount(2). In the new mount api you can split setting mount options. So you would do:

int main(int argc, char *argv[])
{
        int mnt_fd, fs_fd, ret, tree_fd;

        fs_fd = fsopen("overlay", FSOPEN_CLOEXEC);
        if (fs_fd < 0)
                die_errno("Failed to create overlay context");

        ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/home/asavah/kross/tmp/asusb450eg/upper", 0);
        if (ret < 0)
                die_errno("fsconfig: set upperdir");

        ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "workdir", "/tmp/kw", 0);
        if (ret < 0)
                die_errno("fsconfig: set workdir");

        ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "0:1:2:3:4:5:6:7:8:9:a:b:c:d:e:f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f:20:21:22:23:24:25:26:27:28:29:2a:2b:2c:2d:2e:2f:
        if (ret < 0)
                die_errno("fsconfig: set workdir");

        ret = fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
        if (ret < 0)
                die_errno("fsconfig: create");

        mnt_fd = fsmount(fs_fd, 0, 0);
        if (mnt_fd < 0)
                die_errno("fsmount");

        ret = move_mount(mnt_fd, "", -EBADF, "/home/asavah/kross/build/asusb450eg/rootfs",
                         MOVE_MOUNT_F_EMPTY_PATH);
        if (ret < 0)
                die_errno("move_mount");

        exit(EXIT_SUCCESS);
}

However, as you correctly point out, there's an additional problem if e.g., the lowerdir mount option takes multiple parameters. Currently you need to specify lowerdir= exactly one time. But that's only because overlayfs hasn't been ported to the new mount api yet. Once that's done you can just specify:

ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/home/username/project/data1", 0);
if (ret < 0)
        die_errno("fsconfig: set lowerdir");

ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/home/username/project/data2", 0);
if (ret < 0)
        die_errno("fsconfig: set lowerdir");

ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/home/username/project/data3", 0);
if (ret < 0)
        die_errno("fsconfig: set lowerdir");

/*
 * .
 * .
 * .
 *
 * The total limit is 2 * PAGE_SIZE btw.
 */

ret = fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/home/username/project/dataN", 0);
if (ret < 0)
        die_errno("fsconfig: set lowerdir");

So I'd rather fix overlayfs by porting it to the new mount api rather than encouraging hacking this all into one string by expanding the buffer.

@karelzak
Copy link
Collaborator

karelzak commented Jun 2, 2023

This is probably another example of when some env. variable to use the old API in libmount (something like LIBMOUNT_FORCE_MOUNT2=yes) would be useful as a temporary workaround.

@Andrei-Pozolotin
Copy link

this now trickles down into archlinux:
https://bugs.archlinux.org/task/78702

@Andrei-Pozolotin
Copy link

Andrei-Pozolotin commented Jun 4, 2023

@brauner : RE:

So I'd rather fix overlayfs by porting it to the new mount api rather than encouraging hacking this all into one string by expanding the buffer.

  1. there is NO mention of this issue in kernel bugs / overlayfs:

https://bugzilla.kernel.org/buglist.cgi?bug_status=__open__&content=overlayfs&no_redirect=1&order=changeddate%20DESC%2Cpriority%2Cbug_severity&query_format=specific

  1. can you please file a proper / meaningful bug report there?

  2. who / where can we find / ping people who might help resolve this issue? thank you.

@thesamesam
Copy link
Contributor

thesamesam commented Jun 4, 2023

He is the relevant person and I think him writing up the problem in this bug is enough given it's likely for his own sake, given how kernel development is (people aren't required to use bugzilla).

karelzak added a commit to karelzak/util-linux-work that referenced this issue Jun 5, 2023
Let's introduce a stable workaround for use cases where new kernel API
is not ready to use.

The patch does not use "goto enosys" to exit as nothing in the hookset
is initialized yet.

Addresses: util-linux#1992
Addresses: util-linux#2283
Signed-off-by: Karel Zak <[email protected]>
karelzak added a commit to karelzak/util-linux-work that referenced this issue Jun 7, 2023
Let's introduce a stable workaround for use cases where new kernel API
is not ready to use.

The patch does not use "goto enosys" to exit as nothing in the hookset
is initialized yet.

Addresses: util-linux#1992
Addresses: util-linux#2283
Signed-off-by: Karel Zak <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle gracefull.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather enable users to be able to append layers as the
interface allows nicely for this. This has also been requested as a
feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

Overlayfs will now parse and resolve layers right when a layer is
specified in fsconfig(). This allows users to receive early errors
about unusable paths.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all specified layers */
	fsconfig(FSCONFIG_SET_PATH_EMPTY, "lowerdir", "", -1);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle gracefull.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather enable users to be able to append layers as the
interface allows nicely for this. This has also been requested as a
feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

Overlayfs will now parse and resolve layers right when a layer is
specified in fsconfig(). This allows users to receive early errors
about unusable paths.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all specified layers */
	fsconfig(FSCONFIG_SET_PATH_EMPTY, "lowerdir", "", -1);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle gracefull.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather enable users to be able to append layers as the
interface allows nicely for this. This has also been requested as a
feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

Overlayfs will now parse and resolve layers right when a layer is
specified in fsconfig(). This allows users to receive early errors
about unusable paths.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all specified layers */
	fsconfig(FSCONFIG_SET_PATH_EMPTY, "lowerdir", "", -1);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle nicely.

Now with mount(8) and libmount supporting the new mount api I expect an
increase in the number of bug reports and issues we're going to see with
filesystems that don't yet support the new mount api. So it's time we
rectify this.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather encourage users to append layers via multiple
fsconfig() calls as the interface allows nicely for this. This has also
been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all layers specified until now */
	fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle nicely.

Now with mount(8) and libmount supporting the new mount api I expect an
increase in the number of bug reports and issues we're going to see with
filesystems that don't yet support the new mount api. So it's time we
rectify this.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather encourage users to append layers via multiple
fsconfig() calls as the interface allows nicely for this. This has also
been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all layers specified until now */
	fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
---

I'm starting to get the feeling that I stared enough at this and I would
need a fresh set of eyes to review it for any bugs. Plus, Amir seems to
have conflicting series and I would have to rebase anyway so no point in
delaying this any further.
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jun 8, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle nicely.

Now with mount(8) and libmount supporting the new mount api I expect an
increase in the number of bug reports and issues we're going to see with
filesystems that don't yet support the new mount api. So it's time we
rectify this.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather encourage users to append layers via multiple
fsconfig() calls as the interface allows nicely for this. This has also
been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all layers specified until now */
	fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 9, 2023
We recently ported util-linux to the new mount api. Now the mount(8)
tool will by default use the new mount api. While trying hard to fall
back to the old mount api gracefully there are still cases where we run
into issues that are difficult to handle nicely.

Now with mount(8) and libmount supporting the new mount api I expect an
increase in the number of bug reports and issues we're going to see with
filesystems that don't yet support the new mount api. So it's time we
rectify this.

For overlayfs specifically we ran into issues where mount(8) passed
multiple lower layers as one big string through fsconfig(). But the
fsconfig() FSCONFIG_SET_STRING option is limited to 256 bytes in
strndup_user(). While this would be fixable by extending the fsconfig()
buffer I'd rather encourage users to append layers via multiple
fsconfig() calls as the interface allows nicely for this. This has also
been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

	/* set upper layer */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

	/* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

	/* turn index feature on */
	fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

	/* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

	/* append */
	fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

	/* clear all layers specified until now */
	fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287 [1]
Link: util-linux/util-linux#1992 [2]
Link: https://bugs.archlinux.org/task/78702 [3]
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner [4]
Signed-off-by: Christian Brauner <[email protected]>
---

I'm starting to get the feeling that I stared enough at this and I would
need a fresh set of eyes to review it for any bugs. Plus, Amir seems to
have conflicting series and I would have to rebase anyway so no point in
delaying this any further.
brauner added a commit to brauner/linux that referenced this issue Jun 9, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jun 9, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 12, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
karelzak added a commit that referenced this issue Jun 12, 2023
Let's introduce a stable workaround for use cases where new kernel API
is not ready to use.

The patch does not use "goto enosys" to exit as nothing in the hookset
is initialized yet.

Addresses: #1992
Addresses: #2283
Signed-off-by: Karel Zak <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 13, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
brauner added a commit to brauner/linux that referenced this issue Jun 13, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this issue Jun 13, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 15, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il added a commit to amir73il/linux that referenced this issue Jun 16, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il added a commit to amir73il/linux that referenced this issue Jun 16, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il added a commit to amir73il/linux that referenced this issue Jun 16, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 16, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 17, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 17, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 17, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 17, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 19, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
amir73il pushed a commit to amir73il/linux that referenced this issue Jun 20, 2023
We ran into issues where mount(8) passed multiple lower layers as one
big string through fsconfig(). But the fsconfig() FSCONFIG_SET_STRING
option is limited to 256 bytes in strndup_user(). While this would be
fixable by extending the fsconfig() buffer I'd rather encourage users to
append layers via multiple fsconfig() calls as the interface allows
nicely for this. This has also been requested as a feature before.

With this port to the new mount api the following will be possible:

        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", "/lower1", 0);

        /* set upper layer */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "upperdir", "/upper", 0);

        /* append "/lower2", "/lower3", and "/lower4" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower2:/lower3:/lower4", 0);

        /* turn index feature on */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "index", "on", 0);

        /* append "/lower5" */
        fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir", ":/lower5", 0);

Specifying ':' would have been rejected so this isn't a regression. And
we can't simply use "lowerdir=/lower" to append on top of existing
layers as "lowerdir=/lower,lowerdir=/other-lower" would make
"/other-lower" the only lower layer so we'd break uapi if we changed
this. So the ':' prefix seems a good compromise.

Users can choose to specify multiple layers at once or individual
layers. A layer is appended if it starts with ":". This requires that
the user has already added at least one layer before. If lowerdir is
specified again without a leading ":" then all previous layers are
dropped and replaced with the new layers. If lowerdir is specified and
empty than all layers are simply dropped.

An additional change is that overlayfs will now parse and resolve layers
right when they are specified in fsconfig() instead of deferring until
super block creation. This allows users to receive early errors.

It also allows users to actually use up to 500 layers something which
was theoretically possible but ended up not working due to the mount
option string passed via mount(2) being too large.

This also allows a more privileged process to set config options for a
lesser privileged process as the creds for fsconfig() and the creds for
fsopen() can differ. We could restrict that they match by enforcing that
the creds of fsopen() and fsconfig() match but I don't see why that
needs to be the case and allows for a good delegation mechanism.

Plus, in the future it means we're able to extend overlayfs mount
options and allow users to specify layers via file descriptors instead
of paths:

        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower1", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower2", dirfd);

        /* append */
        fsconfig(FSCONFIG_SET_PATH{_EMPTY}, "lowerdir", "lower3", dirfd);

        /* clear all layers specified until now */
        fsconfig(FSCONFIG_SET_STRING, "lowerdir", NULL, 0);

This would be especially nice if users create an overlayfs mount on top
of idmapped layers or just in general private mounts created via
open_tree(OPEN_TREE_CLONE). Those mounts would then never have to appear
anywhere in the filesystem. But for now just do the minimal thing.

We should probably aim to move more validation into ovl_fs_parse_param()
so users get errors before fsconfig(FSCONFIG_CMD_CREATE). But that can
be done in additional patches later.

This is now also rebased on top of the lazy lowerdata lookup which
allows the specificatin of data only layers using the new "::" syntax.

The rules are simple. A data only layers cannot be followed by any
regular layers and data layers must be preceeded by at least one regular
layer.

Parsing the lowerdir mount option must change because of this. The
original patchset used the old lowerdir parsing function to split a
lowerdir mount option string such as:

        lowerdir=/lower1:/lower2::/lower3::/lower4

simply replacing each non-escaped ":" by "\0". So sequences of
non-escaped ":" were counted as layers. For example, the previous
lowerdir mount option above would've counted 6 layers instead of 4 and a
lowerdir mount option such as:

        lowerdir="/lower1:/lower2::/lower3::/lower4:::::::::::::::::::::::::::"

would be counted as 33 layers. Other than being ugly this didn't matter
much because kern_path() would reject the first "\0" layer. However,
this overcounting of layers becomes problematic when we base allocations
on it where we very much only want to allocate space for 4 layers
instead of 33.

So the new parsing function rejects non-escaped sequences of colons
other than ":" and "::" immediately instead of relying on kern_path().

Link: util-linux/util-linux#2287
Link: util-linux/util-linux#1992
Link: https://bugs.archlinux.org/task/78702
Link: https://lore.kernel.org/linux-unionfs/20230530-klagen-zudem-32c0908c2108@brauner
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NEEDINFO Need more information from reporter
Projects
None yet
Development

No branches or pull requests

6 participants
@karelzak @Andrei-Pozolotin @asavah @brauner @thesamesam and others