
Support multiple connections #132

Closed

wants to merge 4 commits into from

Conversation

Nikratio
Contributor

(This is a resubmission of PR #103, which was merged but then rolled back after tests failed.)

Original description:

The -o max_conns=N option causes multiple SSH processes and response
processing threads to be created. Each file handle is locked to the
connection which was used for opening it. One-off requests can use any
connection.

The connection is chosen by checking the per-connection statistics:

  1. Choose the connection with the fewest outstanding requests; if it's a tie,
  2. choose the connection with the fewest directory handles; if it's a tie,
  3. choose the connection with the fewest file handles; if it's a tie,
  4. choose a connection which has already been established.

The implementation assumes that max_conns values will be small; it uses
linear search (see the sketch below).
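
For illustration, a minimal sketch of that selection order (the struct and
field names below are hypothetical, not the PR's actual identifiers):

/* Illustrative sketch of the selection order described above;
 * all names here are made up, not taken from the PR. */
struct conn {
    int num_reqs;      /* outstanding requests */
    int num_dirs;      /* open directory handles */
    int num_files;     /* open file handles */
    int established;   /* SSH process already running? */
};

static struct conn *choose_conn(struct conn *conns, int max_conns)
{
    struct conn *best = &conns[0];

    /* Linear search: max_conns is expected to be small. */
    for (int i = 1; i < max_conns; i++) {
        struct conn *c = &conns[i];
        if (c->num_reqs != best->num_reqs) {
            if (c->num_reqs < best->num_reqs)
                best = c;
        } else if (c->num_dirs != best->num_dirs) {
            if (c->num_dirs < best->num_dirs)
                best = c;
        } else if (c->num_files != best->num_files) {
            if (c->num_files < best->num_files)
                best = c;
        } else if (c->established && !best->established) {
            best = c;
        }
    }
    return best;
}

A mount such as "sshfs -o max_conns=4 host:/dir /mnt/dir" would then spread
requests over four SSH connections.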

The primary purpose of this feature is to improve the responsiveness of a
file system during large file transfers. It also helps in scenarios
where the bandwidth of a single connection is limited, but multiple
connections are allowed, and the aggregate bandwidth limit is larger.

https://sourceforge.net/p/fuse/mailman/message/36076333/

tsavola and others added 4 commits October 15, 2017 16:52
@Nikratio Nikratio self-assigned this Jun 26, 2018
@Nikratio
Contributor Author

Nikratio commented Aug 2, 2018

@tsavola Did you have a chance to look at this? I'd really like to merge the feature.

@tsavola
Contributor

tsavola commented Aug 2, 2018

I quickly looked at it. It seems that sshfs just dies in the test build that uses the undefined behavior sanitizer. I couldn't replicate it on my workstation. No idea how to proceed.

@Nikratio
Contributor Author

Nikratio commented Aug 5, 2018

I think you're looking at a different issue: for me it does not die, and the failure does not depend on using the sanitizer:

$ meson ..
The Meson build system
Version: 0.45.1
Source dir: /home/nikratio/in-progress/sshfs
Build dir: /home/nikratio/in-progress/sshfs/build
Build type: native build
Project name: sshfs
Native C compiler: cc (gcc 6.3.0 "cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516")
Build machine cpu family: x86_64
Build machine cpu: x86_64
Message: Compiler warns about unused result even when casting to void
Program rst2man found: YES (/usr/bin/rst2man)
Configuring config.h using configuration
Found pkg-config: /usr/bin/pkg-config (0.29)
Native dependency fuse3 found: YES 3.1.0
Native dependency glib-2.0 found: YES 2.50.3
Native dependency gthread-2.0 found: YES 2.50.3
Program utils/install_helper.sh found: YES (/home/nikratio/in-progress/sshfs/utils/install_helper.sh)
Build targets in project: 4
Found ninja-1.7.2 at /usr/bin/ninja

$ ninja
[7/7] Linking target sshfs.

$ ./sshfs -f localhost:/tmp ~/tmp/mnt

and on a different console:

In [1]: import os
In [2]: TEST_DATA = open('/dev/urandom', 'rb').read(13726)
In [3]: fh = open('tmp/mnt/testfile', 'w+b', buffering=0)
In [4]: fd = fh.fileno()
    ...: fh.write(TEST_DATA)
    ...: fstat = os.fstat(fd)
    ...: size = fstat.st_size
    ...: assert size == len(TEST_DATA)
    ...:
    ...: # Add zeros at the end
    ...: os.ftruncate(fd, size + 1024)
    ...: assert os.fstat(fd).st_size == size + 1024
    ...: fh.seek(0)
    ...: assert fh.read(size) == TEST_DATA
    ...: assert fh.read(1025) == b'\0' * 1024
    ...: 
    ...: # Truncate data
    ...: os.ftruncate(fd, size - 1024)
    ...: assert os.fstat(fd).st_size == size - 1024
    ...: fh.seek(0)
    ...: assert fh.read(size) == TEST_DATA[:size-1024]
AssertionError                            Traceback (most recent call last)
<ipython-input-19-f34e4fff1833> in <module>()
      5 fstat = os.fstat(fd)
      6 size = fstat.st_size
----> 7 assert size == len(TEST_DATA)
      8 
      9 # Add zeros at the end

AssertionError: 

...and sshfs is still running happily:

In [5]: fh.write(b'foo')
Out[5]: 3

Does this really work for you?

@Nikratio
Contributor Author

Friendly ping. Do you think you'll have time to look at this at some point?

@svenstaro

Would absolutely love to see this merged. This is a big pain point for me whenever I use sshfs. We could then use file transfer programs that use multithreading to exploit multiple parallel connections, thereby increasing total transfer speed. The single-connection limit is especially a problem when using sshfs over the internet.

@tsavola
Contributor

tsavola commented Sep 10, 2018

I'll try to look at this this week.

@tsavola
Contributor

tsavola commented Nov 8, 2018

That was one long week. I'm sorry about that.

I have identified the problem.

The implementation is supposed to use the same connection for all operations on the same file handle, but the write and fstat calls get executed on different connections. It works if the write is synchronous, but it's asynchronous by default.

The reason why the operations use different connections is that FUSE calls sshfs_getattr with NULL file info. libfuse's documentation for getattr says "fi will always be NULL if the file is not currently open, but may also be NULL if the file is open." Looking at tst_truncate_fd's debug output, getattr's file info is non-NULL in at least one call, but it's NULL in most calls.

I got the truncation tests working by creating a hash table which maps paths to sshfs_files, but tst_open_unlink fails if I don't also track unlinks and renames. That's doable, but it soon gets complicated. It also doesn't solve the problem of choosing the correct file handle if several are open with the same path. My quick-and-dirty hack is at https://github.com/tsavola/sshfs/tree/parallel_conns-on-3.5.0.

I have to give it some more thought, but currently I don't see a solution other than forcing -o sshfs_sync with parallel connections, or getting FUSE to somehow indicate the correct file handle.
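
For reference, the hash-table idea could look roughly like this with GLib
(already a dependency, per the build output above); everything below is a
sketch with hypothetical names, not code from the linked branch:

#include <glib.h>

struct sshfs_file;  /* sshfs's per-handle state */

/* Hypothetical table mapping path (char *) -> struct sshfs_file *,
 * so a getattr arriving with NULL file info can be routed to the
 * same connection as the open handle's in-flight writes. */
static GHashTable *open_files;

static void open_files_init(void)
{
    open_files = g_hash_table_new_full(g_str_hash, g_str_equal,
                                       g_free, NULL);
}

static void track_open(const char *path, struct sshfs_file *sf)
{
    g_hash_table_insert(open_files, g_strdup(path), sf);
}

static struct sshfs_file *lookup_open_file(const char *path)
{
    return g_hash_table_lookup(open_files, path);
}

/* unlink and rename must keep the table in sync, or stale entries
 * remain: that is what makes tst_open_unlink fail without the extra
 * tracking.  It also does not answer which handle to pick when
 * several are open on the same path. */
static void track_remove(const char *path)
{
    g_hash_table_remove(open_files, path);
}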

@Nikratio
Contributor Author

Nikratio commented Nov 9, 2018

Thanks for looking into this! That sounds like a tricky problem indeed. Having FUSE always provide a file handle would be a very nice solution, but unfortunately exactly that was discussed on the kernel mailing list a few months ago and met strong opposition. As I understand it, it would require major changes in the kernel.

If you can't find a good solution, there's also the option of simply documenting this behavior prominently. As long as we understand what's happening and why, having mtime lag a little even after a write has completed is no worse than e.g. the lack of hardlink support or rename() not being able to overwrite the destination.

@tsavola
Contributor

tsavola commented Nov 9, 2018

The tests fail because st_size is incorrect, because the write hasn't happened yet. I think that's a more subtle breaking of a file descriptor's invariants than the unsupported features you mention.

@tsavola
Contributor

tsavola commented Nov 20, 2018

I wonder if it's possible to get the ID of the thread (or something resembling it) that requests the FUSE operation?

@Nikratio
Contributor Author

Would that help? I think you'd just make the problem a little more difficult to discover, because it would then only appear when write() and getattr() get called from different threads.

@Nikratio
Contributor Author

Could you possibly force the same connection for each inode (rather than file handle)?

@tsavola
Contributor

tsavola commented Nov 20, 2018

I suppose so. It probably wouldn't hurt real-world performance either. (How often do you do parallel transfers targeting the same file?)
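
For illustration, per-inode pinning could be as simple as a deterministic
map from the remote inode number to a connection (hypothetical sketch,
reusing the made-up struct conn from the earlier sketch; nothing here is
from the branch):

#include <stdint.h>

/* Route every operation on the same remote inode (e.g. the st_ino
 * from cached remote attributes) to the same connection, so write()
 * and a later getattr() on that inode cannot race across
 * connections. */
static struct conn *conn_for_inode(struct conn *conns, int max_conns,
                                   uint64_t ino)
{
    return &conns[ino % max_conns];
}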

@Nikratio
Contributor Author

Any update?

@Nikratio Nikratio force-pushed the master branch 2 times, most recently from 382bb0f to d299217 Compare February 27, 2019 21:27
@Nikratio
Contributor Author

Nikratio commented Nov 3, 2019

I am working on this now, based on the pull request.

@Nikratio Nikratio closed this Nov 3, 2019
@Nikratio Nikratio deleted the parallel_conns branch November 3, 2019 13:32