
Support multiple connections #132

Closed

wants to merge 4 commits into from

Conversation

Nikratio
Contributor

(This is a resubmission of PR #103, which was merged but then rolled back after tests failed.)

Original description:

The -o max_conns=N option causes multiple SSH processes and response
processing threads to be created. Each file handle is locked to the
connection which was used for opening it. One-off requests can use any
connection.

The connection is chosen by checking the per-connection statistics:

  1. Choose the connection with the fewest outstanding requests; if it's a tie,
  2. choose the connection with the fewest directory handles; if it's a tie,
  3. choose the connection with the fewest file handles; if it's a tie,
  4. choose a connection which has already been established.

The implementation assumes that max_conns values will be small; it uses
linear search (see the sketch below).
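
For illustration, a minimal sketch of that selection order (the struct and
field names below are hypothetical, not the PR's actual identifiers):

/* Illustrative sketch of the selection order described above;
 * all names here are made up, not taken from the PR. */
struct conn {
    int num_reqs;      /* outstanding requests */
    int num_dirs;      /* open directory handles */
    int num_files;     /* open file handles */
    int established;   /* SSH process already running? */
};

static struct conn *choose_conn(struct conn *conns, int max_conns)
{
    struct conn *best = &conns[0];

    /* Linear search: max_conns is expected to be small. */
    for (int i = 1; i < max_conns; i++) {
        struct conn *c = &conns[i];
        if (c->num_reqs != best->num_reqs) {
            if (c->num_reqs < best->num_reqs)
                best = c;
        } else if (c->num_dirs != best->num_dirs) {
            if (c->num_dirs < best->num_dirs)
                best = c;
        } else if (c->num_files != best->num_files) {
            if (c->num_files < best->num_files)
                best = c;
        } else if (c->established && !best->established) {
            best = c;
        }
    }
    return best;
}

A mount such as "sshfs -o max_conns=4 host:/dir /mnt/dir" would then spread
requests over four SSH connections.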

The primary purpose of this feature is to improve the responsiveness of a
file system during large file transfers. It also helps in scenarios
where the bandwidth of a single connection is limited, but multiple
connections are allowed, and the aggregate bandwidth limit is larger.

https://sourceforge.net/p/fuse/mailman/message/36076333/

tsavola and others added 4 commits October 15, 2017 16:52
@Nikratio Nikratio self-assigned this Jun 26, 2018
@Nikratio
Contributor Author

Nikratio commented Aug 2, 2018

@tsavola Did you have a chance to look at this? I'd really like to merge the feature.

@tsavola
Contributor

tsavola commented Aug 2, 2018

I quickly looked at it. It seems that sshfs just dies in the test build that uses the undefined behavior sanitizer. I couldn't replicate it on my workstation. No idea how to proceed.

@Nikratio
Contributor Author

Nikratio commented Aug 5, 2018

I think you're looking at a different issue: for me it does not die, and the failure does not depend on using the sanitizer:

$ meson ..
The Meson build system
Version: 0.45.1
Source dir: /home/nikratio/in-progress/sshfs
Build dir: /home/nikratio/in-progress/sshfs/build
Build type: native build
Project name: sshfs
Native C compiler: cc (gcc 6.3.0 "cc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516")
Build machine cpu family: x86_64
Build machine cpu: x86_64
Message: Compiler warns about unused result even when casting to void
Program rst2man found: YES (/usr/bin/rst2man)
Configuring config.h using configuration
Found pkg-config: /usr/bin/pkg-config (0.29)
Native dependency fuse3 found: YES 3.1.0
Native dependency glib-2.0 found: YES 2.50.3
Native dependency gthread-2.0 found: YES 2.50.3
Program utils/install_helper.sh found: YES (/home/nikratio/in-progress/sshfs/utils/install_helper.sh)
Build targets in project: 4
Found ninja-1.7.2 at /usr/bin/ninja

$ ninja
[7/7] Linking target sshfs.

$ ./sshfs -f localhost:/tmp ~/tmp/mnt

and on a different console:

In [1]: import os
In [2]: TEST_DATA = open('/dev/urandom', 'rb').read(13726)
In [3]: fh = open('tmp/mnt/testfile', 'w+b', buffering=0)
In [4]: fd = fh.fileno()
    ...: fh.write(TEST_DATA)
    ...: fstat = os.fstat(fd)
    ...: size = fstat.st_size
    ...: assert size == len(TEST_DATA)
    ...:
    ...: # Add zeros at the end
    ...: os.ftruncate(fd, size + 1024)
    ...: assert os.fstat(fd).st_size == size + 1024
    ...: fh.seek(0)
    ...: assert fh.read(size) == TEST_DATA
    ...: assert fh.read(1025) == b'\0' * 1024
    ...: 
    ...: # Truncate data
    ...: os.ftruncate(fd, size - 1024)
    ...: assert os.fstat(fd).st_size == size - 1024
    ...: fh.seek(0)
    ...: assert fh.read(size) == TEST_DATA[:size-1024]
AssertionError                            Traceback (most recent call last)
<ipython-input-19-f34e4fff1833> in <module>()
      5 fstat = os.fstat(fd)
      6 size = fstat.st_size
----> 7 assert size == len(TEST_DATA)
      8 
      9 # Add zeros at the end

AssertionError: 

...and sshfs is still running happily:

In [5]: fh.write(b'foo')
Out[5]: 3

Does this really work for you?

@Nikratio
Contributor Author

Friendly ping. Do you think you'll have time to look at this at some point?

@svenstaro

Would absolutely love to see this merged. This is a big pain point for me whenever I use sshfs. We could then use file transfer programs that use multithreading to exploit multiple parallel connections, thereby increasing total transfer speed. The single-connection limit is especially a problem when using sshfs over the internet.

@tsavola
Contributor

tsavola commented Sep 10, 2018

I'll try to look at this this week.

@tsavola
Contributor

tsavola commented Nov 8, 2018

That was one long week. I'm sorry about that.

I have identified the problem.

The implementation is supposed to use the same connection for all operations on the same file handle, but the write and fstat calls get executed on different connections. It works if the write is synchronous, but it's asynchronous by default.

The reason why the operations use different connections is that FUSE calls sshfs_getattr with NULL file info. libfuse's documentation for getattr says "fi will always be NULL if the file is not currently open, but may also be NULL if the file is open." Looking at tst_truncate_fd's debug output, getattr's file info is non-NULL in at least one call, but it's NULL in most calls.

I got the truncation tests working by creating a hash table which maps paths to sshfs_files, but tst_open_unlink fails if I don't also track unlinks and renames. That's doable, but it soon gets complicated. It also doesn't solve the problem of choosing the correct file handle if several are open with the same path. My quick-and-dirty hack is at https://github.com/tsavola/sshfs/tree/parallel_conns-on-3.5.0.

I have to give it some more thought, but currently I don't see a solution other than forcing -o sshfs_sync with parallel connections, or getting FUSE to somehow indicate the correct file handle.
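
For reference, the hash-table idea could look roughly like this with GLib
(already a dependency, per the build output above); everything below is a
sketch with hypothetical names, not code from the linked branch:

#include <glib.h>

struct sshfs_file;  /* sshfs's per-handle state */

/* Hypothetical table mapping path (char *) -> struct sshfs_file *,
 * so a getattr arriving with NULL file info can be routed to the
 * same connection as the open handle's in-flight writes. */
static GHashTable *open_files;

static void open_files_init(void)
{
    open_files = g_hash_table_new_full(g_str_hash, g_str_equal,
                                       g_free, NULL);
}

static void track_open(const char *path, struct sshfs_file *sf)
{
    g_hash_table_insert(open_files, g_strdup(path), sf);
}

static struct sshfs_file *lookup_open_file(const char *path)
{
    return g_hash_table_lookup(open_files, path);
}

/* unlink and rename must keep the table in sync, or stale entries
 * remain: that is what makes tst_open_unlink fail without the extra
 * tracking.  It also does not answer which handle to pick when
 * several are open on the same path. */
static void track_remove(const char *path)
{
    g_hash_table_remove(open_files, path);
}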

@Nikratio
Contributor Author

Nikratio commented Nov 9, 2018

Thanks for looking into this! That sounds like a tricky problem indeed. Having FUSE always provide a file handle would be a very nice solution, but unfortunately exactly that was discussed on the kernel mailing list a few months ago and met strong opposition. As I understand it, it would require major changes in the kernel.

If you can't find a good solution, there's also the option of simply documenting this behavior prominently. As long as we understand what's happening and why, having mtime lag a little even after a write has completed is no worse than e.g. the lack of hardlink support or rename() not being able to overwrite the destination.

@tsavola
Contributor

tsavola commented Nov 9, 2018

The tests fail because st_size is incorrect, because the write hasn't happened yet. I think that's a more subtle breaking of a file descriptor's invariants than the unsupported features you mention.

@tsavola
Contributor

tsavola commented Nov 20, 2018

I wonder if it's possible to get the ID of the thread (or something resembling it) that requests the FUSE operation?

@Nikratio
Contributor Author

Would that help? I think you'd just make the problem a little more difficult to discover, because it would then only appear when write() and getattr() get called from different threads.

@Nikratio
Contributor Author

Could you possibly force the same connection for each inode (rather than file handle)?

@tsavola
Contributor

tsavola commented Nov 20, 2018

I suppose so. It probably wouldn't hurt real-world performance either. (How often do you do parallel transfers targeting the same file?)
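
For illustration, per-inode pinning could be as simple as a deterministic
map from the remote inode number to a connection (hypothetical sketch,
reusing the made-up struct conn from the earlier sketch; nothing here is
from the branch):

#include <stdint.h>

/* Route every operation on the same remote inode (e.g. the st_ino
 * from cached remote attributes) to the same connection, so write()
 * and a later getattr() on that inode cannot race across
 * connections. */
static struct conn *conn_for_inode(struct conn *conns, int max_conns,
                                   uint64_t ino)
{
    return &conns[ino % max_conns];
}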

@Nikratio
Contributor Author

Any update?

@Nikratio Nikratio force-pushed the master branch 2 times, most recently from 382bb0f to d299217 Compare February 27, 2019 21:27
@Nikratio
Contributor Author

Nikratio commented Nov 3, 2019

I am working on this now, based on the pull request.

@Nikratio Nikratio closed this Nov 3, 2019
@Nikratio Nikratio deleted the parallel_conns branch November 3, 2019 13:32