dir_cache sets d_type for only 1 request every 20 seconds #273

Closed
nh2 opened this issue Dec 7, 2021 · 2 comments

Comments


nh2 commented Dec 7, 2021

Hi, I suspect that something is off in the -o dir_cache=yes logic, so I'd like to check if it's intended/necessary:

When reading a directory's contents using the getdents64() syscall, sshfs returns d_type=DT_UNKNOWN most of the time, and returns a cached response (e.g. d_type=DT_REG) only once every 20 seconds.

That is the reverse of how a dirent cache usually works, which is to cache directory entry types for 20 seconds and "forget" them after that timeout.

Impact

When d_type=DT_UNKNOWN is returned (most of the time), programs have to call a stat()-family function to find out whether a directory entry (dirent) is a file, directory, symlink, etc. This is very slow: it costs 1 sequential syscall per file instead of getting the information directly from a batch syscall like getdents().
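
To make the cost concrete, here is a minimal Python sketch of the fallback a directory walker has to perform. The helper name `resolve_type` is hypothetical. The `DT_*` values are the Linux constants from `<dirent.h>`, and the shift by 12 mirrors glibc's `IFTODT` macro.

```python
import os
import stat

# Linux d_type constants from <dirent.h> (only the ones used here).
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8
DT_LNK = 10


def resolve_type(dirpath, name, d_type):
    """Return (type, extra_syscalls) for one directory entry.

    If the filesystem already supplied a type, no further syscall is
    needed. With DT_UNKNOWN the caller has to lstat() the entry itself.
    """
    if d_type != DT_UNKNOWN:
        return d_type, 0
    st = os.lstat(os.path.join(dirpath, name))
    # Same conversion as glibc's IFTODT(): file-type bits shifted down by 12.
    return stat.S_IFMT(st.st_mode) >> 12, 1
```

For a directory with N entries that all come back as DT_UNKNOWN, a walker following this pattern issues N extra lstat() calls, one after another, on top of the getdents64() call that listed them.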

Reproducer

strace -fye getdents64 -v find /path/to/sshfs/mount -mindepth 1 -maxdepth 1 > /dev/null

Or even better, with timing to show the cache effect (default 20 seconds):

for i in {1..1000}; do echo; echo $i; echo; strace -fye getdents64 -v find /path/to/sshfs/mount -mindepth 1 -maxdepth 1 > /dev/null; sleep 1; done

I observe d_type=DT_UNKNOWN most of the time, and a different d_type value only once every 20 loop iterations.
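
To quantify this instead of eyeballing the strace output, a small helper can tally the d_type values. This is only a sketch: the function name `tally_d_types` is made up for illustration, and it assumes the `d_type=DT_...` fields that `strace -v` prints for getdents64.

```python
import re
from collections import Counter

# Matches the d_type field of each dirent printed by `strace -v`.
D_TYPE_RE = re.compile(r"d_type=(DT_[A-Z]+)")


def tally_d_types(strace_text):
    """Count how often each d_type value occurs in captured strace output."""
    return Counter(D_TYPE_RE.findall(strace_text))
```

strace writes to stderr, so the text can be captured by appending `2>&1 >/dev/null` to the strace command and piping or saving the result before passing it to `tally_d_types`.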

Environment

SSHFS version 3.7.1
FUSE library version 3.10.3
using FUSE kernel interface version 7.31
fusermount3 version: 3.10.3

nh2 changed the title from "dir_cache seems to work the wrong way around, caching only 1 request every 20 seconds" to "dir_cache sets d_type for only 1 request every 20 seconds" on Dec 7, 2021

Nikratio (Contributor) commented Dec 7, 2021

Thanks for the report! I suspect this is not related to the cache, but due to the fact that SSHFS implements readdir() rather than readdirplus (cf. FUSE documentation), i.e. it does not provide stat() data at all. No idea why it would work every 20 seconds though.

In other words, the point of the cache is not to reduce the number of getattr() calls between the kernel and FUSE, but to reduce the number of SFTP requests between FUSE and the remote server.
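
As a rough illustration of that distinction, here is a toy model of a time-limited listing cache. It is not SSHFS's implementation: `ListingCache` and `fetch` are hypothetical names, and the 20-second `ttl` is only borrowed from the default mentioned in the report.

```python
import time


class ListingCache:
    """Toy TTL cache for directory listings (not SSHFS's actual code)."""

    def __init__(self, fetch, ttl=20.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable: path -> list of names
        self._ttl = ttl
        self._clock = clock
        self._entries = {}       # path -> (expiry_time, listing)
        self.remote_requests = 0

    def listdir(self, path):
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now < cached[0]:
            return cached[1]     # served locally, no remote round trip
        self.remote_requests += 1
        listing = self._fetch(path)
        self._entries[path] = (now + self._ttl, listing)
        return listing
```

A cache of this shape saves round trips to the remote side, but a listing made only of names still carries no type information, so it cannot fill in d_type by itself.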

Patches for readdirplus support are welcome :-).

Nikratio (Contributor) commented

I'm closing this issue for now. Please note that this isn't meant to imply that you haven't found a real bug or worthwhile potential improvement - you most likely have and I'm grateful that you took the time to report it.

Unfortunately, this project does not currently have any active, regular contributors. As the maintainer, I try to review pull requests and make regular releases, but unfortunately I have no capacity to do significant development beyond that.
Issue reports that do not come with a pull request or clearly have high impact on a large number of users are therefore likely to languish.

I understand that this is frustrating for users, but I hope you can also understand that any development work that I do on this project has to compete with spending time with my family, doing work that I get paid for, doing something recreational without a computer, or working on features/bugs that affect me personally. Most bugs and ideas - unfortunately including this one - lose out in this competition.

In other words, unless you plan to work on this yourself or can recruit someone who will, it's unlikely that anyone is going to do anything about it anytime soon.

Still, you may wonder why I am closing the issue rather than keeping it open.

In short, I want the issue tracker to show the most important issues that users should be aware of and where prospective contributors could make the biggest difference. I do not think there is much value in using it as an exhaustive database of every idea or glitch that someone has ever encountered - especially if no one is intending to address/implement it.

For this reason, I am closing most issues when it becomes clear that they're unlikely to see any activity in the near future - and this seems to be the case here.

I understand that you have invested time and effort in reporting this, and I am very sorry that currently there is no way to build upon this. I wish the situation was different.
