fast writes, slow reads with sshfs #664
In another ticket, there was the idea to separate compaction from archive creation.
Any guess on why the read is taking so long (<100 kbps)? This isn't a general feature of sshfs; generally, I can read just fine and with decent speed.
Maybe it depends on read size? If there are small reads, overhead and latency might add up.
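A quick back-of-the-envelope model shows how that could happen; all the numbers below are assumptions for illustration, not measurements from this setup:

```python
# If every small read over sshfs costs roughly one network round trip,
# latency dominates throughput. Assumed numbers, not measured here:
segment_size = 5 * 1024 * 1024   # one 5 MiB segment
read_size = 4 * 1024             # 4 KiB per read() call
rtt = 0.1                        # 100 ms round-trip time

round_trips = segment_size / read_size     # 1280 reads
total_latency = round_trips * rtt          # 128 seconds
throughput = segment_size / total_latency  # bytes/sec

print(f"{round_trips:.0f} reads, {total_latency:.0f} s, "
      f"{throughput / 1024:.0f} KiB/s effective")  # ~40 KiB/s
```

Under those assumptions the effective rate lands in the tens of KiB/s, which is in the ballpark of the slow reads reported here.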
I'm not sure that can be the issue. In my comment above, the quoted timing represents how long a single call to `fd.read()` takes. Also, I can do a read over an sshfs mount without using borg at all, for both small and large files, at much, much higher bit rates. There is something fishy going on for sure, but I'm not sure what I should be doing to debug this, especially since I've narrowed it down to a single Python statement.
I've looked at the source and found the relevant read path. The whole overhead could maybe be optimized by reading the whole segment into a `BytesIO` object and then reading from that (from memory).
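A minimal sketch of that idea, using a hypothetical helper (this is not borg's actual code):

```python
import io

def read_segment_into_memory(path):
    """Slurp the whole segment file with one large sequential read and
    return a file-like object backed by RAM, so the many small
    per-object reads never touch the network again."""
    with open(path, 'rb') as fd:
        return io.BytesIO(fd.read())

# Usage sketch: per-object reads then hit memory instead of sshfs.
# fd = read_segment_into_memory(segment_path)
# fd.seek(offset)
# data = fd.read(length)
```

The trade-off is holding an entire segment in RAM at once.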
Could you try changing the `open()` call in `get_fd()` to use a larger buffer, then test again to see if it makes any difference?
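The exact snippet was not preserved here; judging from the later mention of a "buffer argument" to `open()`, the suggestion was presumably along these lines (the buffer size is a guess):

```python
# Inside LoggedIO.get_fd(), the plain open of a segment file:
#     fd = open(path, 'rb')
# could be given a large explicit buffer, so Python issues fewer,
# bigger read() calls against the sshfs mount:
fd = open(path, 'rb', buffering=10 * 1024 * 1024)  # 10 MiB; size is a guess
```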
Can you try this by Saturday noon? I'd like to create 1.0.0rc2 at the weekend, and if the above helps, I'd like to include it.
I will try tonight and report back here.
So something external (ISP or host for storage) has changed. Things are much faster right now. I tried with and without your modification and the timings were about the same, but in both cases, things were pretty quick and definitely faster than before. So I'd have to do a much longer/larger test to know if this really makes a difference, and unfortunately, I will be unable to do so until tomorrow. I'll leave it to you to decide if you want to include it.
Here I've attached two files, with and without the change you suggested. Previously, the checkpoint that I trigger with "-c 150" was taking forever, but now it is super quick. I terminate the backups after the first checkpoint. In both of these tests, I am throttling the speed with trickle.
Hmm, this is not the scenario you had when you filed your issue, right?
To summarize the data_with vs. data_without logs (too much data in there): without the change, checkpoint writing takes 16 s.
You have glibc 2.9? I'd have expected 2.19.
I didn't read your first comment at first. OK, if it can't be reproduced / determined right now, I'll keep the code as is for now.
Regarding glibc 2.9: could this just be the version that Python was compiled against? I installed Python 3.4 on Ubuntu using apt-get, and while it is true that Python reports glibc 2.9, the system's actual glibc is newer. In other words, it's not clear to me how relevant Python's reported glibc version is.
Here's a screen capture of nload during checkpointing. You can see that there are relatively large chunks of time where no reading at all is happening. I'm working on getting some more information on this right now.
I was able to do a more thorough test. I have a directory consisting of five 100 MiB files. It takes about 5 minutes to scp these over. I then ran borg with and without the buffer argument to the `open()` call.

During check-pointing, a read command is sent to the remote server (due to sshfs). These read commands are blocking and extremely slow. The attached log files demonstrate the slow speed: it sits somewhere around 10-100 KiB/s on average. The nload excerpt I attached above shows that the transfers are not sustained but happen in small chunks, so it's probably closer to 1 MiB/s for a short burst, then 0 MiB/s for a long pause, and then the burst happens again. The blocking behavior is visible in the attached log: test_20160228_with.txt

Aside: I tried a much larger test on Saturday as I left for the day. I did not check the backup until 16+ hours later, but it still had not moved past the first 4 hours of progress. It was doing "checkpointing", so my guess is that it was a blocking read command that had stalled.
You used scp to write to the remote. Can you check cp to read from the remote (via sshfs)?
Just searched for "sshfs slow":
Ahh, I think you've pinpointed the issue. It is definitely sshfs. But using rsync over sshfs, it is quite a bit slower, much closer to the observed 10 KiB/s; I don't expect that it will finish anytime soon.
OK, so this is an issue with sshfs and not borg. Shall we add an FAQ entry about network filesystems and their weirdnesses?
Yeah, sounds good. Am I the only one experiencing this over sshfs? If it affects everyone, it pretty much means that you cannot use sshfs at all with borg; you would have to back up locally and scp the result over, or have borg available remotely. Both of those situations mean I cannot use borg right now. I'll see if I can resolve my sshfs issues, but it would be nice to know if others experience the same thing.
I used to regularly use attic to back up local files to a remote repo accessed via sshfs. It was of course slower than having attic run close to the remote disk, but it was still usable. Sometimes it was over a home wifi network, other times over a gigabit network.
BTW, in a comment (I linked to it above) someone points out that sshfs got slow on Ubuntu 12.04 and was much faster before. So maybe it is just that some sshfs versions are slow?
For the record, I was mostly using Ubuntu 14.04 (sshfs 2.5-1ubuntu1, openssh-client 1:6.6p1-2ubuntu2.6) for the client and Debian wheezy (openssh-server 1:6.0p1-4+deb7u3) on the server.
Great data point. I'll continue looking into this and report back here sometime later this week.
I have added a generic section to the FAQ and will close this now, as the issue is not within borg.
document how to debug / file filesystem issues, fixes #664
issue in sshfs issue tracker: libfuse/sshfs#11
@chebee7i: Nikolaus Rath is having trouble reproducing the slowness; can you discuss that with him? See the link in the previous comment.
This is a continuation of discussion that began in #660.
I am trying to create an archive, but the process slows after handling a few large files. Borg's state is S+ and it appears to be waiting for something. In one case, I let the wait continue for more than 4 hours and it did not resolve itself. Ctrl-C does nothing to ameliorate the issue; kill and kill -9 have no effect. Afterwards, the process is in state D+, and restarting is the only way to stop it.
Update:
There are no issues if I back up to a local location, with and without compression. For remote, I have no choice but to use sshfs. When the backup begins, each file is chunked and sent over the wire. This proceeds at a reasonable speed. Things slow down whenever `Archive.save()` is called. In this method, `Repository.commit()` is called, which compacts the segments. Each segment has a number of objects associated with it, so for each segment, we iterate over its objects via `LoggedIO.iter_objects()`. There is a while loop in there, and the first line of that while loop is a call to `LoggedIO._read()`. Inside that method, there is a line: `data = fd.read(length)`. This is what is stalling, and presumably, it is the source of the high %wa values. I can see with `nload` that there are very regular downloads of data every ~12 seconds, but they are in extremely small chunks and the download interval is far less than 1 second. I logged the timings of two object reads to get a sense of the slowdown. This seems quite odd. Am I hitting a limitation of sshfs? It seems like it'd be better to do this completely locally and then rsync the archive over.
Example: I capped my transfer rate to 5 Mbps and tried to send 571 MiB of data remotely. This took 40 minutes. Of that, roughly 16 minutes were spent sending the chunks over the wire and 18 minutes were spent inside the `LoggedIO._read()` method, called from `Archive.save()`.