
Understanding the impact of latency on Libtorrent throughput #2620

Open
synctext opened this issue Nov 15, 2016 · 189 comments

Comments

@synctext
Member

synctext commented Nov 15, 2016

Problem: Tor-like tunnels introduce significant latency.

Measure how 25 ms to 1000 ms of latency affects Libtorrent throughput. Aim to understand how LEDBAT parameters can be set to maximize throughput. Create an experimental environment using containers. Use netem to create latency between two containers. Start a seeder in one container and download across the connecting bridge with added latency.
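
For reference, a minimal sketch of how such a delayed link can be configured with netem (assuming the containers hang off a bridge such as lxcbr0; the helper names are illustrative and not part of the benchmark scripts):

```python
# Add a fixed delay to all egress traffic on the container bridge,
# using the Linux netem qdisc via tc (must run as root).
import subprocess

def add_latency(interface, delay_ms):
    subprocess.check_call(["tc", "qdisc", "add", "dev", interface,
                           "root", "netem", "delay", "%dms" % delay_ms])

def remove_latency(interface):
    subprocess.check_call(["tc", "qdisc", "del", "dev", interface, "root"])

if __name__ == "__main__":
    add_latency("lxcbr0", 200)  # e.g. 200 ms added delay for one measurement run
```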

Initial results: performance is severely affected by a latency of just 50 ms.

@synctext
Member Author

Current operational measurement scripts: https://github.com/vandenheuvel/libtorrent-latency-benchmark

@MaxVanDeursen
Contributor

MaxVanDeursen commented Nov 15, 2016

[Figure: averages]

Measurements of the libtorrent download speed under different latencies with 6 seeders.

@vandenheuvel

vandenheuvel commented Nov 15, 2016

This is relevant for achieving onion routing with defence against traffic confirmation attacks.

@synctext
Member Author

synctext commented Nov 15, 2016

ToDo for the next meeting, from the Libtorrent API docs + tuning:

  • Understand performance tuning and experiment with various settings
  • Compile from source
  • session_settings high_performance_seed();
  • outstanding_request_limit_reached
  • send_buffer_watermark_too_low
  • Directly add a test for latency sensitivity to Libtorrent's existing software emulation suite?

@vandenheuvel

vandenheuvel commented Nov 15, 2016

Investigate max_out_request_queue, and here.

@synctext
Member Author

synctext commented Nov 16, 2016

Background: https://blog.libtorrent.org/2015/07/slow-start/ and https://blog.libtorrent.org/2011/11/requesting-pieces/
Easy fix: high_performance_seed returns settings optimized for a seed box that serves many peers and doesn't do any downloading. It has a 128 MB disk cache and a limit of 400 files in its file pool. It supports fast upload rates by allowing large send buffers.
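
A minimal sketch of applying that preset through the libtorrent Python bindings (assuming a binding version that exposes high_performance_seed() as a settings pack; the exact contents vary per libtorrent release):

```python
import libtorrent as lt

ses = lt.session()
# Start from the seed-box preset and apply it to the running session.
ses.apply_settings(lt.high_performance_seed())
```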

Additional boost: asynchronous disk I/O

Detailed solution: “I’m unable to get more than 20 Mbps with a single peer on a 140 ms RTT link (simulated delay with no packet loss).”
Original post

Things you could adjust, according to Arvid Norberg, lead engineer of the libtorrent project:

“Did you increase the socket buffer sizes on both ends?”

int recv_socket_buffer_size;
int send_socket_buffer_size;
“There’s also buffer sizes at the bittorrent level:”

int send_buffer_low_watermark;
int send_buffer_watermark;
int send_buffer_watermark_factor;
“And there are buffers at the disk layer:”

int max_queued_disk_bytes;
int max_queued_disk_bytes_low_watermark;
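
A hedged sketch of raising these limits through the Python bindings; the values below are purely illustrative, not recommendations, and the names follow libtorrent's settings_pack:

```python
import libtorrent as lt

ses = lt.session()
ses.apply_settings({
    # Kernel socket buffer sizes requested per connection (bytes, 0 = OS default).
    "recv_socket_buffer_size": 1024 * 1024,
    "send_socket_buffer_size": 1024 * 1024,
    # BitTorrent-level send buffer watermarks (bytes, and a percentage factor).
    "send_buffer_low_watermark": 256 * 1024,
    "send_buffer_watermark": 4 * 1024 * 1024,
    "send_buffer_watermark_factor": 150,
    # Disk-layer write queue limit (bytes).
    "max_queued_disk_bytes": 8 * 1024 * 1024,
})
```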

@vandenheuvel

vandenheuvel commented Nov 24, 2016

New test using LXCs. Ten seeding LXCs, one downloading LXC. Single measurement. While high-latency runs start more slowly, they seem to do substantially better once the transfer speed has stabilized. Latencies up to 150 ms perform, at maximum speed, similarly to the baseline test without any latency. The measurement without any latency is very similar to the earlier test using VMs.
[Figure: result]

@vandenheuvel

A single seeding LXC. Single measurement. Higher latencies impact throughput heavily.
[Figure: 1]

@synctext
Member Author

synctext commented Dec 1, 2016

ToDo: try to obtain more resilience against latency in Libtorrent with a single seeder and a single leecher. Also read current research on traffic correlation attacks; the basics are covered here. Quote: 'recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side'.

@synctext
Member Author

synctext commented Dec 15, 2016

Strange observation that it takes 60 to 100 seconds for the speed to pick up.
Is the seeder side the bottleneck, due to anti-freeriding mechanisms?
Please repeat multiple times and create boxplots.

1 iteration to find the magic bottleneck...

@vandenheuvel

vandenheuvel commented Dec 15, 2016

Good news: it appears that the magic bottleneck has been identified. Plot of a single seeder, single leecher, 200 ms latency. No reordering.
[Figure: default]
Throughput is mostly 15 MB/s. Now, after doubling the default and max values of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem (measured in bytes):
[Figure: double_default_max_plain]
We notice that throughput also doubles, to roughly 30 MB/s. The bad news, however, is that further increasing these parameters has little effect: most throughput measurements never pass 35 MB/s.
Also, the inconsistency in these measurements is still unexplained.
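
For reference, a sketch of the kind of system-wide tuning meant here. The three numbers are the min, default, and max buffer size in bytes; the concrete values are illustrative and kernel defaults differ:

```python
# Roughly double the default and max TCP buffer sizes system-wide, e.g. the
# equivalent of: sysctl -w net.ipv4.tcp_rmem="4096 262144 12582912"
import subprocess

def set_tcp_buffers(rmem=(4096, 262144, 12582912), wmem=(4096, 32768, 8388608)):
    subprocess.check_call(["sysctl", "-w", "net.ipv4.tcp_rmem=%d %d %d" % rmem])
    subprocess.check_call(["sysctl", "-w", "net.ipv4.tcp_wmem=%d %d %d" % wmem])
```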

@synctext
Member Author

good! next bottleneck...

@synctext
Member Author

synctext commented Jan 10, 2017

The magic parameter settings have now been discovered, resulting in 35 MByte/s Libtorrent throughput.
Next steps are to achieve this throughput in Tribler and in Tribler with Tor-like circuits.

[Figure: 20170110_144810]

Please document your LXC containers.
ToDo: Tribler 1 seeder, 1 leecher; see influence of blocking SQLite writes on performance...

@qstokkink understands the details of tunnel community...

@qstokkink
Contributor

You will probably want a repeatable experiment using the Gumby framework. You are in luck, I created a fix for the tunnel tests just a few weeks ago: https://github.com/qstokkink/gumby/tree/fix_hiddenseeding . You can use that branch to create your own experiment, extending the hiddenservices_client.py HiddenServicesClient with your own experiment and your own scenario file (1 seeder, 1 leecher -> see this example for inspiration).

Once you have the experiment running (which is a good first step before you start modifying things - you will probably run into missing packages/libraries etc.), you can edit the TunnelCommunity class in the Tribler project.

If you want to add delay to:

  • Intermittent relaying nodes, add delay here
  • Exit nodes, add delay here
  • The sending node, add delay here

You can combine relaying nodes and the sending node into one by adding delay here (which does not include the exit node)

@synctext
Member Author

synctext commented Jan 13, 2017

One Tribler seeder + one Tribler leecher over normal BitTorrent with 200 ms - 750 ms latency is limited by congestion control. Read the details here. To push Tribler towards 35 MB/s with added Tor-like relays, we will probably need to see the internal state of the congestion control loop at some point this year.

ToDo for 2017: measure congestion window (cwnd) statistics during hidden seeding.

@vandenheuvel

vandenheuvel commented Jan 25, 2017

We managed to do some new tests. We ran Tribler with only the HTTP API and libtorrent enabled. We found that the performance of libtorrent within Tribler is significantly worse than plain libtorrent. Below is a summary of results so far. Note that these values are for a single seeder, single leecher test.

                 libtorrent   Tribler
no latency       ~160 MB/s    30 - 100 MB/s
200 ms           ~15 MB/s     ~2.5 MB/s
200 ms + magic   ~35 MB/s     ~2.5 MB/s

Note that "magic" is the increasing of net.ipv4.tcp_rmem and net.ipv4.tcp_wmem parameters. It appears that Tribler suffers from a different bottleneck. Note that when testing without latency, speed varies heavily between ~30 MB/s and ~100 MB/s. During the all tests, cpu load was ~25% on 3 cores.
[Figure: zerolatency4gb]

[Figure: 200mslatency4gbmagic]
@synctext @devos50 @qstokkink does anyone have any ideas what might cause this?

@qstokkink
Contributor

@vandenheuvel Assuming all libtorrent versions are the same etc.: the only way Tribler interfaces with a libtorrent download is by retrieving its stats every second (handle.stats()) and by handling alerts.

After writing that I found this in the libtorrent manual:

Note

these calls are potentially expensive and won't scale well with lots of torrents. If you're concerned about performance, consider using post_torrent_updates() instead.

Even though this shouldn't be that bad, you could try writing a loop that gets the torrent handle status every second in your naked experiment and see how that affects things.
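
A sketch of both approaches with the libtorrent Python bindings (per-handle polling versus the batched post_torrent_updates(); the helper names and the polling cadence are just illustrative):

```python
import libtorrent as lt

ses = lt.session()
# ... add torrents and collect their handles ...

def poll_per_handle(handles):
    # One status() call per torrent per polling round, as described above.
    for h in handles:
        st = h.status()
        print(st.name, st.download_rate)

def poll_batched(ses):
    # Batched alternative: a single state_update_alert carries the status of
    # every torrent whose state changed since the last call.
    ses.post_torrent_updates()
    for alert in ses.pop_alerts():
        if isinstance(alert, lt.state_update_alert):
            for st in alert.status:
                print(st.name, st.download_rate)
```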

@devos50
Contributor

devos50 commented Jan 26, 2017

@vandenheuvel in addition, we are also processing all libtorrent alerts every second, but I don't think this actually leads to much overhead. Could you try to disable the alert processing (by commenting out this line: https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L72)?

@synctext
Member Author

synctext commented Feb 1, 2017

Very impressive work guys:

                 libtorrent   Tribler
no latency       ~160 MB/s    30 - 100 MB/s
200 ms           ~15 MB/s     ~2.5 MB/s
200 ms + magic   ~35 MB/s     ~2.5 MB/s

Tribler shamefully collapses. Clearly something to dive into! Did the tip of fully disabling the stats, or perhaps sampling stats every 5 seconds, lead to any results? Btw, can you also expand this Tribler pain table with 10, 25, and 75 ms latency data points?

@MaxVanDeursen
Contributor

MaxVanDeursen commented Feb 13, 2017

Our test results show a download speed of ~2.5 MB/s at 200 ms with our plain script as well, when we introduce the EPoll reactor into the code. This is similar to the results found in the previous tests with Tribler. However, tests with our plain script and the Select reactor show the results that we obtained before introducing the reactor, or even higher: a top speed of 30 MB/s.
The next thing on our list is testing Tribler through twisted with the Select reactor.
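
For context, a minimal sketch of how a Twisted-based script selects its reactor (the select/poll/epoll choice discussed here); the reactor must be installed before twisted.internet.reactor is first imported:

```python
# Explicitly install the select()-based reactor instead of the platform default
# (epoll on Linux). Use pollreactor or epollreactor analogously.
from twisted.internet import selectreactor
selectreactor.install()

from twisted.internet import reactor  # now backed by select()

def main():
    # ... start the libtorrent session / download here ...
    reactor.run()
```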

@qstokkink
Contributor

Interesting. Could you also try the normal Poll reactor (it is supposed to be faster than Select for large socket counts)?

@vandenheuvel

vandenheuvel commented Feb 13, 2017

Strangely enough, results are quite different between our script and Tribler. Summary:

          EPollReactor   SelectReactor   PollReactor
script    2.5 MB/s       32 MB/s         16 MB/s
Tribler   2.5 MB/s       2.5 MB/s        2.5 MB/s

It may take a little while for the download to come up to speed (~ 60 seconds), but after that the throughput is quite steady.

Our next step will be profiling.

@synctext
Member Author

synctext commented Feb 14, 2017

32 MByte/s. So... Python 3 and 200 ms latency results. This fascinating mystery deepens.
Please make a profile print with human-readable thread-name printouts.

@vandenheuvel

We just ran our script under both Python 2.7 and Python 3.5; this made no difference for the SelectReactor.

@MaxVanDeursen
Contributor

MaxVanDeursen commented Feb 15, 2017

Due to an update of LXC, our test results have changed drastically. The newest test results, using a latency of 200 ms unless otherwise mentioned, are:

          No Reactor, without delay   No Reactor
Script    ~400 MB/s                   ~32 MB/s

All results below are produced by a modified script as well (200 ms):

              EPollReactor   SelectReactor   PollReactor
Inactive      ~32 MB/s       ~16 MB/s        ~16 MB/s
Semi-Active   ~32 MB/s       ~16 MB/s        ~16 MB/s

Notes:

  • The EPollReactor and PollReactor only reach their speed after a certain period of time.

@synctext
Member Author

hmmm. so the lower non-script table is all Tribler?

@vandenheuvel

In the above post, all results are from our own script. We retested everything non-Tribler. We're not sure what this change in results (especially the peak performance of No Reactor without delay, and the EPollReactor results) tells us about the quality of the current testing method... These changes are enormous.

@vandenheuvel

vandenheuvel commented Feb 18, 2017

                 EPollReactor   SelectReactor   PollReactor
Tribler 0 ms     ~100 MB/s      ~130 MB/s       ~80 MB/s
Tribler 200 ms   2.5 MB/s       2.5 MB/s        2.5 MB/s

The PollReactor without latency varied wildly; the other measurements were steady. Sadly, these results agree with the previous results from before the LXC update. We will now try to bring our script and Tribler closer together by starting libtorrent from the reactor thread in our script.

@ichorid
Contributor

ichorid commented Mar 13, 2018

In the complete torture scenario of 600 ± 200 ms (normal distribution), with default settings, bandwidth never does more than 0.03 MByte/s. However, when I set target_delay to 400 ms, it goes up to a sweet 1.1-1.9 MByte/s. TCP does 3.3-3.8 MByte/s in that scenario. So I guess this uTP performance is the best thing we could ask for.

Now, the question is: should we try to modify libtorrent's LEDBAT implementation so it can adapt to such harsh conditions automatically, or can we get away with simply setting target_delay high enough in Tribler?
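
For reference, a sketch of raising that knob through the Python bindings, assuming the setting is exposed as settings_pack's utp_target_delay (in milliseconds; 100 ms is the usual default):

```python
import libtorrent as lt

ses = lt.session()
# Raise the LEDBAT target delay so uTP tolerates more non-congestive latency
# on tunnelled links. Value is in milliseconds.
ses.apply_settings({"utp_target_delay": 400})
```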

@arvidn

arvidn commented Mar 13, 2018

If you expect +/- 200ms of non-congestive delay, I think you should raise the target delay in Tribler. But what would you expect to cause such delay if it isn't congestion? It wouldn't be enough for the tunnel implementation to be inefficient, as it would (presumably) be consistently inefficient.

@arvidn

arvidn commented Mar 13, 2018

In the previous post, your description of how the congestion window is adjusted is correct. That's how it's supposed to work. I don't think there are any problems with it, but it sounds like you do. If so, would you mind elaborating on which behavior you think is wrong?

Another (I think simpler) way of looking at that formula is this:

  1. Each round trip, the cwnd is increased or decreased by no more than gain_factor (which I think is somewhere around 3000 bytes)
  2. Whether it's increased or decreased, and by how much, depends on the signed difference between the target delay and the current one-way delay. i.e. when the delay is above target, we decrease cwnd and vice versa. This is called delay_factor.
  3. This adjustment is supposed to happen once per round-trip. This is similar to how TCP adjusts its cwnd, and ensures that we don't adjust it faster than we can get feedback from the other end. Instead of using the time-domain for this, a more accurate and predictable metric is to consider all bytes-in-flight as one round-trip, and the number of bytes that were just ACKed by this message, the portion of the RTT this cwnd adjustment represents. So the acked-bytes / bytes-in-flight is therefore called window_factor.

The final change to cwnd is gain_factor * delay_factor * window_factor.

Now, this is made a bit more complicated by the slow-start logic in there, as well as the logic to detect whether the sender is not saturating the current cwnd, in which case we don't keep growing it indefinitely.
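
A hedged Python sketch of the update rule as described above (variable names are illustrative; the real implementation in libtorrent's utp_stream.cpp layers the slow-start and cwnd-saturation logic on top of this):

```python
def cwnd_change(acked_bytes, bytes_in_flight, one_way_delay_ms,
                target_delay_ms=100, gain_factor=3000):
    """Per-ACK congestion window adjustment in bytes (positive or negative)."""
    # Above-target delay yields a negative factor (shrink cwnd),
    # below-target delay a positive one (grow cwnd).
    delay_factor = (target_delay_ms - one_way_delay_ms) / float(target_delay_ms)
    # Scale by the fraction of the in-flight window this ACK represents, so one
    # full round trip of ACKs changes cwnd by at most roughly gain_factor.
    window_factor = acked_bytes / float(bytes_in_flight)
    return gain_factor * delay_factor * window_factor
```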

@ichorid
Contributor

ichorid commented Mar 13, 2018

@arvidn, I expect a widely variable delay caused by other connections maintained by intermediate peers. I would not expect the latency added by these intermediate peers to have a distribution similar to that of a typical Internet router. These peers are regular PCs or seedboxes, with their connections always filled to the point of congestion, ruled by quite unsophisticated queue management algorithms. Besides, there are always 2-3 of them in the way, producing a superposition of latency distributions. So, I assume the (almost) worst possible case: the normal distribution.

As was noted by @shalunov, to be sure, I should get the distribution from a real tunnel connection on Tribler.

@ichorid
Contributor

ichorid commented Mar 13, 2018

@arvidn, regarding your question on the LEDBAT scaled_gain formula, I'm not ready to answer it yet.
A brief mathematical analysis showed that the formula could produce very unstable behavior when the delay - target_delay difference becomes large enough. This is a result of the feedback loop introduced by the dependence on bytes_in_flight.

Again, I'm no expert on differential equations and stability analysis, so I need to recheck everything several times.

@ichorid
Contributor

ichorid commented Mar 20, 2018

These are plots from several experiments running a Tribler -> client_test connection with different numbers of hops.

Single hop, our exit node under load:
[Figures: utp out0x7f2648005a30: delays, uploading, their_delay, our_delay]

Three hops:
[Figures: utp out0x7f6388005910: delays, our_delay, their_delay, uploading]

@ichorid
Contributor

ichorid commented Mar 20, 2018

From these plots, it is obvious that we are not limited by the window size. Instead, we are limited by packet loss and the overall instability of the connection (and probably by bufferbloat).
A TCP-style protocol's performance is a product of bandwidth, (inverse) latency, and (inverse) packet loss. The problem is:

  1. the bandwidth to the target peer is bottlenecked by the smallest bandwidth in the circuit;
  2. the latency is the sum of all latencies in the circuit, amplified by bufferbloat effects;
  3. the end-to-end packet loss probability compounds over all hops of the circuit.

[Figure: TCP performance]
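
As a rough illustration of points 2 and 3 above (not from this thread), the classic Mathis et al. approximation for steady-state TCP throughput makes the dependence on circuit-wide RTT and loss explicit:

```latex
% Mathis et al. steady-state TCP throughput approximation (illustrative only):
% MSS = segment size, RTT = end-to-end round-trip time over the whole circuit,
% p = end-to-end loss probability, C = a constant close to 1.
\text{throughput} \approx \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}
```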

Traditional congestion control and error correction do not work in these circumstances, and we are not going to invent our own.

Instead, we can utilize the Tribler peer network to simultaneously create several circuits to a single peer for a single download. We could gradually create new circuits/connections until the download speed stops growing: that would signal that either the leecher's or the seeder's uplink bandwidth is saturated.
[Figure: single-multe-circuits_tribler 1]

@synctext
Member Author

synctext commented Mar 21, 2018

latency is the sum of all latencies in the circuit, amplified by buffer bloat effects.

Great progress again. Solid performance plots. With tokens we will move to online resource utilization of below 10%! There is strong evidence for that; please make sure you understand Figure 6 in our measurement study. Relays, seeds and exit nodes will be mostly idle when they demand payments. No premature optimizations please.

Slowly we're getting to the heart of the problems and fixing them. I believe this is not a bufferbloat problem, but a CPU overload problem. I just saw an "htop" of these boxes. We lack load management, rejection of excess requests, or anything of that kind at the exit nodes. These boxes are moving terabytes per day and are chronically overloaded, leading to slow servicing of requests. A simple heuristic of rejecting new circuits when overloaded (CPU > 85%, core pegging) is likely to dramatically clear up everything. But first we need that token economy to work...

@synctext
Member Author

math model of oversupply economy: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cpe.2856

@shalunov

That’s pretty heavy congestive+processing delay. The LEDBAT algorithm seems to correctly respond by slowing down as designed. Throwing more traffic at the overloaded relay nodes (with whatever mechanism—parameter adjustment or not) would not make them have more capacity. I would leave the parameters at defaults and figure out where the delays are coming from and why.

@Seeker2

Seeker2 commented Feb 23, 2021

Good news!
After I pointed out in my testing...
arvidn/libtorrent#3542 (comment)
"It seems likely this is a Windows-only problem, although I could swear I've seen people on Linux reporting uTP speed issues at least once or twice...but they could've been downloading from Windows computer/s at the time."

...arvidn finally did tests on a Windows OS and:
arvidn/libtorrent#3542 (comment)
"I can reproduce the poor performance on windows over loopback uTorrent uploading to libtorrent."

There are a couple other posts by me on the same thread that may be relevant here, so I'll link to them to save searching:
arvidn/libtorrent#3542 (comment)
Packet loss, which makes performance so bad, is (best I can tell) caused by libtorrent.

arvidn/libtorrent#3542 (comment)
Multi-graph comparisons of different BitTorrent clients showing the intense severity of packet loss that only libtorrent-based BitTorrent clients on Windows seem to have.

@ichorid
Contributor

ichorid commented Feb 23, 2021

When uploading from uTorrent to qBitTorrent via uTP over 127.0.0.1 local loopback...
Packet loss (varying between roughly 5-40% of the total upload speed) was always present even when uTorrent's upload speed limit was set to 160 KB/sec.

🤦 🤦‍♂️ 🤦‍♀️

@ichorid
Contributor

ichorid commented Feb 23, 2021

@Seeker2 , thanks so much for this update! Please keep us informed about investigating the issue!

@synctext
Member Author

@Seeker2 and @egbertbouman
This would be solid stuff for a performance test with each PR and nightly test.

@Seeker2

Seeker2 commented Feb 23, 2021

@Seeker2 , thanks so much for this update! Please keep us informed about investigating the issue!

Old news for others to catch up:
#2620 (comment)
#2620 (comment)

@Seeker2

Seeker2 commented May 6, 2021

As this post demonstrates:
qbittorrent/qBittorrent#13073 (comment)
Only one side of the seed+peer "conversation" needs to be libtorrent on a Windows OS to result in poor uTP speeds.

This is especially painful news:
arvidn/libtorrent#3542 (comment)
...because we have almost no hope of a resolution to this problem anytime soon.

@ichorid
Contributor

ichorid commented May 6, 2021

Well, according to the qBittorrent post, the thing is at least reproducible. Maybe someone in our team could volunteer to spend around a month of their life fixing this. But that is unlikely to happen before September, as we have much more pressing issues in Tribler right now, and basically everyone in the team is getting at least a month of vacation this summer.

@Seeker2

Seeker2 commented Nov 6, 2021

More issues found with uTP in libtorrent:
arvidn/libtorrent#3542 (comment)
...and posts following that one.

@Seeker2

Seeker2 commented Jul 24, 2022

Another example of uTP packet loss occurring even on non-Windows OSes:
arvidn/libtorrent#3542 (comment)

A partial workaround mentioned for Debian:

Increasing the send socket buffer using send_socket_buffer_size within libtorrent alone didn't seem to be enough to stop the drops from happening, and only resolved for me when tweaked system-wide.
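
For reference, "tweaked system-wide" most likely refers to the kernel caps on socket buffer sizes; a hedged sketch (values illustrative, and since uTP runs over UDP the net.core limits are the relevant ones):

```python
# Raise the kernel-wide maximum socket buffer sizes so that libtorrent's
# send_socket_buffer_size / recv_socket_buffer_size requests are not clamped.
# Equivalent to: sysctl -w net.core.wmem_max=8388608 net.core.rmem_max=8388608
import subprocess

for key in ("net.core.wmem_max", "net.core.rmem_max"):
    subprocess.check_call(["sysctl", "-w", "%s=%d" % (key, 8 * 1024 * 1024)])
```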

@Seeker2

Seeker2 commented Sep 29, 2022

Maybe someone in our team could volunteer to spend around a month of their life fixing this. But that is unlikely to happen before September

It's now nearing the end of September...a year later.

Seeding torrents have become even more difficult for me due to indirect consequences of libtorrent's uTP-related problems: arvidn/libtorrent#3542 (comment)

As to the root of the uTP problems... more sinister causes need to be considered: arvidn/libtorrent#7107 (comment)
