Fails building circuits with large number of anon seeds [still issue with 6.6.0-exp1] #1683

Closed
colin1497 opened this issue Oct 14, 2015 · 30 comments

Comments

@colin1497

When seeding >60 torrents anonymously, Tribler sometimes fails to build circuits and seed. There is a lot of info in #1605; see this comment specifically:

#1605 (comment)

Splitting this issue out for tracking purposes, may be related to #1682

@colin1497 colin1497 changed the title Tribler fails building circuits with large number of anon seeds (RC5) Tribler fails building circuits with large number of anon seeds [RC5] Oct 14, 2015
@colin1497 colin1497 changed the title Tribler fails building circuits with large number of anon seeds [RC5] Fails building circuits with large number of anon seeds [RC5] Oct 14, 2015
@whirm whirm added this to the Backlog milestone Oct 16, 2015
@whirm whirm modified the milestones: V6.5.x, Backlog Oct 28, 2015
@whirm whirm modified the milestones: V6.5.x, V6.6 WX3 Feb 17, 2016
@whirm
Contributor

whirm commented Mar 18, 2016

@synctext I guess this should be assigned to whichever student is in charge of improving the tunnel community?

@synctext
Member

OK, yes. Another nice student job.

So the tunnel community has scaling problems. Good to hear from our users about this. With a nice performance graph, it's even performance analysis & re-factoring.

@synctext
Member

@Pathemeous #21 It seems the 1 TByte seeding goal is difficult. Seeding anonymously already gives errors at >60 torrents.

@Pathemeous
Contributor

Yes, this should be the first target to overcome (as expected). It seems that this is related to the big GUI refactor? Without a clean API such high-performance goals are bound to result in errors like these.

@whirm
Contributor

whirm commented Mar 18, 2016

It's not related to it, but in principle we were planning to have it fixed for the wx3 release.

Depending on how long the tunnel community refactor will take, it could be that the improved tunnel community is not ready until after the new gui is ready, so maybe it's not worth worrying about breaking the gui parts of it for now.

@whirm
Contributor

whirm commented Mar 18, 2016

If the WX3 part of this milestone is ready before the rest, we can still release that so we get in Debian/Ubuntu ASAP and split the rest for a future milestone/release.

@lfdversluis
Contributor

Starting to target this one as it's the last issue assigned to me for 6.6. What do I have to work with? @colin1497 I see now that it's been open a while, what do you remember; do you have any stacktrace or log related to this issue?

My first guess is that downloading or seeding 60 anon torrents means creating >= 60 hidden tunnels, which is quite heavy on the CPU side. Having (blocking) Python calls for 60 tunnels probably results in e.g. Diffie-Hellman handshakes or intro-points timing out, which may be what is going on here. @whirm @synctext do you think this is a probable cause? Looking at the code, there are many possible timeouts.

@colin1497
Author

I'm afraid it's been a while. Looking back at #1605 I originally had 91 torrents, so it was well over 60. I haven't seen this in a long time. Besides build changes, I've also tripled my data rates with my ISP since I originally created the report. If your speculation is right, then you would hit it at some point. Maybe the issue is just that it shouldn't start all the tunnels in parallel? Maybe it should queue them up and start a max of 10 in parallel or something? Just thinking out loud.
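
The "queue them up and start a max of 10 in parallel" idea can be sketched roughly as below. This is only an illustration, not Tribler's code: Tribler at the time was built on Twisted, while this sketch uses the standard-library asyncio so that it is self-contained. The names `build_circuit`, `build_all`, and `run_demo` are hypothetical, and the "handshake" is just a short sleep.

```python
import asyncio


async def build_circuit(circuit_id, delay=0.01):
    # Hypothetical stand-in for one circuit handshake; it only sleeps.
    await asyncio.sleep(delay)
    return circuit_id


async def build_all(circuit_ids, max_parallel=10):
    # The semaphore lets at most `max_parallel` builds run at once;
    # the remaining builds wait their turn instead of all starting together.
    semaphore = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0

    async def guarded(circuit_id):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await build_circuit(circuit_id)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(c) for c in circuit_ids))
    return results, peak


def run_demo(count=91, max_parallel=10):
    # 91 mirrors the torrent count mentioned above; 10 is the suggested cap.
    return asyncio.run(build_all(range(count), max_parallel))


if __name__ == "__main__":
    built, peak = run_demo()
    print("built", len(built), "circuits, peak concurrency", peak)
```

Whether a cap like this would actually help depends on the real cause of the timeouts, which had not been established at this point in the thread.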

@whirm
Contributor

whirm commented Apr 7, 2016

Even if some requests time out it should still keep building circuits until it hits the circuit target.

@lfdversluis
Contributor

@whirm sure, but if so many circuits are being built at once that not a single one can actually be constructed, rescheduling them all concurrently will mean that all the newly scheduled circuits will time out as well. Assuming that this is the issue, of course.

@whirm
Contributor

whirm commented Apr 8, 2016

@lfdversluis let's stop guessing and try to reproduce it instead. Once you've got a scenario where this happens.

If you don't have a shitty Internet connection, use wondershaper to fake it :)

If you want to limit the amount of cores Tribler can use (this shouldn't make a huge difference) you can use taskset.

@colin1497
Author

colin1497 commented Jun 11, 2016

FYI, just updated to 6.6, 7cd6ed7 and it fails to build circuits. 103 seeded torrents.

Just looking at the Windows resource monitor, it doesn't appear to be CPU bound. Resource monitor shows lots of disk activity on the mechanical drive where torrents are stored (60-100MB/s). Network activity rate is relatively low, well under 1Mbps.

On exit, I get a log file each time. I have diffed a couple of the log files and they are basically the same:

Tribler.exe.log.txt

Edit:
After deleting my old tribler.conf file, it successfully built circuits and is now checking every one of the 103 torrents.

Edit2:
Comparing the tribler.conf files, aside from the old one having some old options like t4t*, the big difference appears to be the "user_download_choice =" option with all the torrent hashes with "restartseed". I'm going to let it finish all these checks, shut down, and see what happens.

@colin1497 colin1497 changed the title Fails building circuits with large number of anon seeds [RC5] Fails building circuits with large number of anon seeds [6.6 pre] Jun 11, 2016
@synctext
Member

@colin1497
We still have not had a look at this, sorry.
The credit rewards for seeding + credit mining have been our prime focus since Feb.
Once that is done, the anon tunnels will get full attention.

@colin1497
Author

No problem, just trying to give as much info as possible. After checks were completed I restarted and again no joy with almost an identical log file.

@lfdversluis
Contributor

@colin1497 Thank you, that is very valuable info. It seems that the IO is too heavy and probably completely blocking the twisted thread, most likely resulting in circuits timing out due to handshake failures and whatnot. I am currently in the process of making the IO non-blocking by pushing it out of the twisted thread in Tribler/dispersy#481, but this migration is still under way.
After dispersy, Tribler is next, including the tunnels.
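
The general idea of pushing blocking IO out of the event-loop thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the change made in Tribler/dispersy#481: it uses the standard-library asyncio and its default thread pool rather than Twisted, and `hash_piece_blocking`, `heartbeat`, and `check_pieces` are hypothetical names. The heartbeat stands in for time-sensitive work such as handshakes, which keeps running while the slow calls sit in worker threads.

```python
import asyncio
import hashlib
import time


def hash_piece_blocking(data, pause=0.05):
    # Stand-in for slow disk IO plus hashing; time.sleep blocks its thread.
    time.sleep(pause)
    return hashlib.sha1(data).hexdigest()


async def heartbeat(ticks, count=5, interval=0.01):
    # Stand-in for time-sensitive event-loop work (e.g. handshake timers).
    for _ in range(count):
        await asyncio.sleep(interval)
        ticks.append(time.monotonic())


async def check_pieces(pieces):
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.ensure_future(heartbeat(ticks))
    # run_in_executor(None, ...) runs each blocking call in the loop's
    # default thread pool, so the event loop itself is never blocked.
    futures = [loop.run_in_executor(None, hash_piece_blocking, p) for p in pieces]
    digests = await asyncio.gather(*futures)
    await beat
    return digests, ticks


if __name__ == "__main__":
    digests, ticks = asyncio.run(check_pieces([b"piece-0", b"piece-1"]))
    print(len(digests), "pieces hashed,", len(ticks), "heartbeats recorded")
```

In Twisted the equivalent building block would be `twisted.internet.threads.deferToThread`; asyncio is used here only to keep the sketch free of third-party dependencies.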

@lfdversluis
Contributor

lfdversluis commented Jun 11, 2016

Hmm, looking at the log file I see ImportError: No module named csv, which should be shipped with Tribler.

  File "twisted\internet\base.pyo", line 825, in runUntilCurrent
  File "Tribler\Core\APIImplementation\LaunchManyCore.pyo", line 486, in session_getstate_usercallback_target
  File "Tribler\Main\tribler_main.pyo", line 498, in sesscb_states_callback
  File "Tribler\Main\Dialogs\systray.pyo", line 40, in updateTooltip
exceptions.AttributeError: 'ABCTaskBarIcon' object has no attribute 'icon'

is wx related; we are moving to Qt soon, so that should be fixed then.

  File "twisted\internet\defer.pyo", line 150, in maybeDeferred
  File "Tribler\Core\Modules\versioncheck_manager.pyo", line 54, in check_new_version
  File "twisted\web\client.pyo", line 1594, in request
  File "twisted\web\client.pyo", line 1578, in _getEndpoint
  File "twisted\web\client.pyo", line 1454, in endpointForURI
  File "twisted\web\client.pyo", line 818, in raiseNotImplemented
exceptions.NotImplementedError: SSL support unavailable

means our version manager is broken? @devos50 what do you make of this?
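
One quick way to confirm which of these modules are missing from a given Python environment is sketched below. It assumes that the "SSL support unavailable" error comes from Twisted being unable to import pyOpenSSL (import name `OpenSSL`), which matches the advice later in this thread to install python-openssl. The helper `missing_modules` and the `REQUIRED` list are hypothetical, and the check may behave differently inside a frozen Windows build.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


# "csv" is part of the standard library; "OpenSSL" is pyOpenSSL's import name.
REQUIRED = ["csv", "OpenSSL"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing modules:", ", ".join(missing) if missing else "none")
```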

@colin1497
Author

colin1497 commented Jun 13, 2016

I am relatively certain that I didn't get the log entries in the session where I deleted tribler.conf and it rechecked every file. I think that it's only happening when it never is able to build the circuits.

Edit: No - seems a clean install just starting tribler fdfd8db gives this log:

Tribler.exe.log.txt

@whirm
Contributor

whirm commented Jun 14, 2016

@colin1497 you need to install python-openssl

@whirm
Contributor

whirm commented Jun 14, 2016

@colin1497 if you are running from git, you should install all the dependencies listed on debian/control

@colin1497
Author

Downloading Windows installer builds from Jenkins. I shouldn't have to separately install dependencies in that scenario, should I?

@synctext
Member

ah, the unchecked latest Windows builds. Fresh from Jenkins.Tribler.org then?

These are not often checked to see whether they function OK.
It would be good to check whether this bleeding-edge code, freshly installed, can seed just one swarm correctly in Anon mode.

@lfdversluis
Contributor

lfdversluis commented Jun 17, 2016

@colin1497 The devel branch is almost exclusively used by developers that are adding additional dependencies (e.g. I am adding several at the moment). So often we add dependencies on our machines before we add them to the builders to check everything is working. The builders then ship these with the installers :)

As @synctext said, there are no regular checks on devel. Our next branch is far more stable, but we do not have any guarantees on this either. The only guarantee we do strive to deliver is that all dependencies are shipped with our installers (naturally). But if something is not working, do let us know so we can add it to our todo list.

@colin1497
Author

Apologies guys, I had been pulling next branch builds previously, and had an issue in 6.5.2 and went to jenkins to grab latest build to see if same issue still existed. Geez, I can see that I clearly ended up grabbing devel branch versions. /facepalm

@whirm
Contributor

whirm commented Jun 17, 2016

@colin1497 heh, no worries, at least we know it needs to be fixed now :)

@lfdversluis maybe this is due to the MSVC rebuild you did? Maybe you forgot something on the python-openssl dll chain.

@colin1497 colin1497 changed the title Fails building circuits with large number of anon seeds [6.6 pre] Fails building circuits with large number of anon seeds [still issue with 6.6.0-exp1] Jul 30, 2016
@colin1497
Author

colin1497 commented Jul 30, 2016

Quick update since there was concern about CPU performance:

I tried a few things. I watched CPU usage and it didn't seem that high, not even enough to force the CPU to peg to its max frequency. I set up an idle priority, 100% usage application and pegged it to one core to force the CPU frequency high. I set Tribler to "realtime" priority level. No change in behavior.

I can get 20 connected peers, but can't build circuits for my seeds.

Looking at network usage, it's really not that high -- never goes over 1Mbps. I have Gb infrastructure and 50Mbps connection to the internet.

Obviously that's all macro level.

@synctext
Member

@colin1497 You discovered a problem in the tunnel community. The team made a good performance measurement test. Even with light load the tunnel may take 3 minutes to build.

btw Gb infrastructure, nice!

@colin1497
Author

Good to hear I found a legit issue.

WRT infrastructure, we completely renovated a house last year and it's relatively ridiculous what all I did....

@devos50 devos50 modified the milestones: Backlog, V7.0 Nov 24, 2016
@colin1497
Author

Just wanted to say that this remains a problem in the 7.0.2 release. I hadn't had an issue with it because:

  1. I hadn't been in Tribler that much, and
  2. At some point I lost my database and started clean with a lot fewer torrents, but

I'm up to the point where at startup basically everything just spins its wheels saying it's building circuits but none ever get going.

@qstokkink
Contributor

I'm assuming this to be fixed, but I'll add it to the 7.2 milestone for verification.

@devos50
Contributor

devos50 commented Nov 20, 2018

I'm pretty sure this issue has been fixed. Closing the issue. Please let me know if there are any other problems related to circuit building.

@devos50 devos50 closed this as completed Nov 20, 2018

7 participants