True GigaChannels with 1 billion magnet links #3971

Closed
synctext opened this issue Oct 18, 2018 · 10 comments

Comments

@synctext
Member

synctext commented Oct 18, 2018

Tribler aims to be a realistic alternative to YouTube. Goal: support an unbounded wealth of user-generated content. Solution: create a BitTorrent swarm of any magnet link collection.

1 channel may contain up to 1 billion magnet links

A YouTube alternative requires a server-free video platform that scales seamlessly, beyond anything scientists have been able to create so far. Fast growth needs to be accommodated; for instance, Netflix has a growth rate of 200,000 new users per day.
Current Tribler channels do not support the envisioned growth rate. For instance, YouTube channels gain many new subscribers each day. Taking some stats from the Internet, we see that:

  • More than 300 channels are hitting the 10,000 subscriber mark every day.
  • More than 40 channels are hitting the 100,000 subscriber mark every day.
  • Around 4 channels are hitting the 1 million subscriber mark every day.

YouTube channels with a gigantic number of videos:

  1. 1013672 videos
  2. 727931 videos
  3. 624745 videos

The 10 Biggest YouTube Channels Right Now

  1. PewDiePie. Subscribers: 66.1 million.
  2. Dude Perfect. Subscribers: 34.54 million.
  3. HolaSoyGerman. Subscribers: 34.48 million.
  4. whinderssonnunes. Subscribers: 31.4 million.
  5. elrubiusOMG. Subscribers: 30.83 million.
  6. Fernanfloo. Subscribers: 29.34 million.
  7. JuegaGerman. Subscribers: 28.95 million.
  8. VEGETTA777. Subscribers: 23.85 million.

Technical Details

The above statistics concern the number of users subscribed to one channel. We will create one BitTorrent swarm for each channel. If we project the above subscriber numbers onto Gigachannels, we can expect near-instantaneous access to a random partial selection of magnet links. If changes are made to the channel, the Gigachannel swarm is updated automatically. Changes to the Gigachannel should be quick and never trigger hash re-checks. This issue does not cover the discovery of Gigachannels or the spreading of Gigachannel updates. Fast incremental and hierarchical traversal of Gigachannels is also very much left as future work. Keep it a huge dumb list for starters.

1 billion magnet links represent a significant amount of storage. For example, 1 billion times magnet:?xt=urn:btih:23ABBAA2A7D44A4EAFCBC907DB475376D1422629 equals 61 GByte at minimum.
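
The 61 GByte figure can be reproduced with simple arithmetic. The example link is 60 characters long (a 20-character prefix plus a 40-character hex infohash). The sketch below assumes one link per line, so one extra newline byte per link; that storage layout is an assumption, not something this issue specifies.

```python
# Back-of-the-envelope size of a plain-text list of magnet links.
# Assumption: one link per line, i.e. one extra newline byte per link.
EXAMPLE_LINK = "magnet:?xt=urn:btih:23ABBAA2A7D44A4EAFCBC907DB475376D1422629"
NUM_LINKS = 10**9

bytes_per_link = len(EXAMPLE_LINK.encode("ascii")) + 1  # 60 characters + newline
total_bytes = bytes_per_link * NUM_LINKS

print(f"{bytes_per_link} bytes per link, {total_bytes / 10**9:.0f} GB in total")
```

Names, signatures, and any per-entry database overhead would push the real number higher, which is why the figure is a minimum.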

Requirement: one computer generates 1 billion random magnet links plus one valid magnet link, and creates a channel from them. The channel address is made available to another Tribler instance, which starts the download. We allow unlimited transfer time, but no errors are allowed to occur.
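
A minimal sketch of the link-generation half of this requirement, under stated assumptions: the helper names `random_magnet_links` and `write_links` are hypothetical, the one-link-per-line output file is an assumption, and no Tribler API is used. Creating the channel from the generated list is not covered here.

```python
import os
from itertools import islice

MAGNET_PREFIX = "magnet:?xt=urn:btih:"


def random_magnet_links():
    """Yield an endless stream of magnet links with random 20-byte infohashes."""
    while True:
        yield MAGNET_PREFIX + os.urandom(20).hex().upper()


def write_links(path, count, valid_link):
    """Write `count` random links followed by one known-valid link, one per line."""
    with open(path, "w", encoding="ascii") as f:
        # Streaming from the generator keeps memory use flat for any count.
        for link in islice(random_magnet_links(), count):
            f.write(link + "\n")
        f.write(valid_link + "\n")
```

For a small trial run, something like `write_links("links.txt", 10_000, valid_link)` would do; the full test would pass `10**9` as the count.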

@synctext
Member Author

A large effort is ongoing to implement the above concept. PR and early Jenkins experimental results with 10k magnet links:
[image]

@ichorid
Contributor

ichorid commented Oct 18, 2018

A billion-link dumb list is a nice thing for testing system scalability, but useless to users (no pun intended). A regular human being is barely able to process a list of one hundred options, let alone billions. So, humanity handles big data collections by building hierarchies and networks of knowledge. Wikipedia beats paper encyclopedias flat by providing instant search and hyperlinks.
User experience matters. And nothing kills the mood to watch a movie like a long search followed by "no peers".
We should remember that we design the system for humans, not robots. Human limitations add certain requirements, but they also bring some breathing space.

@synctext
Member Author

synctext commented Oct 18, 2018

nice thing for testing system scalability, but useless to users

You are correct, this issue only covers the raw storage of magnet blurbs. For real usage we need magnet links with proper naming and cryptographic signatures.

@devos50
Contributor

devos50 commented Oct 19, 2018

It's not only about naming, but also about presentation. Once the basic functionality is working, we should discuss a way to make it easy for users to locate their favorite content quickly and reliably.

@devos50
Contributor

devos50 commented Feb 22, 2019

More experimental Jenkins results can be found here (small scale, 20 nodes that create a channel with 200 random torrents). This experiment will be part of our Tribler validation experiment pipeline.

[image: known_channels]

@synctext
Member Author

That is a solid starting point. Do we have a Jenkins server with space for 1 billion magnet links?

@devos50
Contributor

devos50 commented Feb 23, 2019

We should have the required storage for 1 billion magnet links. During the experiment, each instance generates 200 fake torrents and starts sharing them with others. This number can easily be increased; however, I noticed that it takes quite some time to generate these torrents (even a small number) and add them to your channel. To share 1 billion magnet links, I would propose pre-generating them on one of our servers rather than generating them on demand during the experiment.

@ichorid
Contributor

ichorid commented Feb 23, 2019

@devos50, generating torrents is much faster than sending them over, if one doesn't impose any correctness constraints on them. In my experiments, creating 1K fake torrents typically took less than a few seconds.

@synctext we can safely assume that a single torrent never takes more than 1 KB of storage space (DB and .mdblob forms combined). Therefore, to store 1*10^9 torrent entries we'll have to provide no more than 1 TB of space on the sender, and the same amount of space on the receiver. A couple of 1 TB HDDs should suffice.
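
As a quick sanity check of that bound, here is a rough calculation. It assumes "1K" means 1000 bytes per entry, and it takes the 61 bytes per raw link from the issue description for comparison; both constants are illustrative.

```python
# Upper bound on storage for 10**9 channel entries.
# Assumption: "1K" is read as 1000 bytes per entry (DB and .mdblob combined).
MAX_BYTES_PER_ENTRY = 1000
RAW_BYTES_PER_LINK = 61  # raw magnet link size from the issue description
NUM_ENTRIES = 10**9

total_bytes = MAX_BYTES_PER_ENTRY * NUM_ENTRIES
overhead_factor = MAX_BYTES_PER_ENTRY / RAW_BYTES_PER_LINK

print(f"at most {total_bytes / 10**12:.0f} TB on each side")
print(f"roughly {overhead_factor:.1f}x the raw link list")
```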

@synctext
Member Author

synctext commented Jul 2, 2019

@ichorid Just bumping this ticket again. Please make a real 1 billion links script and performance graph, possibly after we are in Release Candidate mode this month.

@drew2a drew2a added this to the Backlog milestone Sep 15, 2021
@ichorid
Contributor

ichorid commented Sep 28, 2021

For all practical purposes, we have channels with millions of torrents in the system right now. However, the architecture of using torrents to send big channel data is inherently flawed due to the "unstable infohash" problem: #4677

Closing this one.

@ichorid ichorid closed this as completed Sep 28, 2021