Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operational credit mining branch, Gumby scenario and multichain #1842

Closed
synctext opened this issue Jan 5, 2016 · 102 comments
Closed

Operational credit mining branch, Gumby scenario and multichain #1842

synctext opened this issue Jan 5, 2016 · 102 comments

Comments

@synctext
Copy link
Member

synctext commented Jan 5, 2016

Get a working performance analysis environment.
Goal: ability to make pretty graphs locally with both credit mining and multichain (two different branches).
Task: make an Ubuntu channel or other large validated legal collection, with actual "from the wild" downloads.
This environment will be expanded over coming months.

part of #23.
ToDo: read "tragedy of the commons" paper, 1968 problem.

Strategy: Till end of Feb focus on credit mining, then include relaying bandwidth to avoid too much complexity at the start.
Next sprint: co-investor detection?

@ardhipoetra
Copy link
Contributor

Thesis Proposal/Roadmap v.0.3

Big goal :

  • Mining credit anonymously

Benefit :

  • Boost unhealthy swarms, thus increasing the download speed for leechers
  • Collecting "credit" so "helper" can get higher download speed
  • Maximize bandwidth utilization of a user/node

Current state of the art :

  1. [1] is not integrated on Tribler, paper only shows how many credit (bandwidth) gathered
  2. [1] only contains high level abstraction of archive mode
  3. [1] Unknown effect when applying credit mining in respect to download/upload normally as it used all the bandwidth without consideration
  4. [1] no defined source standard (or maybe it's just me). Have to handle some exceptional case
  5. Helping swarm heavily relied on "share mode" in libtorrent [2]
  6. Current study limited to private community [3]
  7. Multichain only consider bandwidth [4]

Some definitions :

  • Mining = Activity to collect credit anonymously
  • Helper = The miners
  • Credit = multichain block ("repackaged" bandwidth)

What is credit :

  • Credit in Tribler is defined by "Multichain" [4]
  • Multichain currently only contains total up/down data (bandwidth used)
  • Multichain stored locally on top of Dispersy database
  • Multichain defined transaction history of a particular node with others

Specific Task (no specific order) :

  • Integrate credit mining into Tribler so we can see its effect on Tribler-Tribler or Tribler-non Tribler
  • Mining must be aware of bandwidth used in current activity (Downloading/Seeding/Streaming), and act accordingly
  • Compare and improve (if possible) current policies to detect swarm in needs. Real life scale experiment
    • Intuitively, VOD/streaming in Tribler needs more healthy swarm than other file type
    • Ignore unhealthy swarm that very few leecher (for the greater goods)
    • Advanced policy, instead of sorting swarm, probably "piece/chunk chooser" mechanism
    • Adaptive parameters : interval(s), max torrent actively mined, chunk size(?)
  • A swarm is being helped by (relatively) limited number of helpers
  • (Abstract) Find a way Incentivize user to activate credit mining
  • (Optional) When target ratio reached, donate to "less-prioritized swarms".
  • (Optional) Automatic mining. Instead of user choose the sources.
  • (Optional) A feature to evaluate miners/swarms/mining effort rate
  • (Optional) Consider leecher's bandwidth when donating a swarm

Problem/Limitation/Doubt :

  1. Persistence credit. Credit obtained from previous session can be used thereafter.
    • Is it necessary? Is it follows the "standard"?
    • If yes : how to store credit. If no : clean the storage, revert state.
    • Outside the scope of credit mining, probably in multichain's scope
  2. Multichain only hold bandwidth.
    • How to incorporate used storage and "effort rate" in peer transactions/communication?
    • partial seeding with complete storage need to incentivize as well
  3. Is it possible to :
    • mark that a piece/chunk is mined
    • check a swarm being helped or not (get helper list from a particular swarm)
    • when seeding, a node prioritize the one who helped him before (more than just t4t)
    • map user ID/IP to credit?

Related work :

  1. Decentralized credit mining in p2p system, Mihai(2015)
  2. Towards a peer-to-peer bandwidth marketplace, Mihai (2014)
  3. Investment Strategies for Credit-Based P2P Communities, Mihai(2013)
  4. MultiChain: A cybercurrency for cooperation, Steffan (2015) - Msc Thesis

@whirm whirm modified the milestones: Sprint 2, Backlog Feb 15, 2016
@ardhipoetra
Copy link
Contributor

@synctext
Copy link
Member Author

Concrete ToDo:

  • Getting the credit mining GUI ready for experimental release
  • User selects channels to boost, algorithms selects the swarms then automatically
  • Minimize the friction for the user
  • Offer a 1-click content selector, minimize the detailed swarm info
  • Credit mining GUI provides a channel overview:
    • Total Votecast records seen
    • For each channel how many votes it got + latest content
    • checkbox to easily enable boosting for that channel
    • move the current detailed credit mining gui details to a button.

Next step: torrent_checker.py integration.

@synctext
Copy link
Member Author

Initial GUI on Tribler home screen:
20160224_110104

Credit mining screen: 1000+ discovered channels, overall credit mining dashboard, channel status, channel content status.
20160224_112541

Thesis storyline:

  • intro: screenshots of Mihai proof-of-principle
  • problem description: Solve hit-and-run behavior (avoid tragedy of the commons, 1968 problem). Effortless donation of resources. Task: scheduling Bandwidth, storage, and relay. How do we balance between different donations for optimal network performance? Beyond Mihai: production-level code!
  • design: Discuss Mihai inner workings. Discuss Libtorrent share mode. Schedule and balance: relay bandwidth, own downloads/streams, speculative downloads, free storage space, and credit mining income.
  • implementation and experiments: Discuss 3-4 prototypes and experiments. Credit mining inputs: age of swarm, random scheduling, Similarity Function, and co-investor detection. First prototype: first GUI & first swarm boosting, use most popular channels, based on VoteCast, no relay yet. Second: use multi-chain to display earned credits in real-time, use persistent storage of multi-chain credit stats. Third: mature torrent checking code and fancy scheduling. Final: balance credit mining with storage & relay with a 1TByte storage server! Lot of performance evaluation graphs.

Next sprint: accurate investment information, torrent_checker.py integration, discovered swarms, downloaded torrents, DHT lookup failure, failed to download .torrent, used bandwidth for checking, swarms connected to, dead swarms, seeder/leecher info determined.

@synctext
Copy link
Member Author

synctext commented Mar 9, 2016

Next steps: understand swarm seed/leecher checking. Running code to bypass central tracker. Also problem to TCP check trackers, as tunnels only support UDP.

Goal: create a good scrape mechanism using DHT and UDP in existing messy Tribler code.

Some very old code is still using manual checking in our own Python code:
https://github.com/Tribler/tribler/blob/v6.0/Tribler/Core/DecentralizedTracking/mainlineDHTChecker.py

Now we use Libtorrent, which has it's own DHT and another DHT library: pymdht.

@whirm
Copy link
Contributor

whirm commented Mar 9, 2016

The more recent versions of libtorrent have exposed more stuff to the python bindings. It would be nice if someone checked if the needed DHT stuff is available now. That would allow us to drop pymdht entirely.

@synctext
Copy link
Member Author

synctext commented Apr 19, 2016

Completed:

  • First Gumby scenario file with credit mining of hard coded channels

Tasks for coming 3 weeks:

  • fix GUI wx3
  • runs flawless, stable for days/weeks
  • passes unit tests, code coverage
  • select channel and credit mine!
  • tracker checking UDP....
  • code review by @devos50

Thesis material credit

  • Credit mining, cost of evaluating a swarm is essential
  • similar to prospecting, what is a good investment ?
  • How many bytes does it cost, per swarm?
  • cost of DHT, .torrent collecting, PEX, and peer handshake to determine seeder/leecher.
  • example of low-level calculations like this: seeding one million swarms

Leftover work:

  • pyDHT for introduction points, LibtorrentDHT for the rest. ToDo: expose LibtorrentDHT for HiddenTunnelCommunity and removed pyDHT.

@synctext
Copy link
Member Author

synctext commented May 9, 2016

additional thesis material:
cost of swarm checking in bytes
yield of swarm investments
coinvestor factor
relay cost
archiving cost eith disk space limits

@synctext
Copy link
Member Author

synctext commented May 9, 2016

working code, see issues above. Create a legal channel with 500+ legal swarms. Repeat the Mihai experiments. Show in Gumby on Jenkins with nice R plots.

@synctext
Copy link
Member Author

synctext commented May 23, 2016

@synctext
Copy link
Member Author

synctext commented Aug 31, 2016

Focus for 2 weeks: finish Problem Description and Related work (in 2 chapters or 1):

  • Flashcrowd scenario, press discussing potential replacement for Youtube services without servers. 10.000 new users/day for 1 week. Each donating 250 GByte. ResearchQuestion: make optimal use of donated resource.
  • Left-over bandwidth allocation. ResearchQuestion 2: what mechanism can ensure that CreditMining never disturbs the users primary activity of searching, downloading, and streaming.
  • How to seed 250GByte effectively, and anonymously !
  • Channels are the mechanism to bundle torrents, what channel to autoboost (e.g. investment decision)?
  • related work: Torcoin, lira, etc
  • related work: iterative Prisoners Dilemma, emergence of cooperation, collective action problem

ToDo:

  • create public repo for msc thesis
  • present latest credit mining results yield graph here and in Reddit.
  • describe 'Bittorrent Gold' prospecting algorithm: download the first 4 pieces of a swarm anonymously. Establish reliably the swarm size, with minimal automagic Libtorrent disk allocation.
  • understand credit mining implications for exit nodes and minimize their workload by preferring end-to-end encrypted anonymous seeds

@ardhipoetra
Copy link
Contributor

@synctext
Copy link
Member Author

synctext commented Sep 21, 2016

Create:

    1. introduction (file sharing, p2p, upload problem,Bittorrent, tit-for-tat, Tribler, allchannel&search community table, multichain and upload credits)
    1. Problem Description (credit mining, good investment, related work: Lira&Torcoin&Bitcoin..,P2P economics, sharing ratio enforcement, etc.)
    1. System design (picture, explain N piece prospecting, etc.)

Open PR #2398, #2351

October steps: experiments with prospecting: select channel, DHT responsiveness, time to download download first 4 pieces, discovered swarm size (PEX,seed/leech stats).
design & add features to core API: ? prospect_swarm(SHA1) ?
Plot for 10000 swarms: DHT success, 4 piece time, swarm size stats.

@ardhipoetra
Copy link
Contributor

Problem description revision :

2.5 : how to prospect a good investment
how to detect investment, what is the properties of that

2.6 : when to delete a downloaded swarm and replace it with a new investment
if projected yield is more than enough - cost of download

@synctext
Copy link
Member Author

synctext commented Oct 12, 2016

Thesis feedback :

  • Explain clearly what help means, include picture
  • Explain Multichain in details in chapter one, include picture
  • 1.4.2 Supply and Demand, nobody in thesis committee will understand this.
  • quote [28] : The results for TVTorrents show that after about 2 hours, virtually all of the data comes from seeders. Apparently, tit-for-tat is almost irrelevant in such communities.
  • The prototype of the framework was conducted by Capota et al., mainly to run credit mining system without any restriction or coordination with any client. Move to later section, rephrase "our work is based on an earlier emulation and simulation work by Capota et.al. from 2010 till 2013".
  • 2.1 Cooperation and performance in BitTorrent, duplicate contents
  • prospecting: include details, magnet --> DHT --> (BEP9](http:https://www.bittorrent.org/beps/bep_0009.html) -2nd DHT lookup? - PEX - peer connections - determine swarm size.
  • first explain prospecting, then PEX,DHT.

ToDo: pull requests !
Prospecting experiment

@ardhipoetra
Copy link
Contributor

It has been processed in ardhipoetra/msc-thesis-creditmining@04ce285.

For prospecting details (how it works), I want to put that on 'design' chapter.

@ardhipoetra
Copy link
Contributor

ardhipoetra commented Nov 2, 2016

Downloading 4 pieces of 10k+ torrents in 12 hours with 500 torrent connection in one time, timeout is 1 hour, resulting in below :

dist

histo

@ardhipoetra
Copy link
Contributor

ardhipoetra commented Nov 3, 2016

@synctext
Copy link
Member Author

synctext commented Nov 3, 2016

Idea for final experiment (after discussing prospecting) :

  • Channel with lot of torrents
  • automatically detect underseeded torrents
  • credit mining manager boosts unavaiable swarms
  • All seeding capacity is evenly distributed
  • Outcome: all swarms have equal number of seeders, poorly seeded swarms no longer exist.

Next sprint:

  • prospecting on Gumby
  • thesis design chapter .tex

With progress: create thesis committee

@synctext
Copy link
Member Author

synctext commented Nov 8, 2016

idea for 1st experiment of thesis.

This first experiment is designed to evaluate the basic operation, correctness, responsiveness, and efficiency of our work. It is specifically crafted to be simple and easy to understand plus debug.

1 channel, add torrent, add 1 credit miner. Add second torrent, measure how fast it is mined..

@egbertbouman
Copy link
Member

@EinNarr Just talked to Johan about your experiments, and you're very much invited to compare multiple policies in your thesis. The torrents in your experiment can just be synthetic. No need to use wild swarms. This avoids your problems with many overseeded swarms.

@EinNarr
Copy link
Contributor

EinNarr commented Jul 9, 2018

thesis.pdf

@synctext
Copy link
Member Author

synctext commented Jul 9, 2018

Explore different parameter settings & experiments:

  • invest for 2 min or 60 min in a swarm?
  • investigate 25 or 250 swarms at once?
  • Smart Download Mode:
    • Restrict to 25 MByte or 10-ish pieces
    • Switch from download mode to upload mode in Libtorrent
    • 250 swarms of 25 MByte take
  • Fake download location that Arvid made (for us? :-)
  • Bittorrent Simulation Framework
    • you have no data on-disk
    • you can still seed
    • send arbitrary data
    • Download arbitrary data, turn off hash-check
    • All original Libttorrent engine, with some patching.
  • compare two policies with minor differences or which are very different
  • document in thesis, HAPPY; DONE!

@devos50
Copy link
Contributor

devos50 commented Jul 9, 2018

@EinNarr
Copy link
Contributor

EinNarr commented Jul 10, 2018

Roadmap to graduation still remains unclear.
Starting from refining the test frame work which had already been used in the 10 torrents 1 channel experiment.

@EinNarr
Copy link
Contributor

EinNarr commented Jul 24, 2018

Experiments done outside Tribler till now.

  1. Experiment on 100 most popular torrents
    figure_4
    figure_7
    By "investigated", it means the torrents who finished the initial 25mb investment.
  2. Experiment on 30 most recent torrents.
    figure_4
    figure_7
  3. Experiment involving "fair-comparison" (compare (upload-download)/time instead of pure upload-download)
    figure_4
    figure_7
  4. Keep introducing new torrents to the session, set "promotion" to 3 levels (25m, 250m, 1G) instead of only one in previous policies.
    figure_4
    figure_7

@EinNarr
Copy link
Contributor

EinNarr commented Jul 24, 2018

Some thoughts about the 4th policy.
This is the share ratio vs. number of torrents graph
figure_10-7
figure_10-8
The X-axis is upload/download ratio, Y-axis is number of torrents. The curve is the number of torrents have same or better ratio than its X-coordinate. Bar chart is the number of torrent have upload/download ratio no worse than its X-coordinate, but worse than the next ratio on the X-axis. The blue bar is for all the torrent, and the orange is for the "promoted(limit to 250MB) torrents. There should have been a three color on bar chart for “super promoted(limit to 1GB)", but as far as I observed, there is no torrent with more than 120MB download.

It could be that no torrent is ever super-promoted, or only small torrents less than 120MB are super-promoted.
Details will be determined after the newest data is revealed tomorrow.

@synctext
Copy link
Member Author

Very impressive thesis material!

@synctext
Copy link
Member Author

Please focus on your thesis writing and storyline now. Stop working on graphs and data processing.
Proposed storyline.

We present in this chapter the experiments we conducted around credit mining investments with the following scientific topics:

    1. Discovery of opportunities
    1. Do a single investment
    1. Delete bad investments
    1. Increase high yield investments
    1. Retire exhausted investments

@EinNarr
Copy link
Contributor

EinNarr commented Aug 3, 2018

Another latest version of PDF and an interesting percentage change of total download of each category.
credit-mining.pdf
figure_17

@synctext
Copy link
Member Author

synctext commented Sep 5, 2018

  • The Big Picture is not understood by thesis committee
    • universal cashing (boosting/acceleration) mechanism
    • generic system: just give it 1000 swarms, it will invest in the best ones
    • does not need any info from swarms, just 20Byte ID, takes magically care of the rest
    • NOT JUST A SIMULATION (this was deeply misunderstood)
  • Policy fundamentals:
    • discovery of opportunity
    • initial investment
    • increase investment
    • dead investment
    • exhausted investment (not covered in thesis, but we know it exists)
  • Architecture picture
  • Related work, like: http:https://members.unine.ch/pascal.felber/publications/PAM-04.pdf
  • My contributions
  • Swarm Definition
  • Reproducible science
    • more info
    • date of experiment
    • Repeat experiment, obviously different outcome
  • What is the desired outcome of these experiments?
  • document engineering effort
    • Lines of Code per class/source file
    • three implementations, 2 covered in thesis: 1st prototype, 2nd prototype.
  • add picture of the conducted greedy experiment + discuss.

@synctext
Copy link
Member Author

synctext commented Sep 5, 2018

Plus this remark: "the link to his code added to the thesis (in the spirit of reproducibility)."

@EinNarr
Copy link
Contributor

EinNarr commented Sep 12, 2018

cmthesis.pdf

@synctext
Copy link
Member Author

synctext commented Sep 12, 2018

Thesis is still not clear

Thesis is not in good shape yet sadly. Detailed remarks:

  • mentions engineering details everywhere, instead of presenting the overview clearly, like time of experiment: 8:31pm on 17 July.
  • Explain building blocks better of policies
  • Section "Definition declaration", rename to "terms and definitions"
  • Pic? https://goo.gl/images/vpFH71
  • Fig 1.2 : explain all pieces or remove picture
  • Fig 1.3 move to Section 1.4 Token Economy ?
  • goal of token economy in Tribler is to create a self-sustaining Youtube-like ecosystem which is completely server-free. Or more generic, improve upon tot-for-tat collaborative mechanisms.
  • "With 5199 lines of code inserted and 1961 lines deleted(including duplicates),
    we design and implement a system called Credit Mining, relying on existing
    features of Tribler", more like, 'we designed, implemented, deployed, and evaluated a real-world credit mining system'.
  • 3.2.1 detect dead swarm
  • "Intel Core i7-7820HK" do not mention such details of first page, explain the general character of your experiment.
  • "3.3 Graphical user interface", KODI, the work in this thesis aims to be used in a television-like context. We ported our software to the market-leading low-cost embedded TV platform called Kodi.
  • first sentence of the final experimental chapter contain a strange intro line, stating negative results: "the policy we implements does not provide a positive enough yield."

@EinNarr
Copy link
Contributor

EinNarr commented Sep 21, 2018

untitled (6).pdf

@synctext
Copy link
Member Author

synctext commented Sep 21, 2018

  • big picture is unclear, it's not about adding more details.
  • Please replace all text on page 23, Chapter 4 intro with better text.
  • first lines of experimental section need to explain what you did and why.
  • "In the previous chapter we explained how we aim to solve the freeriding problem with a token in Bittorrent-like systems. In this chapter we deploy our algorithm, using real-world Bittorrent swarms to validate our assumption and measure performance. We first test our proposed algorithm in a controlled environment to validate our implemented mechanism."
  • Re-write first lines of each chapter, after introduction.
  • 'Ever since the popularization of P2P file-sharing systems, people have been fighting an endless war against free-riders'
  • Avoid jargon in each first sentences. Like Chapter 2: Collaborative systems like P2P file sharing critically depend on the voluntary donations of bandwidth by their users. However, many people do not contribute or upload in such system, the freeriders. This problem is central in this thesis. For years scientists have tried to express cooperative behavior in such communities with a token, coin or karma points. Uploading to other peers will be rewarded with bandwidth tokens and downloading will cost these tokens. We focus on the problem of automatically earning credits by contributing bandwidth to a community. [new section with more details].

@przemyslaw-pawelczak
Copy link

Summary of suggested changes after a discussion between Przemyslaw Pawelczak and Bohao Zhang on 28 September 2018, 10:30 TU Delft

Content corrections:

  • Please add into "Related work" (Section 2.3) information about other fields (Banking? Car leasing? Army?) with a similar problem of free-riding and how it is solved in these fields
  • I know that Johan suggested to add an extra paragraph in Section 2.3. on the discussion on policy fundamentals: discovery of opportunity, initial investment, increase investment, dead investment, exhausted investment. A short paragraph on each of these points would be great
  • Please expand architecture picture in Figure 3.1 as it is too simplistic as of now. This figure should provide information about (a) software components [which function does what in all the blocks you list in Figure 3.1], (b) where each of these blocks run [on a specific client?, on many clients? On a swarm? How many of them are they? etc.], (c) what information exactly is exchanged between these blocks [TCP packets/UDP packets?, packets containing what?, on what sockets?, etc.] (d) how each of the boxes is split into sub-components (e.g. TrustChain is a huge component and needs to be explained in a more detail). The more details - the better!
  • In Section 4.1 provide information about the underlying peer-to-peer network (how many clients, how geographically distributed, how many were seeders, how many free riders originally, network connectivity, spatial distribution of a network [how often clients were in-active/offline], etc.) where you tested your CreditMining system

Text structure corrections:

  • Divide "Future Work" (Section 6.2) in subsections such that one subsection is one topic of future work
  • Make sub-sections in Experimental parts of the thesis: experiment goal, experiment setup, experiment results
  • Move implementation information you provide in Page 10 to a new subsection which should be placed after Subsection 3.3

Editorial corrections:

  • Replace Figure 5.9 with bar plot and make it smaller
  • Move all figures to the top of the page
  • Make Figures smaller if they occupy a lot of space per page (e.g. Figure 3.4)
  • Make capitalisation of all titles/subtitles/subsubtitles/... uniform
  • Add full stops at the end of the sentence in each figure caption
  • Correct Page 8 by removing lots of white space between figure and text

@synctext
Copy link
Member Author

Related work on Delft University credit mining:

@xoriole
Copy link
Contributor

xoriole commented Feb 1, 2019

A live screenshot of token mining running multi-level policy
credit_mining

@qstokkink
Copy link
Contributor

After years of effort, we finally have an operational credit mining strategy. Now it actually generates tokens as well 😃

@synctext
Copy link
Member Author

#4999 The dashboard aims to contain the performance of all Tribler components.
Test case: fresh server install plus VPN with 24h of token mining. Refresh daily.
Is the above multi-level policy included in Tribler now? (PR # for documentation?) @xoriole

@xoriole
Copy link
Contributor

xoriole commented Dec 17, 2019

#4999 is certainly an interesting dashboard to have. We can incrementally add (auto-updating) results of tests in such a dashboard.

@synctext Multi-level policy is already integrated into Tribler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants