
Thesis: Using distributed blockchains to cultivate trust in a peer group #2533

Closed

pimveldhuisen opened this issue Sep 1, 2016 · 21 comments

pimveldhuisen commented Sep 1, 2016

Thesis goals:

  • Design a system that accounts for user interactions and is:
    • Accurate
    • Tamper-proof
    • Scalable
  • Implement this system
  • Test and evaluate this system
  • Use this system to select peers that can be trusted, and interact with them further to create self-reinforcing trust.

synctext commented Sep 1, 2016

"Cultivating online trust in complex self-organising systems" Nice title/storyline.

Background story: we tried MultiChain; it needs churn resilience. A future-proof Tribler architecture means creating a generic "trust cultivation" module. The essential requirement is that it can be re-used in other systems, while proven to work in Tribler.

Self-reinforcing trust is a colder, more rational term; cultivating is associated with an organic, human touch. Interesting.


synctext commented Sep 15, 2016

Design: why create records every 10 minutes, not every 10 MByte?
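
A minimal sketch of that trade-off (names are hypothetical, not Tribler code): a time-based trigger bounds how stale the accounting can get, while a volume-based trigger bounds how much traffic a single record can hide.

```python
import time

RECORD_INTERVAL = 10 * 60       # 10 min: time-based trigger
RECORD_VOLUME = 10 * 1024 ** 2  # 10 MByte: volume-based trigger

class RecordTrigger:
    """Decide when to sign a new interaction record (illustrative only)."""

    def __init__(self):
        self.last_record_time = time.time()
        self.unrecorded_bytes = 0

    def on_transfer(self, num_bytes):
        self.unrecorded_bytes += num_bytes

    def should_create_record(self):
        # Time-based: bounds the staleness of the accounting.
        timed_out = time.time() - self.last_record_time >= RECORD_INTERVAL
        # Volume-based: bounds the unaccounted traffic per record.
        volume_hit = self.unrecorded_bytes >= RECORD_VOLUME
        return timed_out or volume_hit

    def mark_recorded(self):
        self.last_record_time = time.time()
        self.unrecorded_bytes = 0
```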

Experiments brainstorm:

  • Load balancing experiment (live walk peer overload, already a Gumby graph)
  • Record requesting methods experiments
  • Scalability and long-term viability (long-term == 10 years)
  • SCIENCE: graph sampling, informativeness and accuracy of the sharing ratio (default Dispersy walk, live walk, etc.). Scenario: join hundreds of existing Dispersy peers with filled databases, +1 new user.

October goal: generate a scenario based on real behavior from our crawl. Either give peers existing records, or do 1 download/second for 3 hours == ~10k downloads of MultiChain records.

Perhaps with @qstokkink close the self-reinforcing loop and select non-freeriding nodes for tunnels.

synctext commented:

Public ongoing thesis text: https://www.sharelatex.com/project/573db2896f2d756841e3def8


synctext commented Oct 5, 2016

  • 3-page intro is too compressed
  • avoid "layers" and "architecture" when talking about components
  • Relate to the Hoffman model (https://dl.acm.org/citation.cfm?id=1592452)
  • Term to use: tamper-proof or attack-resilient?
  • Emergent problem, or just a requirement of a solution?


synctext commented Oct 20, 2016

ToDo for thesis roadmap:


synctext commented Nov 3, 2016

Experiments:

  • walker: (un)directed, (non)live and most-reputed-peer first
  • load balancing, discovery, etc.
  • Convergence between the real sharing ratio in a scenario and the discovered sharing ratio (i.e. effective crawl speed)

Start to define thesis committee


synctext commented Nov 25, 2016

Discovery experiment (Gumby) scenario:

  • Join 100-1000 existing Dispersy peers with filled databases, +1 new user.
  • Existing peers all have filled databases and have converged after running for 1 hour.
  • Existing peers know each other's "trust vectors"; the MultiChain records are simply distributed.
  • Focus of all numerical results is on the +1 peer event, after 1 hour.
  • For this new peer: each walk is recorded and discovered MultiChain records are logged.
  • Key outcomes (see the bookkeeping sketch after this list):
    • X-axis the time (steps), Y-axis the number of records received (how many blocks crawled per traversed edge?).
    • X-axis the time (steps), Y-axis the number of non-duplicate records received.
    • X-axis the time (steps), total uploaded+downloaded MByte discovered (different alpha: different reach).
    • Convergence over time: freeriders and altruists?
    • Load balancing: CDF of the number of incoming connections.
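
For the per-step bookkeeping behind these plots, a minimal sketch (assuming each record carries MultiChain-like public_key, sequence_number and up/down byte counts; all names are illustrative):

```python
class CrawlLog:
    """Per-step record-discovery log for the +1 peer (illustrative sketch)."""

    def __init__(self):
        self.seen = set()    # (public_key, sequence_number) identifies a record
        self.per_step = []   # rows of (step, received, non_duplicate, mbytes)

    def on_walk_step(self, step, records):
        new = [r for r in records
               if (r.public_key, r.sequence_number) not in self.seen]
        self.seen.update((r.public_key, r.sequence_number) for r in new)
        # MByte discovered: up+down totals of the newly seen records.
        mbytes = sum(r.up + r.down for r in new) / 1024 ** 2
        self.per_step.append((step, len(records), len(new), mbytes))
```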


synctext commented Dec 6, 2016

Trace-driven simulator, inside Jenkins: https://jenkins.tribler.org/job/Tribler/job/Experiments/job/multichain/job/walker/ws/
Bug: the exploration is not monotonically increasing.

Conclude: 1 chapter on the simulator,
plus an experimental section with the above Gumby experiments, without the tunnel community.

pimveldhuisen commented:

> Bug: the exploration is not monotonically increasing.

To clarify: the exploration is actually monotonically increasing; the bug is in the plot script, which renders this in a weird way. It should still be fixed, of course.


synctext commented Dec 16, 2016

Current 31-page thesis

Planning: December - finish the simulator plot, then document it in the thesis.

Chapter 2: please add the onion-routing context as a key use-case. There, the 1:13 contribution ratio for a default Tor-like setup is an unsolved problem. We aim to solve it for the somewhat easier 1:5 ratio scenario.
Chapter 3: include the 10 min Tor circuit lifetime specification, which motivates your design decision.

"Unconditional Random walk vs reputation-directed walk". Danger: how is this work different? Repeating the existing Dimitra experiments?

"3.2.2 Statefull vs stateless walk" incorrect terminology. Either walk to Dispersy neighbor in overlay or to a Multichain neighbor in overlay+shared datastructure. Dispersy chains the contacting of neighbors into a random walk, with probability of .5% for a teleport to a trusted bootstrap server.

Revisiting the non-backwards compatible design with deep walker integration. Unified primitive for:

  • peer discovery
  • NAT puncturing
  • Trust record collection

Thus together forming trusted peer discovery suitable for NAT-constrained Internet connections.

[image: multichain_walker_design__2016]

Each live edge request simply includes the records the sender already knows about. The notation can be a simple (start block number, last block number). The first live-response message always returns the last record, with the most recent transaction. This ensures that on each subsequent re-visit we can reply with either fresh records, historical records, or an empty response.
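
A sketch of that reply logic (message fields and names are assumptions, not the final wire format):

```python
from collections import namedtuple

# The requester advertises the block range it already holds.
LiveEdgeRequest = namedtuple("LiveEdgeRequest", ["start_block", "last_block"])

def build_response(chain, request, first_contact):
    """Select records for a live-edge response (sketch; `chain` is this
    peer's own record list, sorted by block number)."""
    if first_contact:
        # The first response always carries the most recent record.
        return [chain[-1]]
    fresh = [b for b in chain if b.number > request.last_block]
    if fresh:
        return fresh  # records created since the previous visit
    # Otherwise backfill history, or return nothing if fully covered.
    return [b for b in chain if b.number < request.start_block]
```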


synctext commented Jan 9, 2017

Recent changes: design chapter
msc_thesis (7).pdf

"3.2 Record discovery"
The key problem to solve is a single peer spamming the network at 100 Mbps (attack cost: €5.99/month).
Random gathering makes the network vulnerable to spam. Pre-trusted nodes are also not a solution. So the least-bad solution is to use MultiChain records.

The simulator is focused on block discovery: no churn, no block creation, and infinite local storage.

Please make the load-balancing plot more dramatic: some nodes are going to crash. For instance, the X-axis has all 700+ nodes sorted by their message load, and the Y-axis shows the requests they handled. Each node in the system is depicted as a datapoint, with a dramatic curve for the first dozen nodes on the left-hand side (no log-log scale needed).
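
A minimal matplotlib sketch of that plot, assuming requests_handled holds one message count per node from the experiment:

```python
import matplotlib.pyplot as plt

def plot_load_balance(requests_handled):
    """All nodes sorted by message load on the x-axis, requests on the y-axis."""
    load = sorted(requests_handled, reverse=True)  # hottest nodes first
    plt.scatter(range(1, len(load) + 1), load, s=8)
    plt.xlabel("Node (sorted by message load)")
    plt.ylabel("Requests handled")
    plt.title("Load balancing across nodes")
    plt.show()
```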

Implementing "live edge walker" in Gumby estimated to be roughly 1 week. {Gumby darknet mode broken, possibly use clearnet; simple seeding records}

synctext commented:

Good. Finish the "live edges" work in Dispersy and repeat the nice load-balancing graphs.
Measure discovery and dissemination of records. See the experiments above (+1 peer, etc.).


synctext commented Feb 16, 2017

msc_thesis___Pim_Veldhuisen (1).pdf

  • base the problem description on theory (reciprocity, repeated interaction, the five rules for the evolution of cooperation, etc.)
  • Design of Record Creation functionality
  • 3.2 Asynchronicity ==> concurrency
  • 3.3 Half-block Protocol?
  • missing Chapter 4 intro
  • a shitload of blocks
  • Page 25, online_nodes : connected_neighbors
  • [github to my simulator code]
  • Fig 5.1 axes units + readable
  • experiments are not described in sufficient detail to be reproduced by a reader

Finish most thesis chapters or finish all experimental results?

ToDo johan reminder: committee form


synctext commented Mar 6, 2017


synctext commented Mar 16, 2017

> ToDo johan reminder: committee form

Theory + https://www.google.nl/search?q=the+shadow+of+the+future

Branching attack, mention double spending.

4.2 Experiments; Would call it emulation, not simulation.
4.3.2 Alpha Release of Multichain
Start chapter 5 with the What and Why. Role of the Alpha parameter.
Knowing who has cooperated in the past is a good building block for durable future cooperation. In this chapter we experiment with various algorithms to efficiently discover who cooperated in the past. Issues such as load balancing and discovery speed are key.
Chapter 5: random walk versus focused walk, parameter Alpha (see the sketch below).
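
A minimal sketch of how Alpha could mix the two walks (illustrative; not necessarily the thesis algorithm):

```python
import random

def choose_next_peer(neighbors, reputation, alpha):
    """Mix a focused walk into a random walk (illustrative sketch).

    alpha = 0 gives a pure random walk; alpha = 1 always follows reputation."""
    if random.random() < alpha:
        # Focused step: follow the most-reputed known neighbor.
        return max(neighbors, key=lambda peer: reputation.get(peer, 0.0))
    # Random step: pick a neighbor uniformly.
    return random.choice(neighbors)
```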

Why continuous discovery from numerous sources? Attack-resilience...

5.2.1 Teleport probabilities. Still difficult to reproduce; underspecified. How many nodes with these 10k blocks? What step time? How many connected neighbors? All their blocks when visiting somebody?

Figure 5.3: The block collection process using different teleport probabilities.
Explain that this is due to Dispersy random neighbors.


synctext commented Apr 13, 2017

Page 3: try to say in the first line why Tribler is relevant here.
2.0 intro line: for many decades we failed to solve the...
2.2 The Multichain: not a problem, e.g. tamper-proof records
Figure 5.2: The distribution of probabilities; more informative please
"If it does not pick the peer with the highest score, it moves on to the next peer" feels like an arbitrary choice and lacks elegance. "Skip Top Score Probability". "it loops back to the beginning".

We do not yet know the workload of a true cooperation-rewarding mechanism. Explore basic scalability and cost functions. Explore: storage cost of records for 10k to 1 million blocks?
Retrieval or lookup latency (a single read of the latest block of a single peer)?
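
A back-of-envelope sketch, assuming a few hundred bytes per record (the real MultiChain block size may differ):

```python
ASSUMED_BLOCK_SIZE = 300  # bytes per record: an order-of-magnitude guess

for blocks in (10_000, 100_000, 1_000_000):
    megabytes = blocks * ASSUMED_BLOCK_SIZE / 1024 ** 2
    print(f"{blocks:>9,} blocks -> ~{megabytes:.0f} MiB of records")
# With an index on (public_key, sequence_number), reading the latest block
# of a single peer should be one indexed lookup in an SQLite-backed store.
```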


synctext commented Apr 26, 2017

msc_thesis___Pim_Veldhuisen (5).pdf
Abstract
'that can achieve this idea in BitTorrent networks' please polish
'The fist step'
'The Multichain system as a whole is not yet completed, but this' make stronger and positive
'1986 Science article Garrett Hardin' 1968
'Figure 3.1: A simplified representation' too small
3.3 Block creation protocol
'Denial of service'? Discuss the spam attack that overloads the network or storage, not one that overloads a single victim.
The 100 Mbps for $5/month attack?
4.3.2 Alpha Release Forum link + date
Figure 5.10: 'Load' should be more descriptive

Chapter 6: how do we conclude this is really correct? Verified?
'The record creation is a replay of the trace acquired from the release Each' - make this earlier and clearer: 6.x Experimental setup.
'Blocks over time' - discovery speed
Node discovery speed, trust graph discovery... number of revisits...
Goal of the experiment?
Record discovery experiment using 1000? Tribler instances
Growing interval protocol specification, message layout
'After the record creation process, the experiment is paused for 1 minute, where nothing happens.' Why?

Can a link with cooperation be made?


ghost commented Jun 9, 2017

  1. p19: SHA-256 is introduced without citation

  2. p21-23 re whitewashing -- If someone's client is honestly written but broken, it might accidentally lie. I can't count the number of times that Tribler has crashed for me while I tried to get it to upload something (in fact, it just did so while I was writing this post)... at least for me, 99%+ of the time that I'm not uploading something I downloaded, it's because of a software bug. The exponential step-up of trust on page 22 is a clever way of dealing with this. It's certainly a blurry line between the cases of 'attacker' and 'broken code', but still, not all lies are intentional - in principle, intent should matter when punishing liars. How might this look in practice? If A downloads from B, then B has a record of A downloading, but A "forgets" and continues on as if nothing happened. Would A be a liar in this case and be punished for it?

> Therefor, an optimal value for φ lies likely around 0.05, depending on the exact characteristics of the network.

φ is dependent upon how important reputation is. If the only purpose of reputation is to encourage the shadow of the future, maybe 0.05 is OK. But depending on the ratio of attackers/buggy clients versus honest nodes, the value of reputation may differ between network participants.
