Incremental update of trust levels in a dynamic blockchain graph #2805

Open
synctext opened this issue Feb 17, 2017 · 49 comments

@synctext

synctext commented Feb 17, 2017

Within Tribler we aim to calculate and show trust level of our neighbors.

Trust levels evolve over time as more information comes in from our blockchain. Restarting the calculation from scratch every time new information arrives is prohibitively expensive.

We require an incremental algorithm that can update our Sybil-resilient random walk computation. With the dataset from our deployed network, a full calculation currently takes 6-9 seconds; with a local dataset of 50,000 surrounding peers, an incremental algorithm becomes essential.

Key related work by Stanford and Twitter engineers "Fast incremental and personalized PageRank":

For global PageRank, we assume that the social network has n nodes, and m adversarially chosen edges arrive in a random order. We show that with a reset probability of ε, the expected total work needed to maintain an accurate estimate (using the Monte Carlo method) of the PageRank of every node at all times is O(n·ln(m)/ε²). This is significantly better than all known bounds for incremental PageRank. For instance, if we naively recompute the PageRanks as each edge arrives, the simple power iteration method needs Ω(m²/ln(1/(1−ε))) total time and the Monte Carlo method needs O(mn/ε) total time; both are prohibitively expensive. We also show that we can handle deletions equally efficiently.

No decent Python implementation of incremental PageRank seems to exist.
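For reference, here is a minimal Python sketch of the basic (non-incremental) Monte Carlo estimator such an algorithm would build on: run a handful of short walks from every node, count visits, and normalize. The function name and parameters are illustrative, not taken from any of the papers above.

```python
import random
import networkx as nx

def monte_carlo_pagerank(graph, num_walks=10, reset_prob=0.15):
    """Approximate global PageRank by counting visits of short random walks.

    Starts num_walks walks at every node; each walk ends with probability
    reset_prob after every hop, so walk lengths are geometrically distributed.
    """
    visits = {node: 0 for node in graph}
    for start in graph:
        for _ in range(num_walks):
            node = start
            visits[node] += 1
            while random.random() > reset_prob:
                successors = list(graph.successors(node))
                if not successors:          # dangling node: end the walk
                    break
                node = random.choice(successors)
                visits[node] += 1
    total = float(sum(visits.values()))
    return {node: count / total for node, count in visits.items()}

# Rough sanity check against NetworkX's power-iteration PageRank
G = nx.gnp_random_graph(200, 0.05, directed=True, seed=42)
approx = monte_carlo_pagerank(G)
exact = nx.pagerank(G, alpha=0.85)          # damping factor = 1 - reset_prob
```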

@devos50

devos50 commented Feb 21, 2017

@Peiteng

Peiteng commented Mar 1, 2017

https://www-sop.inria.fr/members/Konstantin.Avratchenkov/pubs/mc.pdf
This paper introduces Monte Carlo methods for computing PageRank values.

https://crad.ict.ac.cn/CN/abstract/abstract3238.shtml
Unlike "Fast incremental and personalized PageRank", the approach in this paper does not need to store the random walks when updating the PageRank values.
The following figure indicates how edges are added or removed:
[figure: edge addition and removal]

[figure: algorithm architecture]

The second figure shows the architecture of the algorithm: every time the graph changes, we only need to update the visit-time value VT(x) for every node x in the graph, based on the change in the graph and the previous PageRank values.

@synctext

synctext commented Mar 1, 2017

General research description:
This project will address the problem of trust. Using an experiment-driven approach with real-world datasets and Internet deployment, we aim to significantly advance our methods for building trust. The first aspect to investigate is taking existing algorithms and determining their viability in a self-organizing systems environment, without central control or oversight. Next steps are obtaining real-world datasets and using techniques from the emerging Big Data field: graph sampling, random walks, and graph processing in general. The desired outcome is a computationally efficient algorithm to calculate trust in dynamic graphs, with real-world validation.

Excellent work by Harvard University + Zurich: Personalized Hitting Time for Informative Trust Mechanisms Despite Sybils ( thesis version )
Please also read the foundations: https://github.com/dmvaldman/library/tree/master/special%20topics/p2p
Graph sampling: https://dl.acm.org/citation.cfm?id=1150479
Study this Java code implementation of the 2010 Stanford Twitter algorithm: https://launchpad.net/jfppr

@synctext

synctext commented Mar 28, 2017

First code: https://github.com/Peiteng/Incremental-Pagerank !
[figure: first incremental PageRank graph]
ToDo next meeting:

  • generate 1 random graph
  • change the network 5%, 10%, 25%, 50% etc.
  • calculate your incremental pagerank
  • how accurate is it? (re-calculate)
  • repeat above for numerous random graphs

Key outcome plot:

  • X-axis: percentage of change in the graph
  • Y-axis: accuracy of the incremental PageRank

@synctext

synctext commented May 9, 2018

Possible solo "Blockchain Engineering" course. ToDo: take the TrustChain dataset, copy algorithms from the state of the art, possibly enhance them, implement Python code, and write beautiful math on probability & stochastics. @alexander-stannat

We establish a surprising connection between the personalized PageRank algorithm and the stochastic block model for random graphs, showing that personalized PageRank, in fact, provides the optimal geometric discriminant function for separating the communities in stochastic block models over a wide class of functions. Building on this result, we develop stronger classifiers that, although scalable, are competitive with computationally much more demanding methods such as belief propagation.

@synctext

synctext commented Jun 21, 2018

First sprint:

@alexander-stannat

First sprint finished.

I implemented a standard Monte Carlo random walk algorithm (R random walks starting from each node, without resets, with average length reset_prob^-1). The PageRank is determined by the number of visits to each node divided by the sum over all visit counts (see Algorithm 4).

For nodes that are added to the graph, we run R random walks starting at the corresponding nodes.
For edges that are added to the graph, we use the visit times of the node the new edge leaves from to determine how many random walks should pass through the newly added edge. We then run that number of random walks starting at the node the new edge points to and add the visit times of each node to the existing visit-times vector.
For edges that are removed from the graph, we do the same but subtract the visit times of each node from the visit-times vector.
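A rough Python sketch of the edge-addition case described above, with illustrative names; the exact bookkeeping (how many walk segments to replay, how edge weights are handled) is simplified. Edge removal would subtract the same counts instead of adding them.

```python
import random

def add_edge_incremental(graph, visits, u, v, reset_prob=0.15):
    """Sketch: update the visit-times vector when a new edge (u, v) appears."""
    graph.add_edge(u, v)
    # Estimate how many walk continuations leaving u would now take the new edge
    walks_through_new_edge = int(visits.get(u, 0) * (1 - reset_prob) / graph.out_degree(u))
    for _ in range(walks_through_new_edge):
        node = v
        visits[node] = visits.get(node, 0) + 1
        while random.random() > reset_prob:
            successors = list(graph.successors(node))
            if not successors:
                break
            node = random.choice(successors)
            visits[node] = visits.get(node, 0) + 1
    return visits
```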

I tested the algorithm on a number of different example graphs, including the graph of the multichain dataset, and determined the execution times.

For the graphs generated by blocks 0-100 and 50-150, it took 24.5 seconds to run the algorithm.
[figure: incremental PageRank results for 100 blocks]
For the graphs generated by blocks 0-1000 and 500-1500 respectively, it took 72 seconds to compute the PageRanks of both the new and the old graph.
Unfortunately, I couldn't run the algorithm on the entire data base, because Spyder froze.

The next step would now be to optimize the algorithm. These guys divide the graph into a partition Q, containing all changed nodes/edges and no outgoing links, and a partition P. This seems like the right direction, seeing as the multichain network only grows in 'one direction', making such a partition easy to obtain.

I couldn't find an English version of the Chinese paper, so I don't quite know what to do with it.

@vandenheuvel

vandenheuvel commented Aug 1, 2018

Feedback in terms of coding:

  • Write comments
  • Use a unit testing framework
  • Separate code into multiple modules, by function (e.g. move out the database code)
  • Follow PEP8

Function-wise changes:
Compute only with totals
Don't use the block graph directly; aggregate blocks for each public key. Either use contribution totals directly as edges, or scale them first (it is unclear how this should happen: capped at a limit, logarithmically, etc.). Possibly, one could also use the net contribution, and not have an edge at all if the net total is negative.
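A possible sketch of this aggregation step, assuming blocks are available as simple (from_pk, to_pk, amount) tuples (the real TrustChain block schema differs):

```python
import networkx as nx

def build_trust_graph(blocks):
    """Aggregate raw blocks into one weighted edge per ordered pair of peers."""
    totals = {}
    for from_pk, to_pk, amount in blocks:
        totals[(from_pk, to_pk)] = totals.get((from_pk, to_pk), 0) + amount

    graph = nx.DiGraph()
    for (from_pk, to_pk), total in totals.items():
        # Alternative: use the net contribution and skip edges whose net total is negative
        graph.add_edge(from_pk, to_pk, weight=total)
    return graph
```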

All walks from a single node
When using this algorithm in practice, there is only one node known to be secure: yourself. Take a random node and compute all values from the perspective of that node.

Continuous updating
A mechanism needs to be in place which updates computed values once a new block arrives: either save all walks and replace them, update parts of walks as discussed, or use a vector of computed values.

Think about churn, low priority
Maybe many public keys will never come online again; maybe these are best removed from the graph.

Think about speed of convergence
This matters a lot, especially if the computed value determines its updated value after a new edge arrives.

Think about how to set up sybil resistance experiments
The end goal is sybil resistance. Could you create test graphs with sybil regions? Can we measure their scores, and in this way test resistance?

@alexander-stannat

I have written a class to incrementally compute the personalized PageRanks of a directed graph from the perspective of a predetermined node in the graph. Personalized PageRank is a variant of PageRank where the ranks of nodes are determined by their distance from a given seed node. In this class the PageRanks are computed using the Monte Carlo method, whereby a number of random walks of predetermined lengths originate from the seed node and walk along the edges through the graph. They jump back to the seed node with a given probability at every step, and the walk is terminated once it reaches a certain length. If a random walk reaches a "dangling node", i.e. a node with no outgoing edges, it is reset as well. A vector of visit times is computed, containing the number of times the random walks pass through the individual nodes, and the PageRank of a node is given by its visit times divided by the accumulated visit times of all nodes in the graph.

The personalized PageRank given below is incremental, meaning that it can be recomputed every time the underlying graph structure is modified, i.e. edges and nodes are added or removed. In order to recompute the PageRanks, one doesn't have to recompute the entire set of random walks through the graph. Instead, the given set of random walks is modified. The idea behind this is that a given random walk does not pass through every single edge. Only random walks that reach a node whose outgoing edges have been modified, i.e. an edge is added or removed, or the weight of an edge is changed, need to be recomputed, starting from the first node at which such changes have occurred. I compare my results to the regular power iteration method, for which I used 500 as the maximum number of iterations. The power iteration method is more accurate than the Monte Carlo method. However, the goal was to implement an incremental PageRank; seeing as the power iteration has to be recomputed every time the graph changes, the Monte Carlo algorithm turns out to be the more efficient option, and as the graph size increases it will become even more efficient relative to power iteration.
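A compressed sketch of this setup, comparing Monte Carlo personalized PageRank from a seed node against NetworkX's power-iteration implementation; the graph, walk counts and reset probability are placeholders:

```python
import random
import networkx as nx

def personalized_pagerank_mc(graph, seed, num_walks=1000, reset_prob=0.15, max_length=100):
    """Monte Carlo personalized PageRank: every walk starts at (and resets to) the seed."""
    visits = {node: 0 for node in graph}
    for _ in range(num_walks):
        node = seed
        for _ in range(max_length):
            visits[node] += 1
            successors = list(graph.successors(node))
            # reset to the seed on a coin flip or when a dangling node is reached
            if not successors or random.random() < reset_prob:
                node = seed
            else:
                node = random.choice(successors)
    total = float(sum(visits.values()))
    return {node: count / total for node, count in visits.items()}

G = nx.gnp_random_graph(300, 0.03, directed=True, seed=1)
mc = personalized_pagerank_mc(G, seed=0)
power = nx.pagerank(G, alpha=0.85, personalization={0: 1.0}, max_iter=500)
distance = sum((mc[n] - power[n]) ** 2 for n in G) ** 0.5   # Euclidean distance between the vectors
```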

I created some unit-testing examples where graphs are randomly generated, the PageRanks are computed (both Monte Carlo and power iteration), the graphs are then randomly modified, i.e. edges and nodes removed and added, and finally the PageRanks are computed again: once incrementally (Monte Carlo) and once by power iteration. I assert that both vectors of PageRank values are approximately equal.
[figure: randomly generated network graph]
I found out that the accuracy of the PageRank vectors depends largely on the length of the random walks and not on the number of random walks generated. The graphs below show the Euclidean distance between the two PageRank vectors by length and number of random walks:
[figure: convergence by random walk length, seed 111]
[figure: convergence by number of random walks, seed 111]

We can see that the distance converges approximately to zero as the length of the walks increases, and a walk length of ca. 80 is very accurate. We also see that the number of random walks plays only a minor role.
Finally, I have written a class that opens the database of TrustChain transactions and generates a graph, where nodes correspond to public keys (peers) and the weights of edges correspond to the net data flow between two peers. I ran the Monte Carlo algorithm on this dataset and am now able to efficiently compute updates made to the TrustChain database, incrementally.

Additionally, Martijn gave me access to the new dataset, which I will incorporate into the algorithm.

@vandenheuvel

Make the implementation scale
Available memory seems to be a bottleneck. Re-examine the portions of your code on which a lot of time is spent; the goal is to handle the entire dataset. Use generators and delete statements, and if necessary move some of the preprocessing tasks to SQLite.

Normalized error value
Make sure error values over different database sizes are comparable. I recommend using an average relative error (relative to the power iteration value).
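For example (a small sketch; `mc` and `power` are assumed to be the two PageRank dictionaries being compared):

```python
def average_relative_error(mc, power):
    """Average relative error of the Monte Carlo estimate w.r.t. the power-iteration value."""
    errors = [abs(mc[n] - power[n]) / power[n] for n in power if power[n] > 0]
    return sum(errors) / len(errors)
```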

To cap or not to cap the walk length
Two ways to do the walking:

  1. Stopping and resetting walks probabilistically with a fixed probability
  2. Stopping walks after a fixed number of steps and resetting probabilistically with a fixed probability

We suspect the first one approximates typical personalized PageRank as implemented by NetworkX, while the second one results in slightly different values. That may explain the asymptotic behaviour of the error as measured in the above post: when the cap on the walk length increases, the second method approaches the first method.

In the next sprint, also pick up the first method and compare.
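A sketch of the two walk variants for that comparison; the reset probability and length cap are illustrative:

```python
import random

def walk_capless(graph, seed, reset_prob):
    """Variant 1: no length cap; the walk simply stops with probability reset_prob per hop."""
    path, node = [seed], seed
    while random.random() > reset_prob:
        successors = list(graph.successors(node))
        if not successors:              # dangling node: stop as well
            break
        node = random.choice(successors)
        path.append(node)
    return path

def walk_capped(graph, seed, reset_prob, max_length):
    """Variant 2: fixed number of steps; jump back to the seed with probability reset_prob per hop."""
    path, node = [seed], seed
    for _ in range(max_length):
        successors = list(graph.successors(node))
        if not successors or random.random() < reset_prob:
            node = seed                 # reset to the seed and keep walking
        else:
            node = random.choice(successors)
        path.append(node)
    return path
```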

Sybil resistance
Make artificial test cases for sybil resistance. Create one graph without a sybil region, and then compare with graphs which do have one. Vary how connected the trusted node is to the region (as measured by shortest path, or average path lengths of the walks), as well as its size. Also, test whether extreme transaction sizes influence the rankings significantly.
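One way such test graphs could be generated with NetworkX (all sizes and edge probabilities are placeholders):

```python
import random
import networkx as nx

def sybil_test_graph(num_honest=500, num_sybil=50, num_attack_edges=10, seed=0):
    """Honest region plus a densely connected Sybil region, joined by a few attack edges."""
    random.seed(seed)
    honest = nx.gnp_random_graph(num_honest, 0.04, directed=True, seed=seed)
    sybil = nx.gnp_random_graph(num_sybil, 0.5, directed=True, seed=seed + 1)
    sybil = nx.relabel_nodes(sybil, {n: n + num_honest for n in sybil})   # keep node ids disjoint
    graph = nx.compose(honest, sybil)
    for _ in range(num_attack_edges):
        graph.add_edge(random.randrange(num_honest),
                       num_honest + random.randrange(num_sybil))          # honest -> sybil attack edge
    return graph
```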

For completeness, include a quick comparison of the NetworkX PageRank and NetworkX personalized PageRank.

Completing the picture
Reflect on whether these results make sense when comparing with the relevant literature. Then, consider the following questions:

  • Approximately which database sizes is your algorithm (leaving aside practical I/O issues) able to handle? How long does processing take?
  • Can you tell if the algorithm is sybil resistant? And is there a theoretical explanation for that?
  • Which next steps should be taken, working towards closing this ticket?

@synctext

synctext commented Sep 6, 2018

We proposed the EscapeLimit algorithm in the past; there are multiple random walks:

@alexander-stannat

alexander-stannat commented Sep 6, 2018

Sybil Resistance:

I simulated a random graph of 500 nodes with a Sybil region of 50 nodes and a range of attack edges from 0 to 50. For each number of attack edges I computed the ROC curve and determined the area under the ROC curve as a measure of the Sybil resistance of the algorithm.

I also computed the proportion of false positives and false negatives for each number of attack edges and obtained the following results:

[figures: area under ROC curve; proportion of false negatives; proportion of false positives]

@synctext

synctext commented Sep 6, 2018

Please no random walks of 100 hops! Sane values for reset probability = [15%, 30%]. Not 1%.
See the empirical grounding in the old paper on this, and https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon

@alexander-stannat

alexander-stannat commented Sep 6, 2018

[figures: accuracy by number of random walks (variable lengths); accuracy by number of random walks (fixed length)]

@synctext

synctext commented Sep 6, 2018

Wrapping up with a few-page report in ipynb format (or an old-skool 2-column IEEE-format paper).

  • focus on short random walks
  • 500 honest nodes, 1000 densely connected attackers, 0-500 attack edges which are randomly attached.
  • Reset probability = .5, .3, .2, .1 ?
  • ER model to connect honest nodes, 10k edges.
  • runtime cost! one iteration
    • say 5 new nodes and 5 x 20 new edges are added
    • this is a single 'update batch'
    • measure how many milliseconds it takes
  • performance analysis, CPU usage versus accuracy
  • for the giant future work section: the "no CPU hogging" requirement. An algorithm should use the CPU lightly and continuously, not be idle for 15 min and then crunch full-time for 60 seconds. We need a continuously running, light background process.
  • Key Performance Indicator (see the sketch below): "Therefore, we evaluate the fraction of walks escaping into the sybil region when starting at any node in the honest region, which is called the escape probability"
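A sketch of how that escape probability could be measured, reusing the `walk_capless()` sketch from earlier in this thread (names and parameters are illustrative):

```python
def escape_probability(graph, honest_nodes, sybil_nodes, num_walks=100, reset_prob=0.2):
    """Fraction of walks started in the honest region that ever enter the Sybil region."""
    sybil_set = set(sybil_nodes)
    escaped = total = 0
    for start in honest_nodes:
        for _ in range(num_walks):
            total += 1
            if any(node in sybil_set for node in walk_capless(graph, start, reset_prob)):
                escaped += 1
    return escaped / float(total)
```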

@synctext

Reviewed final project report:

  • URLs broken
  • dropedge: https://www.eecs.harvard.edu/cs286r/courses/fall09/papers/tang09.pdf
  • $M=\left<P,I,a,\omega\right>$
  • $$ w((v_p,v_q)) := \sum\limits_{i\in I: a(i)=(p,q)} w(i,p) $$
  • "this agent msut be completely"
  • by $\tilde{E}$ is a matrix of only zeros, with a column of ones at the i-th place, with $i$
  • Performance analysis
    • simpler and easier to understand: probability of escape into the Sybil region
    • ROC curves are a powerful and complicated measure

@vandenheuvel

vandenheuvel commented Sep 20, 2018

In our discussion on September 6, the following argument was made in favor of small hops:

If you walk with a small reset probability and large average length, you extend your reach to the entire network, which we want to avoid. You'll always reach the sybils. "Six degrees of Kevin Bacon".

While it is true that it becomes possible to reach the entire network, the probability of reaching one of the nodes which you can reach in n steps but not in n - 1 steps is extremely small. Indeed: the number of reachable nodes becomes enormous, but there is still only one walk. The probability that a far-away node is visited many times is extremely small, both because the walk isn't often very long and because there are so many nodes within reach. Instead, the walks will more often visit the nodes which are closer.

The above doesn't necessarily argue for long walks, but it shows that they can't be dismissed with that argument.

When we're approximating personalized PageRank, only the accuracy and computational cost matter. The incremental method using long walks when approximating personalized PageRank with a low reset probability is accurate, see the above experiments by @alexander-stannat. Whether the computational costs are acceptable still needs some discussion in my opinion (see bottom). If one is interested in nodes which are close by, I think it might be worth considering sparse matrix-vector multiplication to approximate it:

  1. First phase: A quick connectedness check with a maximum length, to limit the number of nodes for the second phase
  2. Second phase: Build a sparse transition graph for the nodes found in phase one, and do a few sparse matrix-vector multiplications in numpy to find your probabilities

The result will be similar to personalized PageRank with a high alpha. Example: limit the nodes to third-degree connections, choose alpha = 0.5, and do three multiplications. Had we approximated with random walks, only 12.5% of walks could have reached outside of our selection, assuming equal probabilities for every hop. In practice, people will prefer helping those they know, so this number would be significantly smaller (especially after implementing a peer selection algorithm which gives lowest priority to those with a connection degree that is too large).
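A sketch of that two-phase idea (the hop cutoff, reset probability and iteration count are the placeholder values from the example above; a real implementation would use scipy.sparse instead of a dense matrix):

```python
import numpy as np
import networkx as nx

def local_personalized_pagerank(graph, seed, max_hops=3, reset_prob=0.5, iterations=3):
    # Phase 1: only keep nodes reachable from the seed within max_hops
    nearby = list(nx.single_source_shortest_path_length(graph, seed, cutoff=max_hops))
    sub = graph.subgraph(nearby)

    # Phase 2: row-normalized transition matrix of the local subgraph
    # (dense here for brevity; scipy.sparse.csr_matrix would be used at scale)
    A = nx.to_numpy_array(sub, nodelist=nearby)
    row_sums = A.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0               # dangling nodes keep an all-zero row
    P = A / row_sums

    restart = np.zeros(len(nearby))
    restart[nearby.index(seed)] = 1.0
    p = restart.copy()
    for _ in range(iterations):                 # a few matrix-vector multiplications
        p = (1 - reset_prob) * (p @ P) + reset_prob * restart
    return dict(zip(nearby, p))
```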

As a bonus, the Tribler users could be given a simple choice they can understand: "do you only trust the friends of your friends, or also their friends?".

@devos50 could you weigh in on what, from a Tribler design perspective, you think is an acceptable computational cost for such an algorithm? The current implementation would use a single core at 100% for less than a minute for each update (updates can be accumulated and processed at once). @synctext mentioned that he prefers an algorithm which instead uses the CPU continuously but never for longer than a few ms. @synctext what was the reason for this preference? Do you want to keep these computations on the Twisted event loop?

@devos50

devos50 commented Sep 21, 2018

This week, we brainstormed a bit about modularization of Tribler (splitting Tribler up into separate components, like IPv8, VoD, the market etc.). I think a dedicated 'trust module' would be an excellent first candidate to adhere to a modularized design. As a result, I would like to see this module being executed as a separate subprocess that communicates with the Tribler core using an IPC protocol (like JSON-RPC). Depending on the scheduler in the OS and the number of available cores, this subprocess should have minimal impact on the performance of the main Tribler process and the overall user experience.

@devos50 could you weigh in on what, from a Tribler design perspective, you think is an acceptable computational cost for such an algorithm?

If we follow the design I elaborated on in this comment, we have more flexibility regarding trust computations. Yet, it is very hard to come up with reasonable bounds for this. I would suggest playing around with the implementation and getting more insight into the scalability and computational requirements of the algorithm first. To answer your question, we would need to know the tradeoff between required computation time and accuracy. I'm not sure whether the performed experiments are sufficient to answer this question yet.

Note that @ichorid can also help you with performance optimizations.

@synctext

Dynamic trust metrics for peer-to-peer systems, early work, 2005
"One of the fundamental challenges for peer-to-peer (P2P) systems is the ability to manage risks involved in interacting and collaborating with priorly unknown and potentially malicious parties. Reputation-based trust management can mitigate this risk by deriving the trustworthiness of a certain peer from that peer's behavior history."

@synctext

Filtering trust opinions through reinforcement learning, AI-solves-everything, 2014 work
solution: AI magic + make the problem simpler by assuming a witness can always be inserted into any transaction
"we propose the Actor–Critic Trust (ACT) model, which is an adaptive trust evidence aggregation model based on the principles of reinforcement learning. The proposed method dynamically adjusts the selection of credible witnesses as well as the key parameters associated with the direct and indirect trust evidence sources based on the observed benefits received by the trusting entity.

@alexander-stannat

alexander-stannat commented Oct 29, 2018

Update:

A very basic GUI is now implemented. The idea was to implement an interactive graph visualising the Tribler network, including the trust rankings of all agents, determined by our current Monte Carlo PageRank implementation.
The current implementation is still a skeleton of the final product and looks as follows:

[screenshot: Tribler network GUI]

The code can be found here. Ideally, it will resemble the current Tribler network explorer in design and layout.

[screenshot: current Tribler network explorer]

So far, a few features are available, such as finding the highest ranking (most trustworthy) node in the network. This is the node that has the largest number of visit times out of all nodes in the network. However, it may not be the node everyone can or wants to interact with, due to the decline in transitive trust over multiple hops.

Another feature is the "Show Neighbourhood" button, with which a node can get a close-up look at all its adjacent nodes. The aim for the future is to create a GUI in which the user can click on an arbitrary node in the network and then zoom into that node's respective neighbourhood. That way one could explore the entire Tribler network along interaction paths.

Finally, we're working on a "Most Likely Path" feature which will highlight the path in the network along which the majority of random walks in the Monte Carlo algorithm have travelled.

For our next steps, we will enhance the interface's interactivity, i.e. clicking on nodes and edges reveals information about them, add a more aesthetically pleasing design and layout, and add some additional features such as viewing the overall net contribution of nodes to the network, colour-coding trustworthiness, etc. This can be discussed in our next meeting.

@synctext

synctext commented Oct 29, 2018

Solid progress. Please focus on feature completeness and leave "GUI beauty" to others...

To reduce rendering CPU consumption you can use "DrawCircular". Hard-code that there is 1 node at the center, 1 inner circle, and at most 1 outer circle. Fill the outer circle with all 2-hop neighbors of the node we currently view. For now, draw the inner circle with even spacing to simplify things. The inner-circle spacing is actually very simple though: position each inner node in the middle of its connected outer-circle peers. However, our dataset probably has outer-circle nodes connected by multiple 2-hop paths :-)
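NetworkX's shell layout can already produce such a concentric layout; a sketch, assuming the 1-hop and 2-hop neighbour sets are computed naively:

```python
import networkx as nx
import matplotlib.pyplot as plt

def draw_two_hop_view(graph, center):
    """Center node in the middle, direct neighbours on the inner circle, 2-hop neighbours outside."""
    inner = set(graph.neighbors(center)) - {center}
    outer = {n for v in inner for n in graph.neighbors(v)} - inner - {center}
    view = graph.subgraph({center} | inner | outer)
    pos = nx.shell_layout(view, nlist=[[center], list(inner), list(outer)])
    nx.draw(view, pos, with_labels=True, node_size=300)
    plt.show()
```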

@alexander-stannat

The report in IEEE format is now ready. Now it's back to the GUI.

@devos50

devos50 commented Nov 12, 2018

Nice!

Side note, are you sure that's IEEE style? Looks a lot like the ACM template to me ;)

@alexander-stannat

alexander-stannat commented Nov 19, 2018

I have updated the Trust GUI to include a double-circle spring layout of the nodes in the network. See below
[screenshot: Trust GUI with double-circle spring layout]

A peer can view its immediate neighbourhood (2 hops) in this GUI and determine how trustworthy the nodes in its vicinity are. The nodes are ranked using our previous Monte Carlo PageRank algorithm and are colour-coded based on their trustworthiness, from red to green.

Making the networkx plot of the graph interactive has proven to be challenging, and it looks as though this will have to be hard-coded, which is rather burdensome. For now, there seems to be no way around this.

Here is an example of how that can be done. Note that this example shows how ineffective such hard-coding is.

Also @devos50, yes it's actually an ACM template, you're right :)

@synctext

synctext commented Nov 19, 2018

Next sprint: connect to the live network; as of now 400+ MByte of TrustChain data, growing by 80k records daily. Just plot all TrustChain data you find in real-time. Keep track of the Tribler users you have discovered: "discovered 7500 people".
[screenshot: discovered Tribler users]

On making the bitmap image interactive

  • ask @egbertbouman ; dumb @synctext method...:
  • each click has an x1,y1 coordinate
  • each node has an x2,y2 coordinate
  • Magic: find the node with minimal calculateDistance(x1,y1,x2,y2)
import math

def calculateDistance(x1, y1, x2, y2):
    # Euclidean distance between the click position and a node position
    dist = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
    return dist
print(calculateDistance(0, 0, 3, 4))  # 5.0
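Following that recipe, the clicked node could then be found like this (a sketch; `positions` is assumed to be the dict returned by the NetworkX layout function and `click_x`/`click_y` the click coordinates):

```python
def node_at_click(positions, click_x, click_y):
    """Return the node whose drawn position is closest to the click."""
    return min(positions,
               key=lambda node: calculateDistance(click_x, click_y, *positions[node]))
```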

Focus on real-time updates. Partly show the fresh information that the random walk discovered. For instance:
[figure: example output]

@synctext

synctext commented Nov 30, 2018

Goal: elegant math model for cooperation within existing spatial relationships (graph distance, network latency or neighborhood; #2541).
[figure]

Cooperators and Defectors. A simple rule for the evolution of cooperation on graphs and social networks: "the following update rule for evolutionary dynamics (Fig. 1): in each time step, a random individual is chosen to die, and the neighbors compete for the empty site proportional to their fitness. We call this mechanism ‘death–birth’ updating". We do not have anything like that in our Tribler micro-economy. Spatial relationships could influence how your vector of policy settings evolves over time; or new installs ('births') could obtain settings from the network! For instance, trustchain policies and forgiveness levels.

Thought experiment: the spatial region of The Internet you are born in (install the IPv8 latency community) determines who you form relationships with and whom you trust (initial bulk TrustChain record download). You can be born in a bad region or in a cooperative, blooming area with high trust.

@alexander-stannat

Update:
I have got IPv8 up and running and can now crawl the Tribler network myself. Over the coming days I will run some experiments on my current implementation of our Monte Carlo algorithm, in order to determine optimal incremental update intervals and how often to recompute the PageRanks from scratch. I will document error value accumulation as well as run-time.

@synctext

synctext commented Jan 11, 2019

please do not be confused.
This related work on game theory completely fails to accurately model the identity problem and Sybil attack. Use it only to scare little children! "Deception, Identity, and Security: The Game Theory of Sybil Attacks"
[figure]
Reasoning: "the receiver R challenges S to provide a more rigorous proof of identity". This model assumes that data exchange on The Internet is costly. That's incorrect I believe, getting reliable information is hard. Furthermore, your input information has unknown reliability.

@alexander-stannat

Here is the current state of the report for the deep-dive meeting with the Ministry of Infrastructure tomorrow.

At this point the algorithm is still a standalone implementation based on a static database. Starting February, I will begin working on this full-time. The next steps from there on out will be to:

  • make incremental updates continuous by connecting to a real-time TrustChain crawler
  • integrate the trust ranking into the Tribler GUI
  • optimise in order to reduce run-time and error sizes

@vandenheuvel

vandenheuvel commented Jan 18, 2019

@alexander-stannat What exactly is plotted in Figure 13? I think you should invert your metric, because as it is plotted now it looks like your algorithm is performing consistently worse than random guessing.

@alexander-stannat

@vandenheuvel The graph in figure 13 corresponds to the area under the ROC curve, a measure for determining the proportion of false negatives/positives of a ranking algorithm. You're right. Our current implementation of the PageRank approximation has a very high proportion of false positives/negatives. The reason for this is the very short random walks which we used in order to mitigate Sybil attacks. This leads to nodes that haven't actually been discovered by any random walk being ranked as non-trustworthy (Sybil), because they have a PageRank of 0. This, of course, shouldn't be the case, because nodes that haven't been discovered yet shouldn't be labelled Sybils. Instead, only nodes that are in the already-discovered part of the graph and have very low trust values should be considered Sybils; others should just be labelled "unknown" (see the sketch below). If we do that, the accuracy of our ranking should improve dramatically. The graphs therefore need to be updated. Thanks for the feedback 😃
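A sketch of that three-way labelling (the threshold and the `visits` dict are illustrative):

```python
def classify_nodes(visits, trust_threshold):
    """'unknown' for undiscovered nodes; 'sybil' or 'honest' for discovered ones."""
    labels = {}
    for node, count in visits.items():
        if count == 0:
            labels[node] = "unknown"        # never visited: make no claim
        elif count < trust_threshold:
            labels[node] = "sybil"
        else:
            labels[node] = "honest"
    return labels
```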

@synctext

synctext commented Feb 4, 2019

Official start of master thesis

We have several online reading lists on trust. Start with writing a related work section. Focus on multiple domains first, then narrow down to gossiping of reputation info and Sybil attacks. The material already on this wiki:

@synctext

Meeting minutes: continue intense reading and wrap up the "Blockchain Engineering" course work.

@cuileri

cuileri commented Mar 8, 2019

Detected a problem with both Alexander's and my implementation of random walks:

[figure: example graph with root r and nodes b and c]

In 1000 random walks initiated by root r

  • c will be visited 990 times (or 990 x (1-p) times, where p is the reset probability)
  • b will be visited 10 times

even though they did the same amount of work.

Winning strategy (possible attack):
Perform work for the one who has performed much work but has not received any work yet.

Solution:

[figure: proposed formula]

@synctext

Detected a problem

I would call this a feature: the trustworthiness of information is taken into account. In this extreme numerical example it has a nice dramatic effect (10 vs. 990). These are synthetic numbers; the effectiveness of a Sybil attack prevention method should be evaluated with the really-hard-to-get attack datasets.

@vandenheuvel

I could consider the node 'b' in the example to be more trustworthy than 'c': they have made the same contribution to this part of the graph; however, from the perspective of node 'r', 'b' is closer.

@synctext

synctext commented Apr 18, 2019

Large body of related work from the AI agents community. From 2002, with currently 1607 citations: "the Beta reputation system". Their magic:

[figure: formula from the paper]
Belief discounting from millions of fake identities is not part of this foundational work.

@synctext

synctext commented Apr 23, 2019

Latest 2019 TRUST workshop papers

The AAMAS 2019 TRUST workshop will have the theme: The Whole World is Watching.
The development of multiagent trust and reputation systems has been a vibrant topic of
research at AAMAS conferences each year. This year, the workshop will aim to introduce
our field of study to researchers in related subfields and to industrial representatives who 
may be attending the conference in order to learn more about AI. There has been 
increasing interest worldwide in the use of AI within private and public sector organizations,
and a common concern has been the trustworthiness of the systems being used.

Long-running series of workshops from the AI multi-agent community: 21 editions! TUDelft faculty are on the program committee.

@synctext

2018 "Network Reciprocity" paper, great overview of the big picture, https://doi.org/10.1073/pnas.1707505115

"Experiments show that network reciprocity promotes cooperation under a restrictive condition: The benefit of cooperation must outweigh the total cost of cooperating with all neighbors (39). Our question, therefore, is whether this condition can be relaxed by combining two cooperation-promoting mechanisms. Specifically, are there any synergies to be gained by aiding network reciprocity with costly punishment?"
"In total, 225 participants were recruited to play 50 rounds of the game."
Nice work; that last bit also shows how artificial such lab results are.

@synctext

synctext commented May 20, 2019

Thesis work moved to the "proximity" Sybil attack mitigation ticket #4481

@synctext

synctext commented Jul 9, 2020

This issue needs to be promoted and renamed. It contains a wealth of info. How are we going to re-organise all these tickets from the past 7+ years?
