READY: Experiment for DHT dissemination time measurements. #389

Closed · wants to merge 3 commits

Conversation

@DanGraur commented Dec 16, 2018

Added a new test scenario which captures the time it takes for a data element to be distributed across the peers in a DHT community, together with a new local test configuration written specifically for this test case. Also added a new R script which parses the data and generates a scatterplot of the per-node dissemination times. The DHTModule class was also modified so it can parse the generated per-node logs and accommodate annotations.

@tribler-ci

Can one of the admins verify this patch?

@synctext (Member)

OK to test

@qstokkink (Contributor)

Ok to test

@DanGraur (Author) commented Dec 17, 2018

The dissemination time here is measured by a LoopingCall which samples the local DHT storage every 1e-4 seconds. Once an entry for the given key is found, the time is stored to local storage and the LoopingCall is cancelled. If no entry is found, the method keeps being called until the experiment time runs out, in which case the node logs that it was unable to find an entry for the given key. A sample of the generated graph can be seen here (tested locally with 1 machine and 20 processes):

A dashed vertical line means that the corresponding peer (indicated on the horizontal axis) was unable to find an entry for the key, i.e. the entry was not disseminated to it during the time allotted to the experiment. The vertical axis shows the time it took for the entry to be disseminated to a peer, measured in milliseconds (the exact time is shown as a label).
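
For reference, the polling approach reads roughly like the minimal sketch below (Twisted `LoopingCall`; the `storage.get(key)` accessor and the helper names are assumptions, not the exact PR code):

```python
# Minimal sketch of the polling approach described above, not the exact PR
# code. Assumes Twisted's LoopingCall and a `storage.get(key)` accessor on
# the DHT community; DisseminationProbe and write_log are made-up names.
import time

from twisted.internet.task import LoopingCall


class DisseminationProbe(object):

    def __init__(self, dht_community, key):
        self.dht_community = dht_community
        self.key = key
        self.start_time = time.time()
        self.poll_lc = LoopingCall(self.check_local_storage)

    def start(self):
        # Poll the local key/value store every 1e-4 seconds.
        self.poll_lc.start(1e-4)

    def check_local_storage(self):
        # An empty result means the entry has not reached this node yet.
        if self.dht_community.storage.get(self.key):
            self.write_log(time.time() - self.start_time)
            self.poll_lc.stop()

    def write_log(self, elapsed_seconds):
        # Stand-in for the per-node log written by the actual DHTModule.
        print("Entry found after %.6f s" % elapsed_seconds)
```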

@qstokkink (Contributor)

So it's either really fast or doesn't work at all?

@DanGraur (Author) commented Dec 20, 2018

Sorry for the late response. My guess is that the DHT is perhaps not designed to distribute the entry to every node in the community, but rather to ensure sufficient dissemination, such that peers that don't have the entry can query nearby nodes for it. In fact, there appears to be a hardcoded threshold in the codebase here. That limit is 8, and as we can see in the graph above, there are indeed 8 hits and 11 misses.

Or perhaps there's some confusion about how the test case is implemented? Each node has a LoopingCall which calls the get method of the Storage class (i.e. we look for the entry locally). When a non-empty result is finally returned, it means this node has been given the entry by the source node. If the node cannot find the entry in its local Storage during the experiment, it means the source node never sent it the entry, but that does not mean it cannot call the find method of the DHTCommunity. It can, and it will find the entry (under normal operating conditions).
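
To make the distinction explicit, here is an illustrative snippet (not PR code; the `storage.get()` and `find_values()` names follow my reading of the IPv8 DHT at the time and should be treated as assumptions):

```python
# Illustrative only: a local Storage lookup versus an active DHT lookup.

def has_entry_locally(dht_community, key):
    # Succeeds only if the source node pushed the entry to this peer.
    return bool(dht_community.storage.get(key))


def lookup_entry(dht_community, key):
    # Active lookup that queries other peers; under normal conditions this
    # finds the entry even when it was never pushed to this peer. Assumed to
    # return a Twisted Deferred in the IPv8 version used here.
    return dht_community.find_values(key)
```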

@devos50 (Contributor) commented Dec 28, 2018

@DanGraur interesting experiment. Curious to see what the dissemination times are in a network of 2000 nodes running IPv8 on the DAS5 :)

Instead of relying on a LoopingCall that continuously runs and polls the key/value store, I would implement a callback that is invoked when something is inserted into the store (you can use your own Tribler/IPv8 branch with custom code in a Gumby experiment). This is straightforward if you are running the experiment on your local computer.
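
As a rough sketch of what I mean, intended for a custom branch rather than master (it assumes the DHT storage exposes a `put()` that is hit on every insert; the exact class path and signature may differ per IPv8 version, so treat the names as assumptions):

```python
# Sketch of an insert callback via a subclassed storage; names/paths assumed.
import time

from ipv8.dht.storage import Storage  # import path assumed, adjust to your branch


class InstrumentedStorage(Storage):

    def __init__(self, on_insert, *args, **kwargs):
        super(InstrumentedStorage, self).__init__(*args, **kwargs)
        self.on_insert = on_insert  # callback: (key, insert_time) -> None

    def put(self, key, value, *args, **kwargs):
        result = super(InstrumentedStorage, self).put(key, value, *args, **kwargs)
        # Fire the experiment callback the moment the entry lands locally.
        self.on_insert(key, time.time())
        return result
```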

I'm not surprised by the low dissemination times, since you are running the experiment locally, without "real" network traffic and latency. While I do not know the Kademlia protocol by heart, values are stored on the nodes closest to a specific key and not necessarily on all nodes in the network. Which nodes store a specific value depends on the identities of the nodes in the network. I would assume that you get different results when you run the experiment again (since the peers are then assigned different identities)?

@devos50 changed the title from "Experiment for DHT dissemination time measurements." to "WIP: Experiment for DHT dissemination time measurements." on Dec 28, 2018
@synctext (Member)

Some quick feedback: a quick insert time is only of moderate interest, but it is essential that it works.

Can we test scalability? Key/value pairs should not be stored at all nodes, except in small tests. Fast lookup, scalability, and some resilience to churn are the DHT's strong points; do we have tests for those?
The Tribler network will soon hit 20k concurrent nodes, and discovery of hidden swarms depends on the DHT.

@devos50 (Contributor) commented Dec 28, 2018

@synctext IIRC, we don't have any unit tests for churn (yet). Fault tolerance would be an interesting experiment, but I would suggest focusing on scalability first (working towards an experiment with a few thousand nodes). Plotting the CPU usage/bandwidth requirements is trivial to do in Gumby 👍

@synctext (Member)

Fully agree; it would be solid progress if we had a few-thousand-node experiment.

@DanGraur (Author) commented Dec 29, 2018

@devos50 OK, I'll try to get to this as soon as I can. I also have another idea for a test case: measuring the hop count of a DHT lookup. I think this could be interesting; let me know if you agree. I'm a bit busy these days, but I will try to implement this as soon as I can. Also, knowing that I can write custom code that wouldn't normally be accepted in the master branch makes this much easier; that was the reason I didn't write a callback in the first place, even though it was what I wanted to do initially.

@synctext (Member)

Nice. Yes, a lookup experiment is a great idea, with hop count as a key performance indicator alongside latency.

@DanGraur (Author) commented Jan 2, 2019

@devos50 once again sorry for the late response. Indeed, when running the same experiment multiple times I do get different results. For instance, this is the result I get when running the same experiment again.

@DanGraur (Author) commented Jan 5, 2019

I've implemented the callback functionality here in the DHTModule, and the call itself in one of my fork's branches.
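
For context, the module-side callback has roughly the following shape (illustrative only; the real code is in the DHTModule change and the fork branch linked above):

```python
# Converts the arrival time of the target key into a per-node dissemination
# time in milliseconds; class and attribute names are illustrative.
import time


class DisseminationRecorder(object):

    def __init__(self, target_key, log_path):
        self.target_key = target_key
        self.log_path = log_path
        self.publish_time = time.time()  # set when the source publishes the key

    def on_insert(self, key, insert_time):
        if key != self.target_key:
            return
        elapsed_ms = (insert_time - self.publish_time) * 1000.0
        with open(self.log_path, 'a') as handle:
            handle.write("%.3f\n" % elapsed_ms)
```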

I've also executed the experiment multiple times, and the time measurements generally seem to be lower than with the LoopingCall version (there was probably some noticeable overhead due to the LoopingCalls). This change should also make the measurements more precise: previously the measurement method was only called every 1e-4 seconds and was more computationally intensive.

I've attached a few diagrams below to show the new results.



@qstokkink (Contributor)

We want to see how this does on the DAS5 as well. @devos50 could you give @DanGraur a job to play around with?

@devos50 (Contributor) commented Jan 8, 2019

@qstokkink @DanGraur this job runs a basic DHT validation experiment on the DAS5: https://jenkins-ci.tribler.org/job/validation_experiments/job/validation_experiment_dht/. I suggest looking at this one.

DanGraur added 2 commits on April 7, 2019
@DanGraur force-pushed the dht_dissemination_experiment branch from 5fdfd3e to 976baaa on April 6, 2019
…also changed the scenario of the local version such that it now uses a for loop.
@DanGraur force-pushed the dht_dissemination_experiment branch from 473c148 to 60bc59d on April 8, 2019
@DanGraur (Author) commented Apr 8, 2019

I've finally created a Jenkins project to run this experiment at large scale (100 nodes). Here's the link to it: https://jenkins-ci.tribler.org/job/dissemination_experiment_dht/. I think this PR is ready now.

@DanGraur changed the title from "WIP: Experiment for DHT dissemination time measurements." to "READY: Experiment for DHT dissemination time measurements." on Apr 8, 2019
@devos50 (Contributor) commented Sep 20, 2019

@DanGraur what is the status of this PR?

@devos50 (Contributor) commented Jul 15, 2020

I'm not sure this will be merged in the short term. I will re-open it if we work on additional DHT experiments 👍

@devos50 closed this on Jul 15, 2020