
READY: Latest TrustChain block via DHT #349

Merged 3 commits into Tribler:master on Feb 13, 2019

Conversation

@DanGraur (Contributor) commented Oct 28, 2018

Fixes #369

Added a new endpoint for retrieving the latest TrustChain block via the DHT mechanism. Added a new callback in the TrustChainCommunity class which is called whenever a block is added to the DB, regardless of the block's type. It is now possible to retrieve the latest TrustChain block of a peer via HTTP requests. Created a new test set for the newly added block retrieval mechanism. In addition, made a few changes to the REST API test suite in terms of the communication modules, such as moving classes from one module to another and creating new modules which host the logic for HTTP requests and communication for specific endpoints.

@DanGraur DanGraur changed the title Latest TC block via DHT READY: Latest TC block via DHT Oct 28, 2018
@DanGraur DanGraur force-pushed the tc_via_dht branch 2 times, most recently from 2e60cd2 to 917e854 on October 28, 2018 15:06
@DanGraur DanGraur changed the title READY: Latest TC block via DHT READY: Latest TrustChain block via DHT Oct 28, 2018
@DanGraur (Contributor, Author) commented Oct 28, 2018

I have a couple of questions regarding an aspect of the implementation. In order to have the latest TC block published to the DHT as soon as it is added to the database, I added a callback named new_block_cb, which does just this. It is called in two places: here and here. I did notice that there is a notify_listeners method, which calls the listeners attached to a particular type of block. My issue with this is that it is called only for a particular type of block, and not for all types of blocks, which is why I created the new callback instead. I have two questions:

  • Should I remove it, and make use of the notify_listeners method only?
  • If not, then should I maybe move it inside the notify_listeners method, or should I leave it as is (i.e. explicit calls)? I initially decided not to embed it in notify_listeners, since I believed that new_block_cb implements behavior which is very specific to this use case, and might not be as general as notify_listeners.

@devos50 (Contributor) commented Oct 28, 2018

@DanGraur You are right that notify_listeners is only triggered for specific block types. I was thinking about adding a wildcard option to notify_listeners so it is triggered for every block.

I would try to avoid adding another type of callback if possible and use notify_listeners instead. However, I'm not sure what the best way would be to achieve this. Maybe we could make it a separate method in the BlockListener abstract class (like received_block)? Then name it something like added_database_block?

@qstokkink (Collaborator) commented:

I would also vote for making use of notify_listeners.

@DanGraur (Contributor, Author) commented:

Ok, then I'll make the required changes to use notify_listeners.

@DanGraur (Contributor, Author) commented:

Ok, I removed the explicit callback and instead added a wildcard to the listeners_map, as suggested by @devos50. The wildcard is stored in TrustChainCommunity's UNIVERSAL_BLOCK_LISTENER field. Note that in the current implementation the peer publishes the latest block in the TrustChainDB (retrieved via that class's get_latest method) that has its own public key as the block public key. That is, the block passed as a parameter to the callback in the listeners_map is not used, and is not necessarily the one published. Should the block passed to the callback be the one published to the DHT instead?
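
For concreteness, here is a minimal sketch of the wildcard-listener wiring described above, assuming py-ipv8's BlockListener interface (should_sign / received_block) and TrustChainCommunity.add_listener; the community variable and the publish_latest_block_to_dht callback are illustrative names, not this PR's exact code:

```python
from ipv8.attestation.trustchain.listener import BlockListener

class DHTBlockPublisher(BlockListener):
    """Listener registered under the wildcard: fires for every block type."""

    def __init__(self, publish_cb):
        self.publish_cb = publish_cb  # e.g. the DHT publication routine

    def should_sign(self, block):
        return False  # this listener only observes; it never counter-signs

    def received_block(self, block):
        # As noted above, the current implementation ignores `block` and
        # publishes whatever TrustChainDB.get_latest returns for our own key.
        self.publish_cb(block)

# Hypothetical wiring, using the wildcard key this PR adds:
community.add_listener(DHTBlockPublisher(publish_latest_block_to_dht),
                       [community.UNIVERSAL_BLOCK_LISTENER])
```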

@DanGraur (Contributor, Author) commented:

In this latest commit, I enhanced the mechanism for publishing the latest TC block via the DHT, so that a block which is already in the DHT will not be published again under a different (embedded) version. This increases the cost of adding a block to the DHT (due to the extra overhead of checking for a duplicate block), but decreases the load on the DHT.

[Resolved review thread on test_classes_list.txt]
@DanGraur (Contributor, Author) commented:

I've added the blank line at the end of test_classes_list.txt.

[Resolved review thread on ipv8/REST/dht_endpoint.py]
@qstokkink (Collaborator) left a review:

Looking good, make sure to still rebase and fix my two comments.
Also, make sure python create_test_coverage_report.py still works.

@qstokkink qstokkink changed the title READY: Latest TrustChain block via DHT WIP: Latest TrustChain block via DHT Nov 2, 2018
@DanGraur DanGraur force-pushed the tc_via_dht branch 2 times, most recently from f7d26d2 to 2c3684b on November 2, 2018 23:51
@DanGraur (Contributor, Author) commented Nov 2, 2018

Added the extra blank line at the end of test_classes_list.txt. Swapped "I" for "H" in DHTBlockEndpoint's reconstruct_all_blocks method when unpacking part of the block. Rebased the branch, and made sure that python create_test_coverage_report.py still works (it does, for both Python 2 and Python 3). Also squashed the commits.

@qstokkink qstokkink changed the title WIP: Latest TrustChain block via DHT READY: Latest TrustChain block via DHT Nov 3, 2018
@qstokkink (Collaborator) previously approved these changes Nov 3, 2018, leaving a review:

Looks good 👍

@xoriole (Contributor) commented Nov 5, 2018

Should there be an additional check so that a peer can publish its latest TC block only under its own public key, and not under somebody else's key? Similarly, should the blocks received under a DHT key be validated?

Other than that, seems good.

@DanGraur (Contributor, Author) commented Nov 9, 2018

Sorry for the late response. @xoriole I think adding the validation should be easy enough, but for ensuring that a block can only be published by a node under its own PK, I'm not particularly sure what strategy to employ. It should also be mentioned that currently the TrustChain block is published under the peer's mid, rather than its PK. I initially implemented the publishing operation using the PK, but decided to change it to the mid. Should I change this back? Also, should this check be implemented directly in the DHT mechanism (more specifically, in the on_store_request method of the DHTCommunity class), such that peers receiving the blocks ignore requests whose mid does not match the source address' mid?

@egbertbouman (Member) commented Nov 9, 2018

@DanGraur Some comments regarding the DHT. The DHT uses 20-byte keys, so I'd definitely stick to using mids.

Currently, any peer can store information under whatever key they want. I'm not sure about changing this since other features rely on the DHT as well.

The DHT does offer the option to sign values before storing them in the DHT. When this feature is used, and the signed value is stored under the mid of the signing peer, it gets a higher priority when returning the value (i.e., it will always be at the top of the results list). However, with this feature only one signed value can be stored under a particular key at any given time. Since you're splitting the block over multiple DHT entries, this feature can't be used. Is there a reason to prefer storing the entire block instead of, for instance, just the SHA-1?
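
As a reference point, a minimal sketch of the signed-value feature described here, assuming the DHTCommunity API of that era (store_value with a sign flag, returning a Twisted Deferred, called from inside a running community); the logging callback and the exact shape of the callback result are assumptions:

```python
# Store a single signed value under our own 20-byte mid; per the comment
# above, a signed value stored under the signer's mid is ranked first in
# lookup results.
deferred = self.dht.store_value(self.my_peer.mid, block_data, sign=True)
# The callback result depends on the DHT implementation of that era
# (typically the nodes the value was stored on).
deferred.addCallback(lambda result: self.logger.info("Value stored: %s", result))
```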

@DanGraur (Contributor, Author) commented Nov 9, 2018

I think it is still possible to make use of the PK, since the code I've written produces a SHA-1 hash of the raw key (previously the mid concatenated with a special prefix, so that it doesn't collide with some other key that uses just the mid, either now or in the future). Since the digest size is 160 bits, I think it should be fine even if the public key is used as well.
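
A short sketch of the key derivation being described; the b'_BLOCK' suffix is the one this PR later settles on (mentioned further down in the thread):

```python
import hashlib

def derive_dht_key(raw_key):
    # raw_key is the mid or the serialized public key; the suffix keeps the
    # result from colliding with other DHT keys derived from the same bytes.
    # SHA-1 yields exactly the 20 bytes the DHT expects.
    return hashlib.sha1(raw_key + b'_BLOCK').digest()
```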

I've implemented the code to publish the entire block (in chunks), since that was my initial understanding of the task, i.e. the requesting peer wants to get the block itself and reconstruct it via the DHT. I think a final goal of this task (though not of this particular PR) is to make it possible to get the entire TrustChain via the DHT, although this, in my opinion, will impose quite a lot of strain on the DHT, due to the possibly large size of the TrustChain. I think, technically, my implementation can to a certain extent do that (or at least get the last few published blocks, which haven't been discarded by the DHT due to staleness). That being said, some more code needs to be written to get the last few blocks, and not just the last one.

As far as the mechanism for signing multiple published blocks goes, I've started writing (and I think I'm nearly done with) some code which signs the block's chunks using the publishing node's private key. The retrieving node will then use the public key of the publishing node to ensure the chunks are faithful. This is why I'm starting to think the public key might be a good idea for both publishing and getting the block from the DHT, since its use is twofold: computing the key, and validating the block's chunks. This validation mechanism also solves the issue of ensuring the block is indeed published by the true node, since the true node is the only one with that particular SK. What might be an issue is that this won't stop a malicious node from publishing fake TrustChain blocks under some other peer's public key. These blocks won't be valid, but they will put some (spatial) strain on the DHT. This is, however, a general problem, I think, since it is currently possible to publish data under any key, without many restrictions. Perhaps a future task would be to allow a node to publish data to the DHT under a specific key, which would then be 'restricted' just to it, i.e. other nodes can get data under that key, but not publish under it. This solution could be general, though, and not just for this use case.
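
A sketch of the chunk signing/verification idea under discussion, assuming py-ipv8's default_eccrypto helper (create_signature / is_valid_signature); binding the chunk index into the signed message is illustrative framing, not necessarily this PR's exact format:

```python
import struct

from ipv8.keyvault.crypto import default_eccrypto

def sign_chunk(private_key, index, chunk):
    # Sign the chunk together with its position so chunks cannot be
    # reordered or substituted across positions.
    return default_eccrypto.create_signature(
        private_key, struct.pack(">H", index) + chunk)

def verify_chunk(public_key, index, chunk, signature):
    # The retrieving node checks each chunk against the publisher's PK.
    return default_eccrypto.is_valid_signature(
        public_key, struct.pack(">H", index) + chunk, signature)
```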

@egbertbouman (Member) commented:

I didn't realize that by PK you meant the hash of the public key. You're right, you can use any 20-byte key you'd like.

Publishing the latest block on the DHT is likely to already introduce significant strain on the DHT, since the latest block will change frequently. Having said that, the mere idea of having the complete chain on the DHT terrifies me.

I agree, signing the block-pieces yourself is the best way to do this at the moment. Your idea of restricting write access to certain keys has a similar effect to that of the signed-value feature already in place (i.e., you restrict write access to the first result).

@qstokkink (Collaborator) commented:

@egbertbouman as you are the author of the DHT community, does this PR need any changes before merging?

@DanGraur (Contributor, Author) commented:

I still have to push a new commit which adds validation to the chunks of the block being published to the DHT. Also, I wanted to mention this earlier: I added a method in the test for this PR which expands the last_queries deque to around 20 requests. I did this because I noticed that, when publishing two blocks, a chunk of the second block was being dropped due to an excessive number of requests from the publishing peer in the recent past. To counteract this, I increased the deque's maxlen (only in that given test) so that requests won't be dropped due to its otherwise restrictive value.

I still have to fix another test, since the changes I made by adding verification and using the PK instead of the mid broke some of the existing tests. I'll push this commit later tonight, hopefully.

@egbertbouman (Member) commented:

Everything looks good! One thing that maybe we could add to the DHT in the future is the ability to store multiple values at once. This would avoid superfluous find/store requests, and probably also remove the need to change the maxlen of last_queries while testing.

@DanGraur (Contributor, Author) commented Nov 14, 2018

Sorry for the long delay. I pushed a new commit which does the following:

  • Adds signatures to block chunks published to the DHT.
  • Checks the signatures of the read chunks when reconstructing a block. If a block chunk fails to be verified, the PK's verify() method will throw an exception. There currently is no logic which catches and handles that exception. Should I add this?
  • The PK will be used for publishing a block to the DHT, as opposed to the mid. The PK is suffixed with b'_BLOCK', and then hashed to obtain the key. The PK is also used for verification.
  • Block chunks are now published as a new payload type called DHTBlockPayload (here is its implementation). I feel this is better practice, since future changes can mostly be made in one place. However, this has a downside: constructing such a payload and then serializing it, as opposed to the simple concatenation used before, incurs greater overhead. In one of the unit tests (here), I even had to add a self.sleep() call after a block was published to the DHT, since not all chunks would be properly published and disseminated before the request for the latest block came in, which would sometimes yield a partial block (missing a few bytes at the end, i.e. its last chunk). A sketch of what such a payload can look like follows below.
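
For reference, a hedged sketch of such a payload using py-ipv8's Payload machinery (format_list / to_pack_list / from_unpack_list); the field set and formats here are assumptions, not necessarily the PR's exact DHTBlockPayload definition:

```python
from ipv8.messaging.payload import Payload

class DHTBlockPayload(Payload):
    """One signed chunk of a serialized TrustChain block."""

    format_list = ['64s', 'H', 'H', 'H', 'raw']

    def __init__(self, signature, version, block_position, block_count, payload):
        super(DHTBlockPayload, self).__init__()
        self.signature = signature            # signature over this chunk
        self.version = version                # embedded version of the block
        self.block_position = block_position  # index of this chunk
        self.block_count = block_count        # total number of chunks
        self.payload = payload                # the raw chunk bytes

    def to_pack_list(self):
        return [('64s', self.signature),
                ('H', self.version),
                ('H', self.block_position),
                ('H', self.block_count),
                ('raw', self.payload)]

    @classmethod
    def from_unpack_list(cls, signature, version, block_position, block_count, payload):
        return cls(signature, version, block_position, block_count, payload)
```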

@DanGraur DanGraur changed the title WIP: Latest TrustChain block via DHT READY: Latest TrustChain block via DHT Nov 16, 2018
@qstokkink (Collaborator) commented:

@devos50 could you review this? (Specifically the TrustChain changes)

@devos50 previously approved these changes Jan 28, 2019
@qstokkink previously approved these changes Jan 28, 2019
@qstokkink (Collaborator) left a review:

Be sure to fix the merge conflict still.

@egbertbouman (Member) commented:

@DanGraur I just noticed that you're not doing DHT requests, but you're just looking at the local storage:

```python
block_chunks = self.dht.storage.get(hash_key)
```

Assuming that this is unintentional, you should be using find_values. See SpecificDHTValueEndpoint for an example of how to use that function.
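
A minimal sketch of the suggested fix, using the Twisted-style asynchronous API py-ipv8 had at the time (find_values performs a network lookup, unlike dht.storage.get, which only reads the local store); the function name is illustrative:

```python
from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def fetch_block_chunks(dht, hash_key):
    # Query the DHT network for all values stored under this key; this is a
    # Deferred-returning call, so the endpoint handles the result
    # asynchronously rather than reading dht.storage directly.
    values = yield dht.find_values(hash_key)
    returnValue(values)  # entries to be reassembled into the block
```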

@DanGraur DanGraur changed the title READY: Latest TrustChain block via DHT WIP: Latest TrustChain block via DHT Feb 8, 2019
DanGraur added 2 commits February 10, 2019 18:54
…he DHT mechanism. Added a wildcard listener in the listener_map (of the TrustChainCommunity class) whose associated listeners are called regardless of block type. This wildcard is used to publish blocks newly added to the TrustChainDB to the DHT. The callback for this mechanism also defines behavior which disallows the same block from being published twice to the DHT. It is now possible to retrieve the latest TrustChain block of some peer via an HTTP request. Created a new test set for the newly added block retrieval mechanism. In addition to this, made a few changes to the REST API test suite in terms of the communication modules, such as moving classes from one module to another and creating new modules which host logic for HTTP requests and communication for specific endpoints.
…he node's secret key. In addition, this signature will be used on the receiving end for verifying the faithfulness of the chunks and, implicitly, of the block. Chunks will also be published under a new type of payload class, called DHTBlockPayload. The public key will be used (together with a special suffix) for publishing a block to the DHT (it should be mentioned that the raw key will be hashed before publishing).
@DanGraur DanGraur dismissed stale reviews from qstokkink and devos50 via 2b998c6 February 10, 2019 18:10
@DanGraur DanGraur force-pushed the tc_via_dht branch 2 times, most recently from 2b998c6 to 9ca7075 on February 10, 2019 18:22
@DanGraur (Contributor, Author) commented Feb 10, 2019

@egbertbouman thank you very much for noticing this! It was indeed a mistake. I fixed it now, and replaced it with find_values calls, which are then handled asynchronously. I've also dealt with the merge conflicts. I'll clean up the commit history after the changes in this last commit are approved.

@DanGraur DanGraur changed the title WIP: Latest TrustChain block via DHT READY: Latest TrustChain block via DHT Feb 11, 2019
…BlockPayload. Removed the dht.storage.get calls and replaced them with dht.find_values calls, which are handled asynchronously. Finally, made a few changes and additions to the test_dht_endpoint module.
@qstokkink (Collaborator) commented:

Do you have updated Gumby graphs for this new find_value fix?

@DanGraur (Contributor, Author) commented:

As far as the Gumby experiments which I wrote and forwarded as PRs (here and here) go, both were already using the find_values call where relevant, so the graphs shouldn't have changed due to this fix. If you're referring to a dedicated test case for this PR, then I cannot say how the graphs would look yet, since I haven't written a test for this. I can try to write one if you'd like.

@qstokkink (Collaborator) commented:

@DanGraur that's fine then, if only this PR had the find_values bug.

@DanGraur (Contributor, Author) commented:

Ok, I'll leave it as it is then.

@qstokkink (Collaborator) left a review:

@devos50 since this affects the TrustChain, I'll wait for your formal approval before merging this.

@devos50 (Contributor) commented Feb 13, 2019

retest this please

@qstokkink qstokkink merged commit 71de9e9 into Tribler:master Feb 13, 2019