Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rendezvous Certificates #7517

Closed
wants to merge 9 commits into from

Conversation

InvictusRMC
Copy link
Member

This PR adds rendezvous certificates to Tribler. This serves as a method for determining the online time of fellow peers. The initial certificate piggybacks on the introduction message of the PopularityCommunity. Subsequent rendezvous pings are served through separate payloads. The implementation does not use a separate community in order to reduce overhead.
The design works as follows: Peer A sends a ping message including a nonce serving as a challenge. Peer B returns the signed nonce. Peer A then increases its counter for Peer B. Online time can be estimated by multiplying the number of pings by the interval between pings.

@InvictusRMC InvictusRMC requested a review from a team as a code owner June 28, 2023 11:43
@InvictusRMC InvictusRMC requested review from drew2a and removed request for a team June 28, 2023 11:43
@@ -22,13 +24,16 @@ async def run(self):
metadata_store_component = await self.require_component(MetadataStoreComponent)
torrent_checker_component = await self.require_component(TorrentCheckerComponent)

rendezvous_db = RendezvousDatabase(db_path=self.session.config.state_dir / STATEDIR_DB_DIR / PopularityCommunity.RENDEZVOUS_DB_NAME)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this require a separate Component instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's perfectly acceptable to create a database here. However, don't forget to close the database if it's necessary.

if self.knowledge_db:
self.knowledge_db.shutdown()

Copy link
Collaborator

@drew2a drew2a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, congratulations on your first contribution to the Tribler codebase!

Regarding the PR — it looks great. I've only reviewed the community part so far and we've already discussed some points offline. I assume that addressing these points could lead to changes in the code, so I will return to the review process later (please request a review when you are ready).

Besides that, I have a comment about defining the database. You've used the old "metadata"-style approach to define separate entities, which is as follows:

        self.MiscData = misc.define_binding(self.database)
        self.Certificate = certificate.define_binding(self.database)

I could be mistaken, but it seems clear that this approach may not be convenient for the end-developer. We have transitioned to a different approach, as utilized in KnowledgeDB. Please see the following:

self.instance.bind('sqlite', filename or ':memory:', create_db=True)

@kozlovsky am I right regarding the DB definition?


def __init__(self, *args, torrent_checker=None, **kwargs):
community_id = unhexlify('9aca62f878969c437da9844cba29a134917e1649')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you certain that altering the community ID is necessary? As I understand it, this change results in a fork in the Popularity Community.

Comment on lines 66 to 67
self.register_task("ping_rendezvous", self.ping_rendezvous,
interval=PopularityCommunity.PING_INTERVAL_RENDEZVOUS)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I believe it might be slightly better to use 'self' here, as well as in the line above (which I understand isn't your change). Using 'self' offers more flexibility for testing (this is preferable because it allows you to alter its value without meddling with the class namespace, which could potentially affect other tests.):

Suggested change
self.register_task("ping_rendezvous", self.ping_rendezvous,
interval=PopularityCommunity.PING_INTERVAL_RENDEZVOUS)
self.register_task("ping_rendezvous", self.ping_rendezvous,
interval=self.PING_INTERVAL_RENDEZVOUS)

An example:

class A:
    NUMBER = 1

    def __init__(self):
        print(A.NUMBER)


class ChildA(A):
    NUMBER = 2


class B:
    NUMBER = 1

    def __init__(self):
        print(self.NUMBER)


class ChildB(B):
    NUMBER = 2


A()  # prints 1
ChildA()  # prints 1
B()  # prints 1
ChildB()  # prints 2

While the current design is functional, I believe it's not the most user-friendly as it doesn't provide an intuitive means for customizing the class's behavior through variables. I propose a slightly modified version which might be a bit more optimal:

class PopularityCommunity(RemoteQueryCommunity, VersionCommunityMixin):
    ...

    PING_INTERVAL_RENDEZVOUS = 60  # seconds
    def __init__(self, *args, torrent_checker=None, rendezvous_db=None,
                 ping_rendezvous_interval: float = PING_INTERVAL_RENDEZVOUS,
                 **kwargs):
        ...
        self.register_task("ping_rendezvous", self.ping_rendezvous,
                           interval=ping_rendezvous_interval)

For instance, in the tests, you could easily pass a custom value to the class as follows:

        community = PopularityCommunity(
            self._ipv8_component.peer,
            self._ipv8_component.ipv8.endpoint,
            Network(),
            ping_rendezvous_interval=0.1
        )

@@ -35,22 +42,98 @@ class PopularityCommunity(RemoteQueryCommunity, VersionCommunityMixin):
GOSSIP_POPULAR_TORRENT_COUNT = 10
GOSSIP_RANDOM_TORRENT_COUNT = 10

community_id = unhexlify('9aca62f878969c437da9844cba29a134917e1648')
PING_INTERVAL_RENDEZVOUS = 60 # seconds
RENDEZVOUS_DB_NAME = 'rendezvous.db'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: it might be better to declare this variable within the component to align with the current methodology of separating responsibilities.

Components are more specific; they can interact with particular databases, specify particular file names, etc. On the other hand, communities are more abstract, and databases and other instances should ideally be passed to them.

Consequently, we would have two instances where communities are created:

  1. During runtime, they are created by Components.
  2. During test execution, they are created by the TestBase provided by ipv8.


# Init version community message handlers
self.init_version_community()
self.rendezvous_cache = RendezvousCache()

def send_introduction_request(self, peer):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the logic for rendezvous_request could be relocated to the on_introduction_response function. This could simplify interactions by eliminating the need to extend introduction_request. As a bonus, this change might allow us to retain the previous community ID.

    def on_introduction_response(self, peer, dist, payload):
        super().on_introduction_response(peer, dist, payload)
        ...  # preform rendezvous_request

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current approach is still compatible! Older peers will just ignore the extra bytes. I could drop this entire extra logic though. We can get it to work through only separate payloads.

Comment on lines 258 to 260
else:
# This nonce has been burned.
self.rendezvous_cache.clear_peer_challenge(peer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codacy is right :)

Suggested change
else:
# This nonce has been burned.
self.rendezvous_cache.clear_peer_challenge(peer)
# This nonce has been burned.
self.rendezvous_cache.clear_peer_challenge(peer)

if not certificate:
certificate = self.rdb.Certificate(public_key=peer.mid, counter=0)
certificate.counter += 1
return
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary return :)

Suggested change
return

@@ -22,13 +24,16 @@ async def run(self):
metadata_store_component = await self.require_component(MetadataStoreComponent)
torrent_checker_component = await self.require_component(TorrentCheckerComponent)

rendezvous_db = RendezvousDatabase(db_path=self.session.config.state_dir / STATEDIR_DB_DIR / PopularityCommunity.RENDEZVOUS_DB_NAME)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's perfectly acceptable to create a database here. However, don't forget to close the database if it's necessary.

if self.knowledge_db:
self.knowledge_db.shutdown()

Copy link
Member

@synctext synctext left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, congratulations also from my side on one of those rare phd code contributions to Tribler! Much appreciated 🚀 🥇 🚀

Goals is said to be: determining the online time of fellow peers. Reading through the code I got inspiration for an "algorithm 1" type of innovation we require for solid publications. return RendezvousCertificate.get(public_key == pk).count() this code calculates the count of rendezvous certificates. In future, use an "algorithm 1" type of approach to calculate the probability of this identity being a Sybil, given the volume, age, and IPv4 diversity of the "rendezvous DAG". This goes a bit beyond MeritRank or is equal to meritRank? Should be N log N complexity.

@InvictusRMC
Copy link
Member Author

InvictusRMC commented Jul 3, 2023

Great idea! This first version is just datacollection. This scoring will go beyond MeritRank as it will require multiple dimensions. The idea for this is as follows: run MeritRank on the metrics, individually, to achieve a score for each metric. Next we introduce implementation specific weights to converge all scores to a single score.

@xoriole xoriole requested a review from qstokkink July 6, 2023 08:05
@kozlovsky
Copy link
Collaborator

kozlovsky commented Jul 6, 2023

Besides that, I have a comment about defining the database. You've used the old "metadata"-style approach to define separate entities, which is as follows:

        self.MiscData = misc.define_binding(self.database)
        self.Certificate = certificate.define_binding(self.database)

I could be mistaken, but it seems clear that this approach may not be convenient for the end-developer. We have transitioned > to a different approach, as utilized in KnowledgeDB. Please see the following:

self.instance.bind('sqlite', filename or ':memory:', create_db=True)

@kozlovsky am I right regarding the DB definition?

I think the current approach used in this PR has some benefits; it allows PyCharm IDE to understand the type of expressions like self.Certificate.

In the KnowledgeDatabase, all entities are defined inside a single define_binding method. This way it is possible to use less number of files, as all entities are defined in a single file. The drawback is that PyCharm can't deduce types of expressions like self.instance.StatementOp.

If you want to combine the benefits of both approaches, you can define all entities in a single RendezvousDatabase method called from the __init__, return a tuple of entity classes as a result value, and assign them as fields of the RendezvousDatabase:

class RendezvousDatabase:
    def __init__(self, db_path: Union[Path, type(MEMORY_DB)]):
        self.database = Database()
        self.Certificate, self.MiscData = self.define_binding()
        ...
    def define_binding(self):
        class Certificate(self.database.Entity):
            ...
            
        class MiscData(self.database.Entity):
            ...
            
        return Certificate, MiscData

Then, all entities can be defined in a single file (if it is considered beneficial), and PyCharm understands the types of expressions like rdb.Certificate

Comment on lines 11 to 12
def get_count(cls, pk: bytes) -> int:
return RendezvousCertificate.get(public_key == pk).count()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the method is currently not used. It can probably be deleted.

If you want to keep the method, it should be fixed, as the code is incorrect: RendezvousCertificate.get(...) call returns a single certificate object, and a single certificate object does not have the count() method.

The correct code (if it is necessary) should probably looks like:

        def get_count(cls, pk: bytes) -> int:
            certificate = RendezvousCertificate.get(public_key=pk)
            return 0 if certificate is None else certificate.counter

Comment on lines 263 to 264
with db_session:
certificate = self.rdb.Certificate.get(public_key=peer.mid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid possible database locking, it is better to either use here db_session(immediate=True) or Certificate.get_for_update(public_key=peer.mid) (the result is the same)

@qstokkink qstokkink removed their request for review July 6, 2023 11:44
@synctext
Copy link
Member

@InvictusRMC Can you address the comments and requested changes? {you will find out I guess that "production code polishing" is like eating oatmeal, brushing teeth, etc. 🤣 }
Please inquire about the release scheduling with @drew2a. We're stabilising for many months for a Tribler release. Your code would be the exclusive focus of the next-release possibly.

@synctext
Copy link
Member

btw Advise of @qstokkink is to test the impact on performance using minimal test Gumby experiment. A single UDP message triggering a database write/commit is scary 😨 Something we had in 2013 Dispersy times

@InvictusRMC
Copy link
Member Author

@InvictusRMC Can you address the comments and requested changes? {you will find out I guess that "production code polishing" is like eating oatmeal, brushing teeth, etc. 🤣 }
Please inquire about the release scheduling with @drew2a. We're stabilising for many months for a Tribler release. Your code would be the exclusive focus of the next-release possibly.

Comments addressed! Thank you for the reminder.

@InvictusRMC
Copy link
Member Author

btw Advise of @qstokkink is to test the impact on performance using minimal test Gumby experiment. A single UDP message triggering a database write/commit is scary 😨 Something we had in 2013 Dispersy times

Discussed with @qstokkink and @kozlovsky and they advised me to implement batching logic, as a large number of transactions would slow things down considerably.

@qstokkink
Copy link
Contributor

After getting a small lecture about MeritRank, my advice is as follows.

  1. Hook up a PeerObserver to ipv8 before starting it (line 63) here:
    ipv8 = IPv8(ipv8_config_builder.finalize(),
    enable_statistics=config.ipv8.statistics and not config.gui_test_mode,
    endpoint_override=endpoint)
    await ipv8.start()
  2. Inside your new observer's remove_peer(self, peer: Peer) callback, store time.time() - peer.creation_time for the peer.public_key in a database. This database is probably best managed inside of the Ipv8Component itself.
  3. Future work? Make a new community (or hook into MeritRank code) to share these entries.

@synctext
Copy link
Member

synctext commented Oct 10, 2023

This task is now taking over 4 months. @qstokkink indicated it is possible to re-factor the introduction-request and introduction-response with the ping features required for Sybil attack protection. We need a introduction-response message with both public keys, nounce beyond 16 bits(or OK, no replay attack vulnerability?), and signature.

Related work. This would make Tribler the first academically self-organising system with Sybil protection. See IPFS attack in a USENIX paper, DHT repair blog and DHT health reporting

Let's make the work by @InvictusRMC plus @grimadas the key feature of the upcoming 7.14 release. Preparing for MeritRank Production usage!

@synctext
Copy link
Member

synctext commented Oct 12, 2023

@qstokkink as you pointed out today: no ORM in IPv8 ❌ No database storage. Can you comment here a possible new API which would provide the signed certificates to the IPv8 community. Thus how can we request the rendezvous certificates (Pub-key-Them,Pub-key-OURS,nonce,signature-Them) from IPv8?
We are not trying to slow the network, so the default RATE_RENDEZVOUS_CERTIFICATES == 10 seconds inside IPv8. Meaning maximum 1 (new?) random certificate per 10 seconds (or some other simple rate limit mechanism).

@InvictusRMC
Copy link
Member Author

@qstokkink rebased this into the ipv8 module of Tribler: #7630. Thank you for picking up the slack 🙏. Closing for now.

@drew2a
Copy link
Collaborator

drew2a commented Oct 20, 2023

Please be aware that #7630 is not a rebase of #7517; rather, it's an entirely distinct PR.

@qstokkink
Copy link
Contributor

The commit e99dba8, which is part of #7630, is the rebase of #7517.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants