
Add bulk seeding to multiple peers #1170

Merged · 4 commits · Jul 26, 2022
Conversation

KonradStaniec (Contributor):
Add a call to seed multiple content items to multiple peers; effectively, it offers as much content from the seed db as fits in each remote peer's radius.

We can consider making it asynchronous, i.e. installing a seed task in the portal protocol and exposing some endpoints to check the task status, as it may take a while in the case of a large database. In that case this could also be made a bit more sophisticated, as the node will discover new peers which could be added to the set of seeded peers.
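A minimal, self-contained sketch of the per-peer loop this adds; getContentInRange and batchSize mirror names used in this PR, while the toy types and the echo are stand-ins for the real SeedDb rows and the offer round-trip:

type
  ContentItem = tuple[id: uint64, payload: string]

# Toy stand-in for SeedDb.getContentInRange: an xor-distance range query
# with paging via limit/offset.
proc getContentInRange(db: seq[ContentItem], nodeId, radius: uint64,
                       limit, offset: int): seq[ContentItem] =
  var skipped = 0
  for item in db:
    if (item.id xor nodeId) <= radius:
      if skipped < offset:
        inc skipped
      elif result.len < limit:
        result.add item

proc seedPeer(db: seq[ContentItem], peerId, peerRadius: uint64) =
  const batchSize = 64
  var offset = 0
  while true:
    let batch = db.getContentInRange(peerId, peerRadius, batchSize, offset)
    if batch.len == 0:
      break
    # In the real code this is an offer/accept round-trip with the peer.
    echo "offering ", batch.len, " items to peer ", peerId
    offset = offset + batchSize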

KonradStaniec requested a review from kdeme, July 21, 2022 13:07

offset = offset + batchSize

proc localNodeWorker(p: PortalProtocol, db: SeedDb): Future[void] {.async.} =
kdeme (Contributor):
Why is this an async call?

KonradStaniec (Author), Jul 22, 2022:

This is probably a bit silly, but imo it looks nicer and more consistent this way, i.e. the local worker is added to the set of gossip workers and awaited on.

proc localNodeWorker(p: PortalProtocol, db: SeedDb): Future[void] {.async.} =
  var offset = 0
  while true:
    let content = db.getContentInRange(p.localNode.id, p.dataRadius, batchSize, offset)
kdeme (Contributor):
Considering that this doesn't go over the network, you can just set a much bigger batchSize?

KonradStaniec (Author):

Yep, you are right; the only limit is how many results will fit in memory. I will fix it 👍

    p: PortalProtocol, seedDbPath: string, maxClosestNodes: uint32):
    Future[Result[void, string]] {.async.} =

  ## Chooses `maxClosestNodes` closest known nodes with known radius and tries to
kdeme (Contributor):
So what this does is select a set of nodes, that is, the closest neighbours of the node, of which it also holds the radius.
Then it offers, per node, all content it is potentially interested in, in batches of 64. Correct?
Downsides I see with this approach:

  • It again does not test / reuse code of the actual neighborhood gossip mechanism. (Same remark as for the previously added json rpc call)
  • It will not seed all content of the SeedDb into the network, especially considering it only targets the own node's direct neighbours. For this to work, one would need to run this on a node in each "neighborhood" (it will be difficult to define how big this is). That does not sound feasible.

It is important when adding this functionality to think about how it can be used to seed all data into the network.

The way I had in mind of implementing this is as follows:

  • By means of getContentInRange, select content in batches of 64 and each time increase the target id until you have covered the whole id space. (Might have to tune the radius here, or have some way of selecting them but keeping them ordered)
  • For each batch, call neighborhoodGossip on it (see the sketch after this comment).

That not each item offered in the list gets accepted by each node should be fine (might have to tune the amount of nodes, etc.). After all, that is why there is the accept/offer system.
This is how I would start with this, and I think it will work decently enough.
Next, this could probably be improved: https://github.com/status-im/nimbus-eth1/blob/master/fluffy/network/wire/portal_protocol.nim#L1088
E.g. rather select the middle of the id range of all the content items.
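For illustration, a self-contained sketch of that sweep (plain Nim; uint64 ids stand in for UInt256, the db is a sorted seq, and the neighborhoodGossip call is stubbed with echo; carving the id space into xor-balls whose radius exactly tiles it is just one possible way to pick the targets):

import std/algorithm

# xor-ball selection: ids whose xor-distance to `target` is at most `radius`.
proc getContentInRange(db: seq[uint64], target, radius: uint64,
                       limit: int): seq[uint64] =
  for id in db:
    if (id xor target) <= radius and result.len < limit:
      result.add id

proc sweepIdSpace(db: seq[uint64]) =
  const
    batchSize = 64
    radius = uint64.high div 8  # 2^61 - 1
    step = radius + 1           # xor-balls of this radius tile the id space
  var target = 0'u64
  while true:
    # A real implementation would page with an offset instead of capping.
    let batch = db.getContentInRange(target, radius, batchSize)
    if batch.len > 0:
      echo "neighborhoodGossip stand-in, batch around ", target, ": ", batch
    if target > uint64.high - step:
      break
    target = target + step

when isMainModule:
  var db = @[1'u64, 42, 1_000, uint64.high div 2, uint64.high - 7]
  db.sort()
  sweepIdSpace(db)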

KonradStaniec (Author), Jul 22, 2022:

Then it offers, per node, all content it is potentially interested in, in batches of 64.

Yep, that is the case. As I see it, a node will accept all content at least until its radius changes and new items fall outside of it.

It will not seed all content of the SeedDb into the network, especially considering it only targets the own node's direct neighbours. For this to work, one would need to run this on a node in each "neighborhood"

I would consider the version looking only at the closest nodes as a starter. It is definitely possible to look up nodes which are further away (maybe uint256.high/2, uint256.high/4, ... away from the local id) and try to do the same for them.

It again does not test / reuse code of the actual neighbourhood gossip mechanism. (Same remark as for the previously added json rpc call)

I agree this is a bit of a problem, but ultimately this approach is a bit more efficient (where efficiency is defined as the amount of content offered and accepted), and imo there is nothing wrong with having a more efficient approach at the entry point, so at least some nodes will receive as much content as possible.

The way I had in mind of implementing this is as follows:

By means of getContentInRange, select content in batches of 64 and each time increase the target id until you have covered the whole id space. (Might have to tune the radius here, or have some way of selecting them but keeping them ordered)
For each batch, call neighborhoodGossip on it.

I have implemented something similar to this locally, but the problem was that its performance was pretty abysmal, i.e. a lot of offered content was not accepted by peers. It may be because of https://github.com/status-im/nimbus-eth1/blob/master/fluffy/network/wire/portal_protocol.nim#L1088, or maybe I had some bug, or maybe the tuning of the parameters was not perfect. Also, xor distance does not make this easy, as it is not monotonic, so if two ids are close to a third id they are not necessarily close to each other. So it would warrant a bigger investigation into how to do this efficiently.
At this point, I am thinking that the current getContentInRange is maybe not perfect for this use case, and we would need something more sophisticated, like maybe minimizing the distance to a set of node ids or something similar.

So tl;dr: imo it is easier to make this version propagate the content to more nodes; the version you mention would require more investigation and maybe more advanced queries to make it efficient (unless of course I am missing something).
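(For reference, a standalone snippet showing the non-monotonicity point: xor distance to a fixed target jumps as ids cross a power of two, so a numeric id range does not map to an xor-distance ball.)

let target = 0b1000'u8  # 8
for id in 6'u8 .. 10'u8:
  echo id, " -> xor distance to 8: ", id xor target
# prints 14, 15, 0, 1, 2: numerically adjacent ids 7 and 8 end up far
# apart, while 8, 9, 10 stay close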

kdeme (Contributor), Jul 22, 2022:

I agree this is a bit of a problem, but ultimately this approach is a bit more efficient (where efficiency is defined as the amount of content offered and accepted)

Ultimately, the goal is to be able to seed this data from the least amount of nodes possible (e.g. currently only 1 node is needed), to make this actually usable. It doesn't matter if it is more efficient if you can't actually use it because you need a lot of nodes to seed it from (a lot of node ids at least). The efficiency also remains to be proven. Sure, you are seeing a lot of acceptance at the first nodes. But what about them gossiping this further? Especially considering the same (partial) data will be seeded again for different nodes.

and imo there is nothing wrong with having a more efficient approach at the entry point, so at least some nodes will receive as much content as possible.

I disagree here; there is no point in adding code if it isn't clear how to use it. I'd rather have a usable but less efficient version first, which can then be improved. It basically should just be able to seed faster than the current version, which only takes 1 item.
How did you test this, by the way? If only with a small data set, it would probably not do well, for sure.

Also, xor distance does not make this easy, as it is not monotonic, so if two ids are close to a third id they are not necessarily close to each other.

Sure, but there are "localities" which you can see with longer common prefixes.

I would consider the version looking only at the closest nodes as a starter. It is definitely possible to look up nodes which are further away (maybe uint256.high/2, uint256.high/4, ... away from the local id) and try to do the same for them.

You'd have to get nodes for all the distances, and it still would have issues:

  • Your own node would be doing all the work, instead of leveraging the gossip mechanism of other nodes
  • You might not reach the full content id range, considering for high distances you'd get too few nodes, which won't cover that part of the id space unless their radius is very high

You could of course go over the full id space, do node lookups for those ranges, and then offer them their content. That doesn't immediately sound more efficient.

KonradStaniec (Author):

Ultimately, the goal is to be able to seed this data from the least amount of nodes possible (e.g. currently only 1 node is needed), to make this actually usable. It doesn't matter if it is more efficient if you can't actually use it because you need a lot of nodes to seed it from (a lot of node ids at least). The efficiency also remains to be proven. Sure, you are seeing a lot of acceptance at the first nodes. But what about them gossiping this further? Especially considering the same (partial) data will be seeded again for different nodes.

Hmm, I feel like we have different criteria for this whole story. My goal was to use the SeedDb and pump as much data as possible to some nodes, in the most efficient manner, so with batches as big as possible of content which definitely fits their database radius. What you are aiming at is an improvement of the standard gossip to work with batches of data while having an acceptable amount of efficiency, i.e. sometimes we will offer content outside of the other node's range. Correct me if this is a wrong assessment.

You'd have to get nodes for all the distances, and it still would have issues:

Your own node would be doing all the work, instead of leveraging the gossip mechanism of other nodes
You might not reach the full content id range, considering for high distances you'd get too few nodes, which won't cover that part of the id space unless their radius is very high
You could of course go over the full id space, do node lookups for those ranges, and then offer them their content. That doesn't immediately sound more efficient.

Again, different goals. If someone is willing to use the SeedDb for something, it immediately assumes they will be doing a bit more work, so adding an additional bit of work to cover as much of the id space as possible is not a big deal (imo). One could even imagine the seeder node doing periodic random node lookups and pumping as much data as possible to those nodes. And there is no more efficient way of transferring large amounts of data in the current system than offers with batches of 64 items which all fit into the remote node's radius.

Sure, but there are "localities" which you can see with longer common prefixes.

Hmm, could you elaborate on that? Our contentIds are basically random numbers, so I am not sure which localities you mean.

Imo, working on improving neighborhoodGossip is a separate task, which would require determining how to optimally choose the nodes to forward content to, so that their ranges will overlap as much as possible, whatever the order of the received content.

kdeme (Contributor), Jul 22, 2022:

What you are aiming at is an improvement of the standard gossip to work with batches of data while having an acceptable amount of efficiency, i.e. sometimes we will offer content outside of the other node's range. Correct me if this is a wrong assessment.

What I am aiming at is a convenient yet faster way than the current one to seed the network with data, e.g. block 1 till block 100000, and be fairly certain that most (all?) of the data gets stored and is available on the network (this of course assumes enough nodes with the right radius). That is ultimately the goal of this task. How this gets achieved is open for discussion, but the proposal here doesn't do that in its current form.

One could even imagine the seeder node doing periodic random node lookups and pumping as much data as possible to those nodes.

The more sensible way here imo is that a seeder node would not do node lookups but instead content lookups (data sampling), and then seed specific data (or data ranges) that is missing to the closest nodes (the way neighborhood gossip works): basically the random data version of this #1088.
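A rough standalone sketch of that data-sampling idea (all procs and numbers are stand-ins made up for illustration):

import std/random

proc contentLookup(id: uint64): bool =
  # Stand-in for a real network content lookup; pretend 90% is findable.
  rand(1.0) < 0.9

proc sampleAndReseed(db: seq[uint64], samples: int) =
  randomize()
  for _ in 0 ..< samples:
    let id = db[rand(db.high)]
    if not contentLookup(id):
      # Stand-in for neighborhoodGossip towards the nodes closest to `id`.
      echo "missing ", id, ", re-seeding to closest nodes"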

KonradStaniec (Author):

The more sensible way here imo is that a seeder node would not do node lookups but instead content lookups (data sampling), and then seed specific data (or data ranges) that is missing to the closest nodes (the way neighborhood gossip works): basically the random data version of this #1088.

So for finding missing content and making sure it is available, your approach is definitely better; but again, just for seeding data into the network, imo offering as much content as possible within a node's range is more efficient.

Nevertheless, as per our discussion on discord:

  • I will decouple this part of the code from the rest of the codebase.
  • Implement initial versions of both approaches.
  • Then we will be able to test those on our fleet network with more nodes, to see how they fare.

Move all seeding logic into separate module.
Add api which uses neighbourGossip to seed network
@@ -1071,6 +1071,17 @@ proc queryRandom*(p: PortalProtocol): Future[seq[Node]] =
  ## Perform a query for a random target, return all nodes discovered.
  p.query(NodeId.random(p.baseProtocol.rng[]))

proc getNClosestNodesWithRadius*(p: PortalProtocol, n: uint32): seq[(Node, UInt256)] =
kdeme (Contributor):

As long as helper calls are not used in portal_protocol, I wouldn't put them there.

KonradStaniec (Author):

It can be used in neighbourhood gossip after some generalisation. In general I would like to avoid exposing the RadiusCache from PortalProtocol, and prefer this call over adding a proc getNodeRadius(n: NodeId): Option[UInt256], as that one implies that there is some kind of separate storage for radiuses.
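For illustration, a self-contained sketch of what such a helper boils down to, with stand-in types (the real call reads the routing table and radius cache inside PortalProtocol):

import std/[algorithm, options]

type
  NodeId = uint64  # stand-in for the 256-bit node id
  Node = object
    id: NodeId

proc radiusOf(cache: seq[(NodeId, uint64)], id: NodeId): Option[uint64] =
  for (nid, r) in cache:
    if nid == id:
      return some(r)

proc getNClosestNodesWithRadius(neighbours: seq[Node],
                                cache: seq[(NodeId, uint64)],
                                target: NodeId, n: int): seq[(Node, uint64)] =
  # Sort known nodes by xor distance to the target, keep only those with a
  # cached radius, and return at most n of them.
  var sorted = neighbours
  sorted.sort(proc(a, b: Node): int = cmp(a.id xor target, b.id xor target))
  for node in sorted:
    if result.len >= n:
      break
    let r = cache.radiusOf(node.id)
    if r.isSome():
      result.add((node, r.get()))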

kdeme (Contributor), Jul 25, 2022:

At the time of my comment I didn't realize that there was no access to the cache, so I guess it is fine for now.

one implies that there is some kind of separate storage for radiuses

Well, this is kind of the case, no?

KonradStaniec (Author):

Yep, it is, but this is a current implementation detail which we do not need to expose from PortalProtocol at this moment.

@@ -13,6 +13,7 @@ import
  eth/[rlp, common/eth_types],
  # TODO: `NetworkId` should not be in these private types
  eth/p2p/private/p2p_types,
  eth/p2p/discoveryv5/node,
kdeme (Contributor):
Is this still needed?

@@ -66,12 +66,15 @@ proc withRetries[A](

    if check(res):
      return res
    else:
      raise newException(ValueError, "check failed")
kdeme (Contributor):

This does look a bit ugly; perhaps do the try/except around let res = await f() only, and don't allow check to raise.
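A sketch of that restructure, assuming chronos and hypothetical retries/delay parameters rather than the exact signature in this PR:

import chronos

type
  FutureCallback[A] = proc (): Future[A] {.gcsafe.}
  CheckCallback[A] = proc (a: A): bool {.gcsafe.}

proc withRetries[A](f: FutureCallback[A], check: CheckCallback[A],
                    retries: int, delay: Duration): Future[A] {.async.} =
  var tries = 0
  while true:
    inc tries
    var res: A
    try:
      # Only the await is guarded; check is assumed not to raise.
      res = await f()
    except CatchableError as e:
      if tries > retries:
        raise e  # out of retries: propagate the original failure
      await sleepAsync(delay)
      continue
    if check(res):
      return res
    if tries > retries:
      raise newException(ValueError, "check failed")  # distinct failure mode
    await sleepAsync(delay)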

KonradStaniec (Author):

I agree that it is a bit ugly, although it most cleanly expresses the fact that we want to increase the number of tries in both cases (exception and check failure), that in case of an exception we want to propagate it to the call site, and that in case of a check failure we want the information that it was the check that failed, not an exception.

One thing that I will improve is to have a custom error message per call, as "check failed" does not tell much.

await client.close()
nodeInfos.add(nodeInfo)

# different set of data for each test as tests are statefull so previously propageted
kdeme (Contributor):

Suggested change
# different set of data for each test as tests are statefull so previously propageted
# different set of data for each test as tests are statefull so previously propagated

Comment on lines 164 to 166
# TODO this a bit hacky way to make sure we will engage different peers for
# each batch of data. This maybe removed after improving neighborhoodGossip
# to better chose peers based on propagated content
kdeme (Contributor):

This comment is invalid I believe. You will engage different peers anyhow, as it is data for a different range.
What it might do is hit more valid peers.

KonradStaniec (Author):

I was alluding to the fact that for each batch we only find peers close to the first item, which can miss some peers interested in later items. But sure, I will improve the wording here a bit.

Comment on lines 17 to 22
# Modulue is oblivious to content stored in seed database as all content related
# parameters should be avaialble in seed db i.e (contentId, contentKey, content)
# One thing which could need to parameterized per network basis in the future is
# distance function.
# TODO At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which contiously seeds data
kdeme (Contributor):

Suggested change
# Modulue is oblivious to content stored in seed database as all content related
# parameters should be avaialble in seed db i.e (contentId, contentKey, content)
# One thing which could need to parameterized per network basis in the future is
# distance function.
# TODO At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which contiously seeds data
# Module is oblivious to content stored in seed database as all content related
# parameters should be available in seed db i.e (contentId, contentKey, content)
# One thing which might need to be parameterized per network basis in the future is
# the distance function.
# TODO: At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which continuously seeds data.

for n in closestWithRadius:
  gossipWorkers.add(p.worker(db, n[0], n[1]))

gossipWorkers.add(p.localNodeWorker(db))
kdeme (Contributor):

I still think that this is a weird and confusing thing to do.

KonradStaniec (Author), Jul 25, 2022:

Sure, I can change it. From my pov it was only a cosmetic thing, so if it can bring confusion it is not worth it.

if radius.isSome():
  if p.inRange(node.id, radius.unsafeGet(), contentId):
    gossipNodes.add(node)
let gossipNodes = p.getNClosestNodesWithRadius(
kdeme (Contributor):

I think you missed the p.inRange check here.


if gossipNodes.len >= 8: # use local nodes for gossip
  portal_gossip_without_lookup.inc(labelValues = [$p.protocolId])
  for node in gossipNodes[0..<min(gossipNodes.len, maxGossipNodes)]:
    let req = OfferRequest(dst: node, kind: Direct, contentList: contentList)
    let req = OfferRequest(dst: node[0], kind: Direct, contentList: contentList)
kdeme (Contributor):

This [0] makes it less clear. I probably would not use the call that returns both the radius and the nodes, unless you can make it fit the use case better.

KonradStaniec (Author):

Sure, getNClosestNodesWithRadius could return some kind of object or named tuple then, but imo that is too much ceremony for such a call. I will bring it back to how it was here.
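(For what it's worth, the named-tuple variant would look roughly like this, with stand-in types; it turns node[0] into node.node at the call site:)

type
  Node = object
    id: uint64
  NodeWithRadius = tuple[node: Node, radius: uint64]

let n: NodeWithRadius = (node: Node(id: 1), radius: 100'u64)
echo n.node.id, " ", n.radius  # field access instead of n[0], n[1]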

@@ -18,7 +18,7 @@ import
type
  FutureCallback[A] = proc (): Future[A] {.gcsafe, raises: [Defect].}

  CheckCallback[A] = proc (a: A): bool {.gcsafe, raises: [Defect].}
  CheckCallback[A] = proc (a: A): bool {.gcsafe, raises: [].}
kdeme (Contributor):
Hmm, why do you need to remove Defect?

KonradStaniec (Author), Jul 25, 2022:

I thought that is what you meant by "don't allow check to raise", i.e. that raises: [] means the callback cannot raise any exceptions. If I misunderstood, I will bring it back.

kdeme (Contributor):

Ah, no. Sorry for the confusion, I was just trying to say that if check cannot raise, you can move the try/except to only the await f() call. I didn't actually check if it could raise, so I wasn't implying to fix that, my bad.

@@ -1071,6 +1071,21 @@ proc queryRandom*(p: PortalProtocol): Future[seq[Node]] =
  ## Perform a query for a random target, return all nodes discovered.
  p.query(NodeId.random(p.baseProtocol.rng[]))

proc getNClosestNodesWithRadius*(
    p: PortalProtocol,
    nodeId: NodeId,
kdeme (Contributor):

Perhaps call this targetId
