
Add bulk seeding to multiple peers #1170

Merged · 4 commits · Jul 26, 2022
Conversation

KonradStaniec (Contributor):
Add a call to seed multiple content items to multiple peers; effectively, it offers as much content from the seed db as fits in each remote peer's radius.

We can consider making it asynchronous, i.e. installing a seed task in the portal protocol and exposing some endpoints to check the task status, as it may take a while in the case of a large database. In that case this could also be made a bit more sophisticated, as the node will discover new peers which could be added to the set of seeded peers.
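A minimal, self-contained sketch of the per-peer loop this adds; getContentInRange and batchSize mirror names used in this PR, while the toy types and the echo are stand-ins for the real SeedDb rows and the offer round-trip:

type
  ContentItem = tuple[id: uint64, payload: string]

# Toy stand-in for SeedDb.getContentInRange: an xor-distance range query
# with paging via limit/offset.
proc getContentInRange(db: seq[ContentItem], nodeId, radius: uint64,
                       limit, offset: int): seq[ContentItem] =
  var skipped = 0
  for item in db:
    if (item.id xor nodeId) <= radius:
      if skipped < offset:
        inc skipped
      elif result.len < limit:
        result.add item

proc seedPeer(db: seq[ContentItem], peerId, peerRadius: uint64) =
  const batchSize = 64
  var offset = 0
  while true:
    let batch = db.getContentInRange(peerId, peerRadius, batchSize, offset)
    if batch.len == 0:
      break
    # In the real code this is an offer/accept round-trip with the peer.
    echo "offering ", batch.len, " items to peer ", peerId
    offset = offset + batchSize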

KonradStaniec requested a review from kdeme, July 21, 2022 13:07

offset = offset + batchSize

proc localNodeWorker(p: PortalProtocol, db: SeedDb): Future[void] {.async.} =
kdeme (Contributor):
Why is this an async call?

KonradStaniec (Author), Jul 22, 2022:

This is probably a bit silly, but imo it looks nicer and more consistent this way, i.e. the local worker is added to the set of gossip workers and awaited on.

proc localNodeWorker(p: PortalProtocol, db: SeedDb): Future[void] {.async.} =
  var offset = 0
  while true:
    let content = db.getContentInRange(p.localNode.id, p.dataRadius, batchSize, offset)
kdeme (Contributor):
Considering that this doesn't go over the network, you can just set a much bigger batchSize?

KonradStaniec (Author):

Yep, you are right; the only limit is how many results will fit in memory. I will fix it 👍

    p: PortalProtocol, seedDbPath: string, maxClosestNodes: uint32):
    Future[Result[void, string]] {.async.} =

  ## Chooses `maxClosestNodes` closest known nodes with known radius and tries to
kdeme (Contributor):
So what this does is select a set of nodes, that is, the closest neighbours of the node, of which it also holds the radius.
Then it offers, per node, all content it is potentially interested in, in batches of 64. Correct?
Downsides I see with this approach:

  • It again does not test / reuse code of the actual neighborhood gossip mechanism. (Same remark as for the previously added json rpc call)
  • It will not seed all content of the SeedDb into the network, especially considering it only targets the own node's direct neighbours. For this to work, one would need to run this on a node in each "neighborhood" (it will be difficult to define how big this is). That does not sound feasible.

It is important when adding this functionality to think about how it can be used to seed all data into the network.

The way I had in mind of implementing this is as follows:

  • By means of getContentInRange, select content in batches of 64 and each time increase the target id until you have covered the whole id space. (Might have to tune the radius here, or have some way of selecting them but keeping them ordered)
  • For each batch, call neighborhoodGossip on it (see the sketch after this comment).

That not each item offered in the list gets accepted by each node should be fine (might have to tune the amount of nodes, etc.). After all, that is why there is the accept/offer system.
This is how I would start with this, and I think it will work decently enough.
Next, this could probably be improved: https://github.com/status-im/nimbus-eth1/blob/master/fluffy/network/wire/portal_protocol.nim#L1088
E.g. rather select the middle of the id range of all the content items.
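For illustration, a self-contained sketch of that sweep (plain Nim; uint64 ids stand in for UInt256, the db is a sorted seq, and the neighborhoodGossip call is stubbed with echo; carving the id space into xor-balls whose radius exactly tiles it is just one possible way to pick the targets):

import std/algorithm

# xor-ball selection: ids whose xor-distance to `target` is at most `radius`.
proc getContentInRange(db: seq[uint64], target, radius: uint64,
                       limit: int): seq[uint64] =
  for id in db:
    if (id xor target) <= radius and result.len < limit:
      result.add id

proc sweepIdSpace(db: seq[uint64]) =
  const
    batchSize = 64
    radius = uint64.high div 8  # 2^61 - 1
    step = radius + 1           # xor-balls of this radius tile the id space
  var target = 0'u64
  while true:
    # A real implementation would page with an offset instead of capping.
    let batch = db.getContentInRange(target, radius, batchSize)
    if batch.len > 0:
      echo "neighborhoodGossip stand-in, batch around ", target, ": ", batch
    if target > uint64.high - step:
      break
    target = target + step

when isMainModule:
  var db = @[1'u64, 42, 1_000, uint64.high div 2, uint64.high - 7]
  db.sort()
  sweepIdSpace(db)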

KonradStaniec (Author), Jul 22, 2022:

Then it offers, per node, all content it is potentially interested in, in batches of 64.

Yep, that is the case. As I see it, a node will accept all content at least until its radius changes and new items fall outside of it.

It will not seed all content of the SeedDb into the network, especially considering it only targets the own node's direct neighbours. For this to work, one would need to run this on a node in each "neighborhood"

I would consider the version looking only at the closest nodes as a starter. It is definitely possible to look up nodes which are further away (maybe uint256.high/2, uint256.high/4, ... away from the local id) and try to do the same for them.

It again does not test / reuse code of the actual neighbourhood gossip mechanism. (Same remark as for the previously added json rpc call)

I agree this is a bit of a problem, but ultimately this approach is a bit more efficient (where efficiency is defined as the amount of content offered and accepted), and imo there is nothing wrong with having a more efficient approach at the entry point, so at least some nodes will receive as much content as possible.

The way I had in mind of implementing this is as follows:

By means of getContentInRange, select content in batches of 64 and each time increase the target id until you have covered the whole id space. (Might have to tune the radius here, or have some way of selecting them but keeping them ordered)
For each batch, call neighborhoodGossip on it.

I have implemented something similar to this locally, but the problem was that its performance was pretty abysmal, i.e. a lot of offered content was not accepted by peers. It may be because of https://github.com/status-im/nimbus-eth1/blob/master/fluffy/network/wire/portal_protocol.nim#L1088, or maybe I had some bug, or maybe the tuning of the parameters was not perfect. Also, xor distance does not make this easy, as it is not monotonic, so if two ids are close to a third id they are not necessarily close to each other. So it would warrant a bigger investigation into how to do this efficiently.
At this point, I am thinking that the current getContentInRange is maybe not perfect for this use case, and we would need something more sophisticated, like maybe minimizing the distance to a set of node ids or something similar.

So tl;dr: imo it is easier to make this version propagate the content to more nodes; the version you mention would require more investigation and maybe more advanced queries to make it efficient (unless of course I am missing something).
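(For reference, a standalone snippet showing the non-monotonicity point: xor distance to a fixed target jumps as ids cross a power of two, so a numeric id range does not map to an xor-distance ball.)

let target = 0b1000'u8  # 8
for id in 6'u8 .. 10'u8:
  echo id, " -> xor distance to 8: ", id xor target
# prints 14, 15, 0, 1, 2: numerically adjacent ids 7 and 8 end up far
# apart, while 8, 9, 10 stay close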

kdeme (Contributor), Jul 22, 2022:

I agree this is a bit of a problem, but ultimately this approach is a bit more efficient (where efficiency is defined as the amount of content offered and accepted)

Ultimately, the goal is to be able to seed this data from the least amount of nodes possible (e.g. currently only 1 node is needed), to make this actually usable. It doesn't matter if it is more efficient if you can't actually use it because you need a lot of nodes to seed it from (a lot of node ids at least). The efficiency also remains to be proven. Sure, you are seeing a lot of acceptance at the first nodes. But what about them gossiping this further? Especially considering the same (partial) data will be seeded again for different nodes.

and imo there is nothing wrong with having a more efficient approach at the entry point, so at least some nodes will receive as much content as possible.

I disagree here; there is no point in adding code if it isn't clear how to use it. I'd rather have a usable but less efficient version first, which can then be improved. It basically should just be able to seed faster than the current version, which only takes 1 item.
How did you test this, by the way? If only with a small data set, it would probably not do well, for sure.

Also, xor distance does not make this easy, as it is not monotonic, so if two ids are close to a third id they are not necessarily close to each other.

Sure, but there are "localities" which you can see with longer common prefixes.

I would consider the version looking only at the closest nodes as a starter. It is definitely possible to look up nodes which are further away (maybe uint256.high/2, uint256.high/4, ... away from the local id) and try to do the same for them.

You'd have to get nodes for all the distances, and it still would have issues:

  • Your own node would be doing all the work, instead of leveraging the gossip mechanism of other nodes
  • You might not reach the full content id range, considering for high distances you'd get too few nodes, which won't cover that part of the id space unless their radius is very high

You could of course go over the full id space, do node lookups for those ranges, and then offer them their content. That doesn't immediately sound more efficient.

KonradStaniec (Author):

Ultimately, the goal is to be able to seed this data from the least amount of nodes possible (e.g. currently only 1 node is needed), to make this actually usable. It doesn't matter if it is more efficient if you can't actually use it because you need a lot of nodes to seed it from (a lot of node ids at least). The efficiency also remains to be proven. Sure, you are seeing a lot of acceptance at the first nodes. But what about them gossiping this further? Especially considering the same (partial) data will be seeded again for different nodes.

Hmm, I feel like we have different criteria for this whole story. My goal was to use the SeedDb and pump as much data as possible to some nodes, in the most efficient manner, so with batches as big as possible of content which definitely fits their database radius. What you are aiming at is an improvement of the standard gossip to work with batches of data while having an acceptable amount of efficiency, i.e. sometimes we will offer content outside of the other node's range. Correct me if this is a wrong assessment.

You'd have to get nodes for all the distances, and it still would have issues:

Your own node would be doing all the work, instead of leveraging the gossip mechanism of other nodes
You might not reach the full content id range, considering for high distances you'd get too few nodes, which won't cover that part of the id space unless their radius is very high
You could of course go over the full id space, do node lookups for those ranges, and then offer them their content. That doesn't immediately sound more efficient.

Again, different goals. If someone is willing to use the SeedDb for something, it immediately assumes they will be doing a bit more work, so adding an additional bit of work to cover as much of the id space as possible is not a big deal (imo). One could even imagine the seeder node doing periodic random node lookups and pumping as much data as possible to those nodes. And there is no more efficient way of transferring large amounts of data in the current system than offers with batches of 64 items which all fit into the remote node's radius.

Sure, but there are "localities" which you can see with longer common prefixes.

Hmm, could you elaborate on that? Our contentIds are basically random numbers, so I am not sure which localities you mean.

Imo, working on improving neighborhoodGossip is a separate task, which would require determining how to optimally choose the nodes to forward content to, so that their ranges will overlap as much as possible, whatever the order of the received content.

kdeme (Contributor), Jul 22, 2022:

What you are aiming at is an improvement of the standard gossip to work with batches of data while having an acceptable amount of efficiency, i.e. sometimes we will offer content outside of the other node's range. Correct me if this is a wrong assessment.

What I am aiming at is a convenient yet faster way than the current one to seed the network with data, e.g. block 1 till block 100000, and be fairly certain that most (all?) of the data gets stored and is available on the network (this of course assumes enough nodes with the right radius). That is ultimately the goal of this task. How this gets achieved is open for discussion, but the proposal here doesn't do that in its current form.

One could even imagine the seeder node doing periodic random node lookups and pumping as much data as possible to those nodes.

The more sensible way here imo is that a seeder node would not do node lookups but instead content lookups (data sampling), and then seed specific data (or data ranges) that is missing to the closest nodes (the way neighborhood gossip works): basically the random data version of this #1088.
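A rough standalone sketch of that data-sampling idea (all procs and numbers are stand-ins made up for illustration):

import std/random

proc contentLookup(id: uint64): bool =
  # Stand-in for a real network content lookup; pretend 90% is findable.
  rand(1.0) < 0.9

proc sampleAndReseed(db: seq[uint64], samples: int) =
  randomize()
  for _ in 0 ..< samples:
    let id = db[rand(db.high)]
    if not contentLookup(id):
      # Stand-in for neighborhoodGossip towards the nodes closest to `id`.
      echo "missing ", id, ", re-seeding to closest nodes"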

KonradStaniec (Author):

The more sensible way here imo is that a seeder node would not do node lookups but instead content lookups (data sampling), and then seed specific data (or data ranges) that is missing to the closest nodes (the way neighborhood gossip works): basically the random data version of this #1088.

So for finding missing content and making sure it is available, your approach is definitely better; but again, just for seeding data into the network, imo offering as much content as possible within a node's range is more efficient.

Nevertheless, as per our discussion on discord:

  • I will decouple this part of the code from the rest of the codebase.
  • Implement initial versions of both approaches.
  • Then we will be able to test those on our fleet network with more nodes, to see how they fare.

Move all seeding logic into separate module.
Add api which uses neighbourGossip to seed network
@@ -1071,6 +1071,17 @@ proc queryRandom*(p: PortalProtocol): Future[seq[Node]] =
  ## Perform a query for a random target, return all nodes discovered.
  p.query(NodeId.random(p.baseProtocol.rng[]))

proc getNClosestNodesWithRadius*(p: PortalProtocol, n: uint32): seq[(Node, UInt256)] =
kdeme (Contributor):

As long as helper calls are not used in portal_protocol, I wouldn't put them there.

KonradStaniec (Author):

It can be used in neighbourhood gossip after some generalisation. In general I would like to avoid exposing the RadiusCache from PortalProtocol, and prefer this call over adding a proc getNodeRadius(n: NodeId): Option[UInt256], as that one implies that there is some kind of separate storage for radiuses.
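For illustration, a self-contained sketch of what such a helper boils down to, with stand-in types (the real call reads the routing table and radius cache inside PortalProtocol):

import std/[algorithm, options]

type
  NodeId = uint64  # stand-in for the 256-bit node id
  Node = object
    id: NodeId

proc radiusOf(cache: seq[(NodeId, uint64)], id: NodeId): Option[uint64] =
  for (nid, r) in cache:
    if nid == id:
      return some(r)

proc getNClosestNodesWithRadius(neighbours: seq[Node],
                                cache: seq[(NodeId, uint64)],
                                target: NodeId, n: int): seq[(Node, uint64)] =
  # Sort known nodes by xor distance to the target, keep only those with a
  # cached radius, and return at most n of them.
  var sorted = neighbours
  sorted.sort(proc(a, b: Node): int = cmp(a.id xor target, b.id xor target))
  for node in sorted:
    if result.len >= n:
      break
    let r = cache.radiusOf(node.id)
    if r.isSome():
      result.add((node, r.get()))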

kdeme (Contributor), Jul 25, 2022:

At the time of my comment I didn't realize that there was no access to the cache, so I guess it is fine for now.

one implies that there is some kind of separate storage for radiuses

Well, this is kind of the case, no?

KonradStaniec (Author):

Yep, it is, but this is a current implementation detail which we do not need to expose from PortalProtocol at this moment.

@@ -13,6 +13,7 @@ import
  eth/[rlp, common/eth_types],
  # TODO: `NetworkId` should not be in these private types
  eth/p2p/private/p2p_types,
  eth/p2p/discoveryv5/node,
kdeme (Contributor):
Is this still needed?

@@ -66,12 +66,15 @@ proc withRetries[A](

    if check(res):
      return res
    else:
      raise newException(ValueError, "check failed")
kdeme (Contributor):

This does look a bit ugly; perhaps do the try/except around let res = await f() only, and don't allow check to raise.
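A sketch of that restructure, assuming chronos and hypothetical retries/delay parameters rather than the exact signature in this PR:

import chronos

type
  FutureCallback[A] = proc (): Future[A] {.gcsafe.}
  CheckCallback[A] = proc (a: A): bool {.gcsafe.}

proc withRetries[A](f: FutureCallback[A], check: CheckCallback[A],
                    retries: int, delay: Duration): Future[A] {.async.} =
  var tries = 0
  while true:
    inc tries
    var res: A
    try:
      # Only the await is guarded; check is assumed not to raise.
      res = await f()
    except CatchableError as e:
      if tries > retries:
        raise e  # out of retries: propagate the original failure
      await sleepAsync(delay)
      continue
    if check(res):
      return res
    if tries > retries:
      raise newException(ValueError, "check failed")  # distinct failure mode
    await sleepAsync(delay)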

KonradStaniec (Author):

I agree that it is a bit ugly, although it most cleanly expresses the fact that we want to increase the number of tries in both cases (exception and check failure), that in case of an exception we want to propagate it to the call site, and that in case of a check failure we want the information that it was the check that failed, not an exception.

One thing that I will improve is to have a custom error message per call, as "check failed" does not tell much.

await client.close()
nodeInfos.add(nodeInfo)

# different set of data for each test as tests are statefull so previously propageted
kdeme (Contributor):

Suggested change
# different set of data for each test as tests are statefull so previously propageted
# different set of data for each test as tests are statefull so previously propagated

Comment on lines 164 to 166
# TODO this a bit hacky way to make sure we will engage different peers for
# each batch of data. This maybe removed after improving neighborhoodGossip
# to better chose peers based on propagated content
kdeme (Contributor):

This comment is invalid I believe. You will engage different peers anyhow, as it is data for a different range.
What it might do is hit more valid peers.

KonradStaniec (Author):

I was alluding to the fact that for each batch we only find peers close to the first item, which can miss some peers interested in later items. But sure, I will improve the wording here a bit.

Comment on lines 17 to 22
# Modulue is oblivious to content stored in seed database as all content related
# parameters should be avaialble in seed db i.e (contentId, contentKey, content)
# One thing which could need to parameterized per network basis in the future is
# distance function.
# TODO At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which contiously seeds data
kdeme (Contributor):

Suggested change
# Modulue is oblivious to content stored in seed database as all content related
# parameters should be avaialble in seed db i.e (contentId, contentKey, content)
# One thing which could need to parameterized per network basis in the future is
# distance function.
# TODO At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which contiously seeds data
# Module is oblivious to content stored in seed database as all content related
# parameters should be available in seed db i.e (contentId, contentKey, content)
# One thing which might need to be parameterized per network basis in the future is
# the distance function.
# TODO: At this point all calls are one shot calls but we can also experiment with
# approaches which start some process which continuously seeds data.

for n in closestWithRadius:
  gossipWorkers.add(p.worker(db, n[0], n[1]))

gossipWorkers.add(p.localNodeWorker(db))
kdeme (Contributor):

I still think that this is a weird and confusing thing to do.

KonradStaniec (Author), Jul 25, 2022:

Sure, I can change it. From my pov it was only a cosmetic thing, so if it can bring confusion it is not worth it.

if radius.isSome():
  if p.inRange(node.id, radius.unsafeGet(), contentId):
    gossipNodes.add(node)
let gossipNodes = p.getNClosestNodesWithRadius(
kdeme (Contributor):

I think you missed the p.inRange check here.


if gossipNodes.len >= 8: # use local nodes for gossip
  portal_gossip_without_lookup.inc(labelValues = [$p.protocolId])
  for node in gossipNodes[0..<min(gossipNodes.len, maxGossipNodes)]:
    let req = OfferRequest(dst: node, kind: Direct, contentList: contentList)
    let req = OfferRequest(dst: node[0], kind: Direct, contentList: contentList)
kdeme (Contributor):

This [0] makes it less clear. I probably would not use the call that returns both the radius and the nodes, unless you can make it fit the use case better.

KonradStaniec (Author):

Sure, getNClosestNodesWithRadius could return some kind of object or named tuple then, but imo that is too much ceremony for such a call. I will bring it back to how it was here.
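(For what it's worth, the named-tuple variant would look roughly like this, with stand-in types; it turns node[0] into node.node at the call site:)

type
  Node = object
    id: uint64
  NodeWithRadius = tuple[node: Node, radius: uint64]

let n: NodeWithRadius = (node: Node(id: 1), radius: 100'u64)
echo n.node.id, " ", n.radius  # field access instead of n[0], n[1]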

@@ -18,7 +18,7 @@ import
type
  FutureCallback[A] = proc (): Future[A] {.gcsafe, raises: [Defect].}

  CheckCallback[A] = proc (a: A): bool {.gcsafe, raises: [Defect].}
  CheckCallback[A] = proc (a: A): bool {.gcsafe, raises: [].}
kdeme (Contributor):
Hmm, why do you need to remove Defect?

KonradStaniec (Author), Jul 25, 2022:

I thought that is what you meant by "don't allow check to raise", i.e. that raises: [] means the callback cannot raise any exceptions. If I misunderstood, I will bring it back.

kdeme (Contributor):

Ah, no. Sorry for the confusion, I was just trying to say that if check cannot raise, you can move the try/except to only the await f() call. I didn't actually check if it could raise, so I wasn't implying to fix that, my bad.

@@ -1071,6 +1071,21 @@ proc queryRandom*(p: PortalProtocol): Future[seq[Node]] =
  ## Perform a query for a random target, return all nodes discovered.
  p.query(NodeId.random(p.baseProtocol.rng[]))

proc getNClosestNodesWithRadius*(
    p: PortalProtocol,
    nodeId: NodeId,
kdeme (Contributor):

Perhaps call this targetId
