ord server not distributing work to threads correctly #2292
This is what I see from each piece of code. The first one:
The second one:
Interesting. I dug a little into how concurrency is handled by Tokio and by the Rust standard library's thread::sleep function. thread::sleep pauses the entire operating system thread for the specified duration, so calling it inside a Tokio async block effectively blocks the Tokio executor thread. This is why you're seeing sequential handling of requests: after the first request arrives, it puts the executor to sleep for 1 second, during which no other tasks can be processed.

The reason the code seems to work better without tokio::spawn is that the server isn't actually running in a separate task. The server runs directly on the main Tokio runtime thread, and each request is handled in a new task. Because those tasks don't block the runtime thread, multiple requests can be processed concurrently.

So I assume the issue in the actual ord server is that we have some OS calls that block the Tokio executor thread, and we'd have to make them async... Need to look into that more deeply.
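The same blocking-vs-yielding distinction can be sketched stand-alone in Node (an analogy only, not the ord code): Node's single event loop stands in for a Tokio worker thread, a synchronous busy-wait stands in for thread::sleep, and a timer-based sleep stands in for tokio::time::sleep.

```javascript
// Analogy: a synchronous busy-wait occupies the thread the way thread::sleep
// pins a Tokio worker; a timer-based sleep yields back to the event loop the
// way tokio::time::sleep yields back to the executor.
const blockingSleep = (ms) => {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // occupies the thread for the whole duration
};
const asyncSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function compare() {
  // Three "requests" whose handler sleeps synchronously: they serialize.
  let start = Date.now();
  await Promise.all([0, 1, 2].map(async () => blockingSleep(50)));
  const blocked = Date.now() - start; // roughly 3 x 50ms

  // The same three with an async sleep: they overlap.
  start = Date.now();
  await Promise.all([0, 1, 2].map(() => asyncSleep(50)));
  const overlapped = Date.now() - start; // roughly 50ms
  return { blocked, overlapped };
}
```

Running compare() shows the blocking variant taking about the sum of the sleeps while the async variant takes about one sleep, which matches the sequential behavior described above.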
Thanks for looking into it.
If I hit my sleep endpoint 20 times very quickly, I get 1 response after 1 second, then 8 responses a second later, then 8 more another second later, and the last one a second later. I have 8 CPU cores. It appears that it is able to process 8 requests concurrently, but only if they were already waiting to be processed when the current request finishes. Does that match your understanding? Because it doesn't sound like it matches the line I quoted above. (See the timing output of the first code snippet in #2292 (comment): I get one 1s timing and three 2s timings - the last 3 requests are processed concurrently.)
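Those waves of 8 are consistent with a runtime that has a fixed pool of worker threads, each pinned by one blocking request at a time. A small queue-and-workers model (hypothetical numbers, scaled down to milliseconds) reproduces the pattern:

```javascript
// Model: `workers` handlers each claim one queued request at a time and
// "block" on it for jobMs. With 8 workers and 20 jobs, completions arrive
// in waves: 8 at ~1x jobMs, 8 at ~2x jobMs, and the remainder at ~3x.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runWithWorkers(jobCount, workers, jobMs) {
  const queue = Array.from({ length: jobCount }, (_, i) => i);
  const finishTimes = [];
  const start = Date.now();
  const worker = async () => {
    while (queue.length > 0) {
      queue.shift(); // claim the next waiting request
      await sleep(jobMs); // stands in for the blocking handler
      finishTimes.push(Date.now() - start);
    }
  };
  await Promise.all(Array.from({ length: workers }, worker));
  return finishTimes;
}
```

For example, runWithWorkers(20, 8, 50) finishes in roughly ceil(20/8) * 50 = 150ms total, mirroring the 1-second waves observed against the real server (the model doesn't capture the lone first response, only the batching).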
I'm experiencing a similar issue. I have a node.js server sending batches of requests to my ord server. The requests are sent in a non-blocking way but are responded to sequentially, each taking a cumulatively larger amount of time, e.g. req 1: 10ms, req 2: 20ms, ... req 150: 1500ms. I don't have the best metrics to provide, but I've tested my code extensively and have found this behavior only exhibits itself when making requests to ord. Also, I am using the most recent version of the ord client (on GH) as of today, with an --index-sats and --index-runes index file. The issue has coincided with pulling recent commits within the last few days. To help you reproduce it, just try requesting the content of each of the first 100 inscriptions in block 834061.
Are you able to bisect it down to a specific commit that caused this regression?
Unfortunately not. My FT job is pretty demanding right now. Also, the aforementioned commits I pulled required that I update my index version and re-index everything, so somewhere in that span of commits is a paradigm shift I'd really prefer not to cross over again (the v18 index took a lot longer to build than my previous one). Sorry I can't be of more help.
No worries! Would you be willing to share the node.js server code or benchmarking script you used? Then I could use that as a starting point for debugging.
Gladly! Hope this helps: I chose a block number where it's especially pronounced. If page 1 isn't demonstrating the issue, I recommend temporarily changing your page size to something like 1000 as opposed to 100.

const axios = require('axios');

const ordExplorerUrl = "<FILL ME IN>"

// Delay helper used in the retry loop below.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
class OrdExplorer {
async get(path) {
console.log(`${path} started`);
console.time(`${path} took`);
const { data } = await axios.get(`http://${ordExplorerUrl}:81/${path}`, {
headers: {
"Accept": "application/json"
}
});
console.timeEnd(`${path} took`);
return data
}
}
const ordExplorer = new OrdExplorer();
const currentBlock = 825951;
async function main() {
console.log("searching block", currentBlock)
try {
let page = await ordExplorer.get(`inscriptions/block/${currentBlock}/1`);
let ids = page.ids.filter(id => !!id);
await Promise.all(ids.map(async (inscriptionId) => {
let content = null;
let errors = 0;
while (content === null) {
try {
content = await ordExplorer.get(`content/${inscriptionId}`);
} catch (err) {
if (err.message.includes("connect EADDRNOTAVAIL")) {
console.log("server overloaded, waiting...")
await wait(5000);
errors++;
} else {
console.log(`CONTENT FAILED ${inscriptionId}: ${err.message}`);
return
}
}
}
}));
} catch (error) {
console.log(error);
process.exit(1);
}
process.exit(0)
}
main();

Hopefully this recreates the same behavior for you. You'll notice some slow requests block all other requests while they're being handled. The requests are sent in a non-blocking way (which you can also see in the logs) but appear to be responded to sequentially. Please let me know if I'm misunderstanding the behavior or if you have any suggestions. Thank you!
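For what it's worth, the pattern described here (concurrent dispatch, serialized responses) can be modeled stand-alone with a stand-in for the HTTP call that services one request at a time (all names hypothetical):

```javascript
// Stand-in for an endpoint that answers strictly one request at a time.
let busyUntil = 0;
const serializedGet = (serviceMs) => {
  const now = Date.now();
  const startAt = Math.max(now, busyUntil); // queue behind the previous request
  busyUntil = startAt + serviceMs;
  return new Promise((resolve) => setTimeout(resolve, busyUntil - now));
};

async function demo() {
  const t0 = Date.now();
  const dispatched = [];
  const completed = [];
  await Promise.all(
    Array.from({ length: 5 }, async () => {
      dispatched.push(Date.now() - t0); // all ~0ms: sent concurrently
      await serializedGet(30);
      completed.push(Date.now() - t0); // ~30, 60, 90, ...: answered in turn
    })
  );
  return { dispatched, completed };
}
```

The dispatch times are all near zero while the completion times accumulate linearly, which is exactly the req 1: 10ms, req 2: 20ms, ... shape reported above.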
I've noticed that the ord web server doesn't distribute work very evenly between CPU cores. When I make a lot of requests to it at the same time, only one of the requests gets processed immediately and the rest have to wait until the first request is finished.
To demonstrate the effect, I made a bare-bones version of the ord server:
Run it, and hit / 4 times very quickly. The first request will come back after 1 second but the other 3 usually take 2 seconds.
Then modify the code to the following and try again:
Now all 4 requests return after 1 second.
It appears to be the tokio::spawn() call that is messing things up, but I have no idea why.