
⚡ Several optimizations for improving the performance of the engine #486

Closed
neon-mmd opened this issue Jan 12, 2024 · 2 comments · Fixed by #540

Comments

@neon-mmd
Owner

neon-mmd commented Jan 12, 2024

What would you like to share?

Work Expected From This Issue

Provide several optimizations to improve the performance of the search engine by making the following changes:

  • Replace HashMaps with vectors of tuples for fetching, filtering and aggregating search results. This change brings the time to process search results from the upstream search engines down from O(n^3) to O(n^2), since vectors do not require hashing each search result, which itself tends to be O(n).

For more information on why hashing can be an O(n) time-complexity operation, see:

https://crypto.stackexchange.com/questions/67448/what-is-the-time-complexity-of-computing-a-cryptographic-hash-function-random-or

This change also makes it possible to sort search results by relevancy on the search page, since vectors do not reorder their contents according to a hash and are therefore more predictable to work with programmatically. A sketch of the vector-based aggregation follows below.
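
A minimal sketch of what the vector-based aggregation could look like, assuming an illustrative `SearchResult` type (not the engine's actual structs):

```rust
// Illustrative stand-in for the engine's result type.
#[derive(Clone, Debug)]
struct SearchResult {
    title: String,
    url: String,
    relevance: f32,
}

// Merge results from all upstream engines into a Vec of (url, result)
// tuples instead of a HashMap keyed by url.
fn aggregate(upstream: Vec<Vec<SearchResult>>) -> Vec<(String, SearchResult)> {
    let mut merged: Vec<(String, SearchResult)> = Vec::new();
    for results in upstream {
        for result in results {
            // A linear duplicate check replaces hashing the url on every insert.
            if !merged.iter().any(|(url, _)| *url == result.url) {
                merged.push((result.url.clone(), result));
            }
        }
    }
    // Unlike a HashMap, the vector keeps a stable, explicit order, so sorting
    // by relevancy for the results page is straightforward.
    merged.sort_by(|a, b| b.1.relevance.total_cmp(&a.1.relevance));
    merged
}
```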

  • Use Redis pipelining and mini-moka's sync methods to cache multiple pages at once, reducing the latency and the time taken to cache the search results for different pages. Also, cache the results once after all the search results for the different pages have been aggregated in the search function, and then cache them in parallel on a separate non-blocking task via tokio::spawn (a sketch follows below).

This change has been proposed in detail in issue #444.
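
A minimal sketch of the pipelined, non-blocking caching path, assuming the redis crate's async (tokio) connection; the function name, key format and TTL are illustrative, not the engine's actual cache API:

```rust
// Cache every aggregated page in a single Redis round trip, on a detached
// task so the HTTP response is never blocked on caching.
async fn cache_pages(
    mut conn: redis::aio::MultiplexedConnection,
    query: String,
    pages: Vec<(u32, String)>, // (page number, serialized search results)
) {
    tokio::spawn(async move {
        let mut pipeline = redis::pipe();
        for (page, json) in &pages {
            // One SETEX per page; all commands are sent together when the
            // pipeline is executed below.
            pipeline
                .cmd("SETEX")
                .arg(format!("{query}-{page}"))
                .arg(600) // cache TTL in seconds (illustrative)
                .arg(json)
                .ignore();
        }
        let result: redis::RedisResult<()> = pipeline.query_async(&mut conn).await;
        if let Err(err) = result {
            eprintln!("failed to cache search results: {err}");
        }
    });
}
```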

  • Initialize the Config and SharedCache structs globally as static variables using std::sync::OnceLock and pass them by static reference (see the first sketch after this list).
  • Reduce the number of clone, to_owned and to_string calls in the codebase.
  • Use Arc cloning to cheaply share data between tokio::spawn tasks in the aggregate function in the src/results/aggregator.rs file.
  • Optimize the filter function in the src/results/aggregator.rs file so that each removal takes O(1) time, using a while loop with index variables into the data structure and the vector's swap_remove function to remove elements (see the second sketch after this list).
  • Use branchless coding style to reduce code branching.

For more information on how reducing branches can improve performance, see:
https://codinginterviewsmadesimple.substack.com/p/understanding-branchless-programming
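
To make the static-initialization proposal concrete, here is a minimal sketch assuming a simplified Config struct (the real structs and their fields live in the websurfx codebase):

```rust
use std::sync::OnceLock;

// Illustrative stand-in for the engine's Config struct.
struct Config {
    port: u16,
}

// Initialized exactly once at startup; afterwards every handler can take a
// cheap &'static Config instead of cloning the struct per request.
static CONFIG: OnceLock<Config> = OnceLock::new();

fn config() -> &'static Config {
    CONFIG.get_or_init(|| Config { port: 8080 })
}
```

And a sketch of the index-based filter built around swap_remove; the function name and the (url, value) tuple layout are illustrative:

```rust
// Each removal is O(1) because swap_remove fills the vacated slot with the
// last element instead of shifting the whole tail. Order is not preserved,
// which is fine when results are re-sorted by relevancy afterwards.
fn remove_blocked<T>(results: &mut Vec<(String, T)>, blocklist: &[String]) {
    let mut i = 0;
    while i < results.len() {
        if blocklist.iter().any(|b| results[i].0.contains(b)) {
            // Do not advance i: a new, unchecked element now occupies index i.
            results.swap_remove(i);
        } else {
            i += 1;
        }
    }
}
```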

  • Use FuturesUnordered from the futures crate to fetch results from the upstream engines in an unordered fashion in the src/results/aggregator.rs file. Each request no longer needs to wait for earlier requests to complete, which improves the speed of fetching and aggregating results (a sketch follows below).
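
A minimal sketch of the unordered fan-out, assuming an illustrative `fetch_from_engine` stand-in for the engine-specific request functions:

```rust
use futures::stream::{FuturesUnordered, StreamExt};

// Spawn one task per upstream engine and collect whichever finishes first.
async fn fetch_all(engines: Vec<String>, query: String) -> Vec<String> {
    let mut tasks = FuturesUnordered::new();
    for engine in engines {
        let query = query.clone();
        tasks.push(tokio::spawn(async move {
            fetch_from_engine(&engine, &query).await
        }));
    }

    let mut results = Vec::new();
    // Results are consumed in completion order, not submission order, so a
    // slow engine never delays results that are already available.
    while let Some(joined) = tasks.next().await {
        if let Ok(body) = joined {
            results.push(body);
        }
    }
    results
}

async fn fetch_from_engine(engine: &str, query: &str) -> String {
    // Placeholder for the real HTTP request to the upstream engine.
    format!("results from {engine} for {query}")
}
```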

  • Use asynchronous crates for compression, prefer the asynchronous tokio::io and tokio::fs methods over the synchronous std::io and std::fs methods in asynchronous code, and make functions asynchronous to improve performance (a sketch follows below).
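
A minimal sketch of swapping std::fs/std::io for tokio's async equivalents; the file paths are illustrative:

```rust
use tokio::fs;
use tokio::io::AsyncWriteExt;

// Writing a file with tokio's async I/O instead of the blocking std::fs
// equivalents, so the runtime's worker threads are never stalled.
async fn save_theme(contents: &str) -> std::io::Result<()> {
    let mut file = fs::File::create("public/static/theme.css").await?;
    file.write_all(contents.as_bytes()).await?;
    file.flush().await?;
    Ok(())
}

async fn load_config() -> std::io::Result<String> {
    // tokio::fs::read_to_string yields to the runtime while the OS does the work.
    fs::read_to_string("websurfx/config.lua").await
}
```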

Reasoning Behind The Proposed Changes

The reasoning behind the proposed changes is to improve the performance of the engine, which reduces the time it takes to display search results and can drastically improve the user experience.

Do you want to work on this issue?

None

Additional information

No response


To reduce notifications, issues are locked until they reach 🏁 status: ready for dev and are ready to be assigned. You can learn more in our contributing guide https://github.com/neon-mmd/websurfx/blob/rolling/CONTRIBUTING.md


The issue has been unlocked and is now ready for dev. If you would like to work on this issue, you can comment to have it assigned to you. You can learn more in our contributing guide https://github.com/neon-mmd/websurfx/blob/rolling/CONTRIBUTING.md

neon-mmd added a commit that referenced this issue Mar 6, 2024
…ant (#486)

- initializes & stores the config & cache structs as a static constant.
- Pass the config & cache structs as a static reference to all the
  functions handling their respective route.
neon-mmd added a commit that referenced this issue Mar 6, 2024
…lts (#486)

- replace hashmaps with vectors for fetching, collecting & aggregating results as it tends to be a contiguous & cache-efficient data structure.
- refactor & redesign algorithms for fetching & aggregating results
  centered around vectors in aggregate function.
neon-mmd added a commit that referenced this issue Mar 6, 2024
… tokio spawn tasks (#486)

- using the `futureunordered` instead of vector for collecting results
  reduces the time it takes to fetch the results as the results do not
  need to come in specific order so any result that gets fetched first
  gets collected in the `futureunordered` type.

Co-authored-by: Spencerjibz <[email protected]>
neon-mmd added a commit that referenced this issue Mar 6, 2024
neon-mmd added a commit that referenced this issue Mar 11, 2024
…gine (#540)

* ♻️ refactor: initialize & store the config & cache structs as a constant (#486)
- initializes & stores the config & cache structs as a static constant.
- Pass the config & cache structs as a static reference to all the
  functions handling their respective route.

* ⚡ perf: replace hashmaps with vectors for fetching & aggregating results (#486)
- replace hashmaps with vectors for fetching, collecting & aggregating results as it tends to be a contiguous & cache-efficient data structure.
- refactor & redesign algorithms for fetching & aggregating results
  centered around vectors in aggregate function.

* ➕ build: add the future crate (#486)

* ⚡ perf: use `futureunordered` for collecting results fetched from the tokio spawn tasks (#486)
- using the `futureunordered` instead of vector for collecting results
  reduces the time it takes to fetch the results as the results do not
  need to come in specific order so any result that gets fetched first
  gets collected in the `futureunordered` type.

Co-authored-by: Spencerjibz <[email protected]>

* ⚡ perf: initialize new async connections parallely using tokio spawn tasks (#486)

* ⚡ perf: initialize redis pipeline struct once with the default size of 3 (#486)

* ⚡ perf: reduce branch predictions by reducing conditional code branches (#486)

* ✅ test(unit): provide unit test for the `get_safesearch_level` function (#486)

* ⚡ perf: reduce clones & use index based loop to improve search results filtering performance (#486)

* 🚨 fix(clippy): make clippy/format checks happy (#486)

* 🚨 fix(build): make the cargo build check happy (#486)

* ⚡ perf: reduce the amount of clones, to_owneds & to_strings (#486)

* ⚡ perf: use async crates & methods & make functions async (#486)

* 🔖 chore(release): bump the app version (#486)

---------

Co-authored-by: Spencerjibz <[email protected]>