
⚡ Several optimizations for improving the performance of the engine #486

Closed
neon-mmd opened this issue Jan 12, 2024 · 2 comments · Fixed by #540

Comments

@neon-mmd
Owner

neon-mmd commented Jan 12, 2024

What would you like to share?

Work Expected From This Issue

Provide several optimizations to improve the performance of the search engine by making the following changes:

  • Replace HashMaps with vectors of tuples for fetching, filtering and aggregating search results. This change brings the time to process search results from the upstream search engines down from O(n^3) to O(n^2), since vectors do not require hashing each search result, which itself tends to be O(n).

For more information on why hashing can be an O(n) time-complexity operation, see:

https://crypto.stackexchange.com/questions/67448/what-is-the-time-complexity-of-computing-a-cryptographic-hash-function-random-or

This change also makes it possible to sort search results by relevancy on the search page, since vectors do not reorder their contents according to a hash and are therefore more predictable to work with programmatically. A sketch of the vector-based aggregation follows below.
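
A minimal sketch of what the vector-based aggregation could look like, assuming an illustrative `SearchResult` type (not the engine's actual structs):

```rust
// Illustrative stand-in for the engine's result type.
#[derive(Clone, Debug)]
struct SearchResult {
    title: String,
    url: String,
    relevance: f32,
}

// Merge results from all upstream engines into a Vec of (url, result)
// tuples instead of a HashMap keyed by url.
fn aggregate(upstream: Vec<Vec<SearchResult>>) -> Vec<(String, SearchResult)> {
    let mut merged: Vec<(String, SearchResult)> = Vec::new();
    for results in upstream {
        for result in results {
            // A linear duplicate check replaces hashing the url on every insert.
            if !merged.iter().any(|(url, _)| *url == result.url) {
                merged.push((result.url.clone(), result));
            }
        }
    }
    // Unlike a HashMap, the vector keeps a stable, explicit order, so sorting
    // by relevancy for the results page is straightforward.
    merged.sort_by(|a, b| b.1.relevance.total_cmp(&a.1.relevance));
    merged
}
```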

  • Use Redis pipelining and mini-moka's sync methods to cache multiple pages at once, reducing the latency and the time taken to cache the search results for different pages. Also, cache the results once after all the search results for the different pages have been aggregated in the search function, and then cache them in parallel on a separate non-blocking task via tokio::spawn (a sketch follows below).

This change has been proposed in detail in issue #444.
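
A minimal sketch of the pipelined, non-blocking caching path, assuming the redis crate's async (tokio) connection; the function name, key format and TTL are illustrative, not the engine's actual cache API:

```rust
// Cache every aggregated page in a single Redis round trip, on a detached
// task so the HTTP response is never blocked on caching.
async fn cache_pages(
    mut conn: redis::aio::MultiplexedConnection,
    query: String,
    pages: Vec<(u32, String)>, // (page number, serialized search results)
) {
    tokio::spawn(async move {
        let mut pipeline = redis::pipe();
        for (page, json) in &pages {
            // One SETEX per page; all commands are sent together when the
            // pipeline is executed below.
            pipeline
                .cmd("SETEX")
                .arg(format!("{query}-{page}"))
                .arg(600) // cache TTL in seconds (illustrative)
                .arg(json)
                .ignore();
        }
        let result: redis::RedisResult<()> = pipeline.query_async(&mut conn).await;
        if let Err(err) = result {
            eprintln!("failed to cache search results: {err}");
        }
    });
}
```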

  • Initialize the Config and SharedCache structs globally as static variables using std::sync::OnceLock and pass them by static reference (see the first sketch after this list).
  • Reduce the number of clone, to_owned and to_string calls in the codebase.
  • Use Arc cloning to cheaply share data between tokio::spawn tasks in the aggregate function in the src/results/aggregator.rs file.
  • Optimize the filter function in the src/results/aggregator.rs file so that each removal takes O(1) time, using a while loop with index variables into the data structure and the vector's swap_remove function to remove elements (see the second sketch after this list).
  • Use branchless coding style to reduce code branching.

For more information on how reducing branches can improve performance, see:
https://codinginterviewsmadesimple.substack.com/p/understanding-branchless-programming
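
To make the static-initialization proposal concrete, here is a minimal sketch assuming a simplified Config struct (the real structs and their fields live in the websurfx codebase):

```rust
use std::sync::OnceLock;

// Illustrative stand-in for the engine's Config struct.
struct Config {
    port: u16,
}

// Initialized exactly once at startup; afterwards every handler can take a
// cheap &'static Config instead of cloning the struct per request.
static CONFIG: OnceLock<Config> = OnceLock::new();

fn config() -> &'static Config {
    CONFIG.get_or_init(|| Config { port: 8080 })
}
```

And a sketch of the index-based filter built around swap_remove; the function name and the (url, value) tuple layout are illustrative:

```rust
// Each removal is O(1) because swap_remove fills the vacated slot with the
// last element instead of shifting the whole tail. Order is not preserved,
// which is fine when results are re-sorted by relevancy afterwards.
fn remove_blocked<T>(results: &mut Vec<(String, T)>, blocklist: &[String]) {
    let mut i = 0;
    while i < results.len() {
        if blocklist.iter().any(|b| results[i].0.contains(b)) {
            // Do not advance i: a new, unchecked element now occupies index i.
            results.swap_remove(i);
        } else {
            i += 1;
        }
    }
}
```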

  • Use FuturesUnordered from the futures crate to fetch results from the upstream engines in an unordered fashion in the src/results/aggregator.rs file. Each request no longer needs to wait for earlier requests to complete, which improves the speed of fetching and aggregating results (a sketch follows below).
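
A minimal sketch of the unordered fan-out, assuming an illustrative `fetch_from_engine` stand-in for the engine-specific request functions:

```rust
use futures::stream::{FuturesUnordered, StreamExt};

// Spawn one task per upstream engine and collect whichever finishes first.
async fn fetch_all(engines: Vec<String>, query: String) -> Vec<String> {
    let mut tasks = FuturesUnordered::new();
    for engine in engines {
        let query = query.clone();
        tasks.push(tokio::spawn(async move {
            fetch_from_engine(&engine, &query).await
        }));
    }

    let mut results = Vec::new();
    // Results are consumed in completion order, not submission order, so a
    // slow engine never delays results that are already available.
    while let Some(joined) = tasks.next().await {
        if let Ok(body) = joined {
            results.push(body);
        }
    }
    results
}

async fn fetch_from_engine(engine: &str, query: &str) -> String {
    // Placeholder for the real HTTP request to the upstream engine.
    format!("results from {engine} for {query}")
}
```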

  • Use asynchronous crates for compression, prefer the asynchronous tokio::io and tokio::fs methods over the synchronous std::io and std::fs methods in asynchronous code, and make functions asynchronous to improve performance (a sketch follows below).
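
A minimal sketch of swapping std::fs/std::io for tokio's async equivalents; the file paths are illustrative:

```rust
use tokio::fs;
use tokio::io::AsyncWriteExt;

// Writing a file with tokio's async I/O instead of the blocking std::fs
// equivalents, so the runtime's worker threads are never stalled.
async fn save_theme(contents: &str) -> std::io::Result<()> {
    let mut file = fs::File::create("public/static/theme.css").await?;
    file.write_all(contents.as_bytes()).await?;
    file.flush().await?;
    Ok(())
}

async fn load_config() -> std::io::Result<String> {
    // tokio::fs::read_to_string yields to the runtime while the OS does the work.
    fs::read_to_string("websurfx/config.lua").await
}
```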

Reasoning Behind The Proposed Changes

The reasoning behind the proposed changes is to improve the performance of the engine, which reduces the time it takes to display search results and can drastically improve the user experience.

Do you want to work on this issue?

None

Additional information

No response


To reduce notifications, issues are locked until they reach 🏁 status: ready for dev and are ready to be assigned. You can learn more in our contributing guide https://github.com/neon-mmd/websurfx/blob/rolling/CONTRIBUTING.md


The issue has been unlocked and is now ready for dev. If you would like to work on this issue, you can comment to have it assigned to you. You can learn more in our contributing guide https://github.com/neon-mmd/websurfx/blob/rolling/CONTRIBUTING.md

neon-mmd added a commit that referenced this issue Mar 6, 2024
…ant (#486)

- initializes & stores the config & cache structs as a static constant.
- Pass the config & cache structs as a static reference to all the
  functions handling their respective route.
neon-mmd added a commit that referenced this issue Mar 6, 2024
…lts (#486)

- replace hashmaps with vectors for fetching, collecting & aggregating results as it tends to be a contiguous & cache-efficient data structure.
- refactor & redesign algorithms for fetching & aggregating results
  centered around vectors in aggregate function.
neon-mmd added a commit that referenced this issue Mar 6, 2024
… tokio spawn tasks (#486)

- using the `futureunordered` instead of vector for collecting results
  reduces the time it takes to fetch the results as the results do not
  need to come in specific order so any result that gets fetched first
  gets collected in the `futureunordered` type.

Co-authored-by: Spencerjibz <[email protected]>
neon-mmd added a commit that referenced this issue Mar 6, 2024
neon-mmd added a commit that referenced this issue Mar 11, 2024
…gine (#540)

* ♻️ refactor: initialize & store the config & cache structs as a constant (#486)
- initializes & stores the config & cache structs as a static constant.
- Pass the config & cache structs as a static reference to all the
  functions handling their respective route.

* ⚡ perf: replace hashmaps with vectors for fetching & aggregating results (#486)
- replace hashmaps with vectors for fetching, collecting & aggregating results as it tends to be a contiguous & cache-efficient data structure.
- refactor & redesign algorithms for fetching & aggregating results
  centered around vectors in aggregate function.

* ➕ build: add the future crate (#486)

* ⚡ perf: use `futureunordered` for collecting results fetched from the tokio spawn tasks (#486)
- using the `futureunordered` instead of vector for collecting results
  reduces the time it takes to fetch the results as the results do not
  need to come in specific order so any result that gets fetched first
  gets collected in the `futureunordered` type.

Co-authored-by: Spencerjibz <[email protected]>

* ⚡ perf: initialize new async connections parallely using tokio spawn tasks (#486)

* ⚡ perf: initialize redis pipeline struct once with the default size of 3 (#486)

* ⚡ perf: reduce branch predictions by reducing conditional code branches (#486)

* ✅ test(unit): provide unit test for the `get_safesearch_level` function (#486)

* ⚡ perf: reduce clones & use index based loop to improve search results filtering performance (#486)

* 🚨 fix(clippy): make clippy/format checks happy (#486)

* 🚨 fix(build): make the cargo build check happy (#486)

* ⚡ perf: reduce the amount of clones, to_owneds & to_strings (#486)

* ⚡ perf: use async crates & methods & make functions async (#486)

* 🔖 chore(release): bump the app version (#486)

---------

Co-authored-by: Spencerjibz <[email protected]>