The process of generating random user agents is abnormally time-consuming. #25
This was unexpected 🙂 After testing it on my side, it turns out that it takes 7 seconds for me, so it seems to have to do with the `fake_useragent` crate. The file that needs to be changed is located under `src/search_results_handler/user_agent.rs`.
So after studying and digging deep into the crate itself, I found that it actually fetches data from an upstream website and scrapes it to get the required user agents, which is why it causes a delay. It also looks like the project has been abandoned: the last commit seems to be 5 years old, which is a very long time for an open-source repository. Enabling the cache option did improve speed slightly, by 2-3 seconds, but I think a delay of around 5 seconds is acceptable, since some random delay between requests helps to evade IP blocking. I can reduce the random delay that I added in the code from 1-10 seconds to 1-5 seconds to improve speed. What do you say @xffxff? Also, in the future we might need to either explore an alternative to this crate or implement our own 😄.
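For reference, a minimal sketch of what enabling that cache option looks like, reusing the builder calls quoted later in this thread (the browser selection here is trimmed for brevity):

```rust
use fake_useragent::{Browsers, UserAgents, UserAgentsBuilder};

// Sketch: same builder chain as elsewhere in this thread, but with the
// on-disk cache turned on. Per the measurement above this only saves
// 2-3 seconds, because the initial scrape still has to happen once.
fn build_cached_user_agents() -> UserAgents {
    UserAgentsBuilder::new()
        .cache(true) // enable caching instead of scraping on every build
        .dir("/tmp") // directory for the cached data
        .thread(1)
        .set_browsers(Browsers::new().set_chrome().set_firefox())
        .build()
}
```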
@neon-mmd Hmm, do we have to insert a delay between different requests? This may conflict with our lightning-fast goal. Additionally, when there are many concurrent search requests, even with a delay there will still be a lot of requests hitting the engine at the same time.
Referenced code: `websurfx/src/search_results_handler/user_agent.rs`, lines 10 to 26 at commit `84dc6a9`.
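(The embedded snippet does not render here. Judging from the builder calls quoted below, the referenced function presumably rebuilds the user-agent list on every call, roughly like this reconstruction, which is not the verbatim code:)

```rust
use fake_useragent::{Browsers, UserAgentsBuilder};

// Reconstruction (not verbatim): a fresh UserAgents object is built,
// and the upstream site re-scraped, on every single call.
pub fn random_user_agent() -> String {
    UserAgentsBuilder::new()
        .cache(false)
        .dir("/tmp")
        .thread(1)
        .set_browsers(
            Browsers::new()
                .set_chrome()
                .set_safari()
                .set_edge()
                .set_firefox()
                .set_mozilla(),
        )
        .build()
        .random()
        .to_string()
}
```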
I believe that it is unnecessary to construct a new `UserAgents` object on every call. We could construct it once when the server starts:

```rust
// Construct the UserAgents object once when the server starts
let user_agents = UserAgentsBuilder::new()
    .cache(false)
    .dir("/tmp")
    .thread(1)
    .set_browsers(
        Browsers::new()
            .set_chrome()
            .set_safari()
            .set_edge()
            .set_firefox()
            .set_mozilla(),
    )
    .build();

// ...

// Retrieve a random user agent string in aggregator.rs
let user_agent = user_agents.random().to_string();
```
I think @xffxff is right. Here is my implementation using the `lazy_static` crate:

```rust
use fake_useragent::{Browsers, UserAgents, UserAgentsBuilder};
use lazy_static::lazy_static;

lazy_static! {
    static ref USER_AGENTS: UserAgents = {
        UserAgentsBuilder::new()
            .cache(false)
            .dir("/tmp")
            .thread(1)
            .set_browsers(
                Browsers::new()
                    .set_chrome()
                    .set_safari()
                    .set_edge()
                    .set_firefox()
                    .set_mozilla(),
            )
            .build()
    };
}

/// A function to generate a random user agent to improve privacy of the user.
///
/// # Returns
///
/// A randomly generated user agent string.
pub fn random_user_agent() -> String {
    USER_AGENTS.random().to_string()
}
```
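For completeness, a sketch of the call site this enables, mirroring the `aggregator.rs` usage shown earlier; the `lazy_static` is initialized on first use, so only the very first call pays the scrape cost:

```rust
// In aggregator.rs (sketch): subsequent calls are just a random pick
// from the already-built list, so they return almost instantly.
let user_agent: String = random_user_agent();
```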
No, actually we need it. If we do not add a random delay between requests, then especially for large-scale server use cases, where a server can have thousands of users creating a lot of traffic, the upstream search engines may effectively get DDoSed, which is not good, and they might ban the IP that caused it. But I can see one option: adding a config option for this.
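A minimal sketch of what such a config-gated delay could look like; the flag name `random_delay` and the use of `rand` and `tokio` here are assumptions for illustration, not the project's actual config:

```rust
use rand::Rng;
use std::time::Duration;

// Hypothetical config flag (name assumed): large deployments keep the
// delay to avoid hammering upstream engines; small ones can disable it.
async fn maybe_delay(random_delay: bool) {
    if random_delay {
        // The 1-5 second random delay discussed above.
        let secs = rand::thread_rng().gen_range(1..=5);
        tokio::time::sleep(Duration::from_secs(secs)).await;
    }
}
```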
This looks good 👍, but after doing some research to see whether there are any better and faster implementations than this one, I found a few alternatives. Here are some links to follow:
@neon-mmd Thank you for the explanation! I think you are right, and having a config option like that sounds good to me.
I added logs before and after `random_user_agent()` and found that its processing time even exceeded 10 seconds. Is this expected?
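For anyone reproducing this, a minimal sketch of the kind of before/after timing described, using plain `std::time::Instant`:

```rust
use std::time::Instant;

// Measure how long a single call takes.
let start = Instant::now();
let ua = random_user_agent();
println!("random_user_agent() took {:?}; got: {}", start.elapsed(), ua);
```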