Explain how to use scrapy-splash with AutoThrottle #78

Open
kmike opened this issue Jul 13, 2016 · 3 comments

@kmike
Member

kmike commented Jul 13, 2016

The AutoThrottle extension doesn't play nicely with scrapy-splash: the latency it measures includes the time Splash spends rendering the page, so it thinks requests take a very long time and lowers the request rate accordingly.
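To make the effect concrete, here is a rough sketch of the delay update that AutoThrottle's documentation describes (target delay = latency / target concurrency, averaged with the previous delay, never below the target, clamped to the min/max). The function name `next_delay` and the latency figures are made up for illustration; this is a simplified re-implementation, not Scrapy's actual code.

```python
def next_delay(current_delay, latency, target_concurrency=1.0,
               min_delay=0.0, max_delay=60.0):
    """Simplified AutoThrottle-style delay update (illustrative only)."""
    # Delay that would keep `target_concurrency` requests in flight
    target_delay = latency / target_concurrency
    # Move halfway from the current delay towards the target
    new_delay = (current_delay + target_delay) / 2.0
    # Never go below the target delay
    new_delay = max(target_delay, new_delay)
    # Clamp to the configured bounds
    return min(max(min_delay, new_delay), max_delay)


# Assumed latencies: 0.3 s for a plain download, 10 s when Splash renders the page
plain = next_delay(current_delay=5.0, latency=0.3)
rendered = next_delay(current_delay=5.0, latency=10.0)
print(plain, rendered)
```

With these assumed numbers the delay shrinks for the plain download but jumps to the full render time for the Splash request, which is the slowdown described above.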

@Gallaecio
Contributor

Should we simply state that it should be disabled as part of the configuration instructions?
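If the docs went this way, the configuration note could be as small as the following `settings.py` fragment. The setting names are standard Scrapy settings; the concurrency and delay values are placeholders, assumed only for illustration, and would need tuning per Splash instance.

```python
# settings.py (sketch): turn AutoThrottle off when using scrapy-splash
AUTOTHROTTLE_ENABLED = False

# Limit load with fixed settings instead; the values below are
# illustrative assumptions, not recommendations
CONCURRENT_REQUESTS_PER_DOMAIN = 4
DOWNLOAD_DELAY = 0.5
```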

@kmike
Member Author

kmike commented Nov 26, 2019

There are ways to make it work with AutoThrottle in a more reasonable way, e.g. https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py.

As a first step, yes, it makes sense to at least document this problem. For example, as I recall, AutoThrottle is enabled by default on Scrapy Cloud (is it still on by default?).

@Gallaecio
Contributor

What if we add something like https://github.com/TeamHG-Memex/undercrawler/blob/master/undercrawler/middleware/throttle.py to scrapy-splash itself?

In addition to documenting its (optional) usage, we could log a warning if Scrapy’s built-in AutoThrottle is used along with scrapy-splash.
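A minimal sketch of what the warning part might look like, assuming it is driven by the `AUTOTHROTTLE_ENABLED` setting. The helper names and the logger name are hypothetical; nothing like this exists in scrapy-splash today. It accepts either a Scrapy `Settings` object (which has `getbool()`) or a plain dict so it can be tried without a crawler.

```python
import logging

# Hypothetical logger name, chosen for this sketch
logger = logging.getLogger("scrapy_splash.autothrottle_check")


def autothrottle_enabled(settings):
    """Return True if AUTOTHROTTLE_ENABLED is set in `settings`."""
    # Scrapy's Settings object provides getbool(); plain dicts do not
    getbool = getattr(settings, "getbool", None)
    if getbool is not None:
        return getbool("AUTOTHROTTLE_ENABLED", False)
    return bool(settings.get("AUTOTHROTTLE_ENABLED", False))


def warn_if_autothrottle_enabled(settings):
    """Log a warning when AutoThrottle is on; return whether it warned."""
    if autothrottle_enabled(settings):
        logger.warning(
            "AutoThrottle is enabled together with scrapy-splash; "
            "Splash rendering time inflates the measured latency, "
            "so the crawl may be throttled more than intended."
        )
        return True
    return False
```

Something like this could be called with `crawler.settings` from a middleware's `from_crawler`, though where exactly it belongs is open.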


2 participants