Migrate scrapy to headless-chrome? #118
Comments
Will take a lot of work, I guess.
WebKit is upgraded to a much more recent version in Splash master (~mid-2016 Safari), and will be upgraded further (to WebKit trunk) in the future, thanks to https://github.com/annulen/webkit. You can use the scrapinghub/splash:master Docker image to try the changes, or wait for the Splash 3.0 release.
Switching to Headless Chromium would be a huge change indeed. We don't have the engineering resources to make this switch in the near future. Also, it may be easier to create a separate Scrapy + Headless Chromium integration project.
Switching to Headless Chromium has both advantages and disadvantages; it seems there are more advantages. But some Splash features can't be implemented in Headless Chromium AFAIK. For example, per-request proxy options are impossible, if I'm not mistaken. This feature is nice to have, e.g. for Crawlera integration, to avoid using Crawlera for static resources.
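For readers unfamiliar with the per-request proxy feature mentioned above, here is a minimal sketch of its coarsest form: the `proxy` argument of Splash's `render.html` HTTP endpoint, which applies to a single render call. The Splash address, the proxy address, and the helper name `splash_render_url` are assumptions made for illustration. The finer-grained case in the comment (choosing a proxy per page resource) is, as far as I know, handled inside Splash Lua scripts and is not shown here.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance listening locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, proxy=None, wait=0.5):
    """Build a render.html URL (hypothetical helper).

    When `proxy` is given, it is sent as the `proxy` argument and
    applies to this render request only.
    """
    args = {"url": page_url, "wait": wait}
    if proxy is not None:
        args["proxy"] = proxy
    return f"{SPLASH_URL}/render.html?{urlencode(args)}"


# Route one render through a proxy (placeholder proxy address) ...
proxied = splash_render_url(
    "http://example.com/", proxy="http://proxy.example.com:8010"
)
# ... and fetch another one directly, without any proxy.
direct = splash_render_url("http://example.com/about")

print(proxied)
print(direct)
```

The point of the sketch is only that the proxy choice travels with each request instead of being fixed when the browser process starts, which is the property the comment says would be hard to keep with Headless Chromium.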
Thanks! Will try the master container to see if I can get around my scraping issues.
Got the following error when trying out the master Dockerfile:
Could you try it again? It looks like a temporary issue: either a Docker Hub issue or a network issue.
We've now successfully tested Splash 3.0 and are really impressed: the execution time of our scraping jobs (running layoutstats.js on ~120 URLs) dropped from approximately 75 minutes to just 25 minutes :-) Taking screenshots also seems to work more reliably now. Big kudos to you and the guys behind the "Chromium 2016" port!
A few weeks ago, the Chromium project announced Headless Chromium as a new, clean way to open websites in a non-UI server context.
The announcement had quite an impact on the headless-browser scene and resulted in the resignation of the PhantomJS maintainer.
Since the current WebKit engine of Splash dates back to 2013, I wanted to know whether there are any plans to port Splash to headless Chrome.