
Scrapyd
=======

.. image:: https://secure.travis-ci.org/scrapy/scrapyd.svg?branch=master

Scrapyd is a service for running `Scrapy`_ spiders.

It allows you to deploy your Scrapy projects and control their spiders using an HTTP JSON API.
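For example, assuming Scrapyd is running locally on its default port (6800) and a project called ``myproject`` with a spider called ``somespider`` has been deployed (both names are placeholders), a crawl can be scheduled through the ``schedule.json`` endpoint; the response carries a job id (example value shown)::

    $ curl http://localhost:6800/schedule.json -d project=myproject -d spider=somespider
    {"status": "ok", "jobid": "6487ec79947edab326d6db28a2d86511e8247444"}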

Installation
============

This document explains how to install and configure Scrapyd, to deploy and run your Scrapy spiders.

Requirements
------------

Scrapyd depends on the following libraries, but the installation process takes care of installing the missing ones:

* Python 2.6 or above
* Twisted 8.0 or above
* Scrapy 0.17 or above

Installing Scrapyd (generic way)
--------------------------------

How to install Scrapyd depends on the platform you’re using. The generic way is to install it from PyPI::

    pip install scrapyd

If you plan to deploy Scrapyd on Ubuntu, note that Scrapyd comes with official Ubuntu packages (see below) for installing it as a system service, which eases the administration work.

Other distributions and operating systems (Windows, Mac OS X) don’t yet have specific packages and require using the generic installation mechanism, in addition to configuring paths and enabling it to run as a system service. You are very welcome to contribute Scrapyd packages for your platform of choice; just send a pull request on Github.
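On those platforms the service can at least be started manually after the generic installation: the ``scrapyd`` command runs it in the foreground, listening on port 6800 by default, and wiring it up as a proper system service is then left to you::

    $ scrapyd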

Installing Scrapyd in Ubuntu
----------------------------

Scrapyd comes with official Ubuntu packages ready to use in your Ubuntu servers. They are shipped in the same APT repos as Scrapy, which can be added as described in Scrapy Ubuntu packages. Once you have added the Scrapy APT repos, you can install Scrapyd with apt-get::

    apt-get install scrapyd

This will install Scrapyd on your Ubuntu server, creating a ``scrapy`` user which Scrapyd will run as. It will also create the directories and files described below:

/etc/scrapyd
    Scrapyd configuration files. See Configuration file; a minimal example appears after this list.

/var/log/scrapyd/scrapyd.log
    Scrapyd main log file.

/var/log/scrapyd/scrapyd.out
    The standard output captured from the Scrapyd process and any sub-process spawned from it.

/var/log/scrapyd/scrapyd.err
    The standard error captured from Scrapyd and any sub-process spawned from it. Remember to check this file if you’re having problems, as the errors may not get logged to the ``scrapyd.log`` file.

/var/log/scrapyd/project
    Besides the main service log file, Scrapyd stores one log file per crawling process in::

        /var/log/scrapyd/PROJECT/SPIDER/ID.log

    Where ID is a unique id for the run.

/var/lib/scrapyd/
    Directory used to store data files (uploaded eggs and spider queues).
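As a minimal sketch of what a configuration file can look like (the option names below are real Scrapyd settings, but the values are only illustrative; see Configuration file for the authoritative reference)::

    [scrapyd]
    # Address and port the HTTP JSON API listens on
    bind_address = 127.0.0.1
    http_port    = 6800
    # Where uploaded project eggs and crawl logs are stored
    eggs_dir     = /var/lib/scrapyd/eggs
    logs_dir     = /var/log/scrapyd

These pieces fit together: the ``jobid`` returned by ``schedule.json`` (see the example above) is the ``ID`` in the per-run log path, so a running crawl can be followed with standard tools (the path below reuses the placeholder names from that example)::

    $ tail -f /var/log/scrapyd/myproject/somespider/6487ec79947edab326d6db28a2d86511e8247444.log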
