forked from binux/pyspider

A Powerful Spider(Web Crawler) System in Python.

lilong07/pyspider

 
 

pyspider

A Powerful Spider(Web Crawler) System in Python. TRY IT NOW!

Tutorial: http://docs.pyspider.org/en/latest/tutorial/
Documentation: http://docs.pyspider.org/
Release notes: https://github.com/binux/pyspider/releases

Sample Code

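from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Default options applied to every self.crawl() call in this project.
    crawl_config = {
    }

    # Entry point: scheduled to run once every 24 hours.
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # A fetched page is treated as fresh for 10 days (age is in seconds).
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Queue every link whose href starts with "http".
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result recorded for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }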
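
The `a[href^="http"]` selector in `index_page` keeps only links whose `href` begins with `http`. The sketch below illustrates that filtering rule using the standard library's `html.parser` instead of pyspider's `response.doc`, so it can run without pyspider installed. The names `AbsoluteLinkCollector`, `absolute_links`, and `SAMPLE_HTML` are hypothetical and not part of pyspider. It works on raw HTML only and does not resolve relative links, which pyspider may do before the selector is applied.

```python
from html.parser import HTMLParser


class AbsoluteLinkCollector(HTMLParser):
    """Collect href values of <a> tags that start with "http" (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Same rule as the CSS attribute selector a[href^="http"].
            if name == "href" and value and value.startswith("http"):
                self.links.append(value)


def absolute_links(html):
    """Return, in document order, the hrefs that the selector would match."""
    collector = AbsoluteLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Made-up sample markup, for illustration only.
SAMPLE_HTML = """
<a href="http://example.com/a">absolute</a>
<a href="/relative">relative</a>
<a href="mailto:someone@example.com">mail</a>
<a href="https://example.com/b">secure</a>
<a name="anchor">no href</a>
"""

if __name__ == "__main__":
    for link in absolute_links(SAMPLE_HTML):
        print(link)
```

In the sample handler, each matching link would be passed to `self.crawl(..., callback=self.detail_page)` rather than printed.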

Demo

Installation

Quickstart: http://docs.pyspider.org/en/latest/Quickstart/

Contribute

TODO

v0.4.0

  • a visual scraping interface like portia

License

Licensed under the Apache License, Version 2.0
