
spidersNest

A deploy-ready webcrawler application with Scrapy, Zyte and MongoDB Atlas.

Websites crawled

- Boletim Econômico
- InfoMoney
- UOL Economia

Environment Setup

$ virtualenv <env_name>
$ source <env_name>/bin/activate
(<env_name>)$ pip install -r requirements.txt

Set up MongoDB

Docs: MongoDB Docs

After MongoDB is set up, initialize the connection in a database.py file in the same folder as pipelines.py.

E.g. with MongoDB Atlas:

# database.py
import pymongo

# "mongodb+srv://" is the full scheme; do not prepend "https".
dbClient = pymongo.MongoClient("mongodb+srv://<username>:<password>@<cluster>.i3n8n.mongodb.net/<db_name>?retryWrites=true&w=majority")
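Once the client exists, a Scrapy item pipeline can write each scraped item to a collection. A minimal sketch, not code from this repo: the database and collection names (`"<db_name>"`, `"articles"`) and the deferred `database` import are assumptions.

```python
# pipelines.py (sketch); assumes the database.py above is importable
class MongoPipeline:
    """Insert every scraped item into a MongoDB collection."""

    def __init__(self, collection=None):
        if collection is None:
            # Deferred import so the class can be exercised with a stub collection
            from database import dbClient
            collection = dbClient["<db_name>"]["articles"]  # hypothetical names
        self.collection = collection

    def process_item(self, item, spider):
        # Scrapy items are dict-like; store a plain dict copy
        self.collection.insert_one(dict(item))
        return item  # hand the item on to any later pipelines
```

Enable it in settings.py with something like `ITEM_PIPELINES = {"<project>.pipelines.MongoPipeline": 300}`.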

Run

$ scrapy crawl boletimEconomico
$ scrapy crawl infoMoney
$ scrapy crawl uolEconomia

Run & dump to a JSON Lines file

$ scrapy crawl boletimEconomico -o boletimEconomico.jl
$ scrapy crawl infoMoney -o infoMoney.jl
$ scrapy crawl uolEconomia -o uolEconomia.jl 
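Each line of a .jl dump is a standalone JSON object, so the output can be inspected with a few lines of Python (the filename below is just one produced by the commands above):

```python
import json

def read_jl(path):
    """Yield one parsed item per line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

# e.g.: items = list(read_jl("infoMoney.jl"))
```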

Deploy

Docs: Zyte Docs (formerly Scrapy Cloud)

E.g., after creating your account and project in Zyte and setting up the dependencies:

 $ pip install shub
 $ shub login
    API key: <api_key>
 $ shub deploy <project_id>
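shub can also read its target from a scrapinghub.yml file at the project root, which makes a plain `shub deploy` sufficient. A minimal sketch, with the project id left as the placeholder from the command above:

```yaml
# scrapinghub.yml (sketch)
project: <project_id>     # numeric id from the Zyte dashboard
requirements:
  file: requirements.txt  # extra dependencies installed on deploy
```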
