A web crawler application built with Scrapy, Zyte and MongoDB Atlas.
$ virtualenv <env_name>
$ source <env_name>/bin/activate
(<env_name>)$ pip install -r requirements.txt
Docs: MongoDB Docs
Once MongoDB is set up, initialize the connection in a database.py file in the same folder as pipelines.py.
# database.py
import pymongo
# Replace the <...> placeholders with your Atlas credentials, cluster and database name.
dbClient = pymongo.MongoClient("mongodb+srv://<username>:<password>@<cluster>.i3n8n.mongodb.net/<db_name>?retryWrites=true&w=majority")
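As a rough illustration of how pipelines.py might use this client, here is a minimal sketch of an item pipeline. The class name `MongoPipeline`, the optional `collection` argument, the relative import, and the choice of one collection per spider name are all assumptions made for this example; the project's actual pipeline may look different.

```python
class MongoPipeline:
    """Hypothetical Scrapy item pipeline that stores each scraped item in MongoDB."""

    def __init__(self, collection=None):
        # A collection can be injected (handy for tests); otherwise it is
        # resolved from database.py when the spider opens.
        self._collection = collection

    def open_spider(self, spider):
        if self._collection is None:
            # Assumption: database.py sits next to pipelines.py in the same package.
            from .database import dbClient
            # get_default_database() returns the database named in the connection URI.
            database = dbClient.get_default_database()
            # Assumption: one collection per spider, named after the spider.
            self._collection = database[spider.name]

    def process_item(self, item, spider):
        # Convert the item to a plain dict before inserting it as a document.
        self._collection.insert_one(dict(item))
        return item
```

For Scrapy to call a pipeline like this, it would also need to be listed under `ITEM_PIPELINES` in the project's settings.py.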
$ scrapy crawl boletimEconomico
$ scrapy crawl infoMoney
$ scrapy crawl uolEconomia
$ scrapy crawl boletimEconomico -o boletimEconomico.jl
$ scrapy crawl infoMoney -o infoMoney.jl
$ scrapy crawl uolEconomia -o uolEconomia.jl
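The `-o <file>.jl` option writes the scraped items in JSON Lines format, that is, one JSON object per line. A minimal sketch for reading such a file back in Python follows; the helper name `read_json_lines` is made up for this example and is not part of Scrapy.

```python
import json


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON Lines (.jl) file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For instance, `for item in read_json_lines("infoMoney.jl"): print(item)` would print each exported item in turn, assuming the file exists in the current directory.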
Docs: Zyte Docs (formerly Scrapy Cloud)
After creating your Zyte account and project and setting up the dependencies:
$ pip install shub
$ shub login
API key: <api_key>
$ shub deploy <project_id>