
spidersNest

A deploy-ready webcrawler application with Scrapy, Zyte and MongoDB Atlas.

Websites crawled

- Boletim Econômico
- InfoMoney
- UOL Economia

Environment Setup

$ virtualenv <env_name>
$ source <env_name>/bin/activate
(<env_name>)$ pip install -r requirements.txt

Set up MongoDB

Docs: MongoDB Docs

After MongoDB is set up, initialize the connection in a database.py file in the same folder as pipelines.py.

E.g. with MongoDB Atlas:

# database.py
import pymongo

# "mongodb+srv://" is the full scheme; do not prepend "https".
dbClient = pymongo.MongoClient("mongodb+srv://<username>:<password>@<cluster>.i3n8n.mongodb.net/<db_name>?retryWrites=true&w=majority")
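Once the client exists, a Scrapy item pipeline can write each scraped item to a collection. A minimal sketch, not code from this repo: the database and collection names (`"<db_name>"`, `"articles"`) and the deferred `database` import are assumptions.

```python
# pipelines.py (sketch); assumes the database.py above is importable
class MongoPipeline:
    """Insert every scraped item into a MongoDB collection."""

    def __init__(self, collection=None):
        if collection is None:
            # Deferred import so the class can be exercised with a stub collection
            from database import dbClient
            collection = dbClient["<db_name>"]["articles"]  # hypothetical names
        self.collection = collection

    def process_item(self, item, spider):
        # Scrapy items are dict-like; store a plain dict copy
        self.collection.insert_one(dict(item))
        return item  # hand the item on to any later pipelines
```

Enable it in settings.py with something like `ITEM_PIPELINES = {"<project>.pipelines.MongoPipeline": 300}`.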

Run

$ scrapy crawl boletimEconomico
$ scrapy crawl infoMoney
$ scrapy crawl uolEconomia

Run & dump to a JSON Lines file

$ scrapy crawl boletimEconomico -o boletimEconomico.jl
$ scrapy crawl infoMoney -o infoMoney.jl
$ scrapy crawl uolEconomia -o uolEconomia.jl 
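Each line of a .jl dump is a standalone JSON object, so the output can be inspected with a few lines of Python (the filename below is just one produced by the commands above):

```python
import json

def read_jl(path):
    """Yield one parsed item per line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)

# e.g.: items = list(read_jl("infoMoney.jl"))
```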

Deploy

Docs: Zyte Docs (formerly Scrapy Cloud)

E.g., after creating your account and project in Zyte and setting up the dependencies:

 $ pip install shub
 $ shub login
    API key: <api_key>
 $ shub deploy <project_id>
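shub can also read its target from a scrapinghub.yml file at the project root, which makes a plain `shub deploy` sufficient. A minimal sketch, with the project id left as the placeholder from the command above:

```yaml
# scrapinghub.yml (sketch)
project: <project_id>     # numeric id from the Zyte dashboard
requirements:
  file: requirements.txt  # extra dependencies installed on deploy
```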
