Website Scraper

A system that runs regular HTTP-based scans against websites to gather insights about them. You provide the subdomain and domain of each site, and the system then scrapes it at regular intervals.

The application logic can be extended and modified to fit specific requirements. A website can be scanned through the API endpoints described in the API docs.
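
The interval-based scanning described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual implementation: the names build_url, fetch_status and scan_periodically, and the shape of the result dictionaries, are all assumptions made for this example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional, Tuple


def build_url(subdomain: str, domain: str, scheme: str = "https") -> str:
    """Join a subdomain and a domain into a URL (hypothetical helper).

    An empty subdomain means the bare domain is scanned.
    """
    host = f"{subdomain}.{domain}" if subdomain else domain
    return f"{scheme}://{host}/"


def fetch_status(url: str, timeout: float = 10.0) -> Tuple[int, float]:
    """Fetch the URL once and return (HTTP status, elapsed seconds)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise, but still carry a status code.
        status = err.code
    return status, time.monotonic() - started


def scan_periodically(
    targets: Iterable[Tuple[str, str]],
    interval_seconds: float,
    on_result: Callable[[Dict], None] = print,
    fetch: Callable[[str], Tuple[int, float]] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
    max_rounds: Optional[int] = None,
) -> None:
    """Scan every (subdomain, domain) pair, wait, and repeat.

    max_rounds=None loops until interrupted; fetch and sleep are
    injectable so the loop can be exercised without network access.
    """
    targets = list(targets)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for subdomain, domain in targets:
            url = build_url(subdomain, domain)
            try:
                status, elapsed = fetch(url)
                on_result({"url": url, "status": status, "elapsed": elapsed})
            except OSError as exc:
                # URLError and socket timeouts are OSError subclasses.
                on_result({"url": url, "error": str(exc)})
        rounds += 1
        sleep(interval_seconds)
```

For example, scan_periodically([("www", "example.com")], interval_seconds=300) would report on one site every five minutes until interrupted. The real application stores its targets in the database and exposes scans through its API, which this sketch does not attempt to model.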

Development Setup

To run a local development setup, the system must satisfy the following app dependencies:

Local CockroachDB instance

To set up the local database, install Docker and run the setup-db script. It starts a local CockroachDB instance in Docker and imports the scraper db schema into it.

$ ./dbschema/setup-db.sh

One can access the database with

$ psql 'postgresql://root@localhost:26257/scraper?sslmode=disable'
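
The same connection details can be assembled from Python, for instance to check that the local instance is accepting connections before starting the app. The sketch below is an assumption-laden illustration using only the standard library: the helper names database_url and is_port_open are hypothetical, and the defaults simply mirror the psql command above.

```python
import socket
from urllib.parse import urlencode


def database_url(
    host: str = "localhost",
    port: int = 26257,
    database: str = "scraper",
    user: str = "root",
    sslmode: str = "disable",
) -> str:
    """Build a PostgreSQL-style connection URL (hypothetical helper).

    Defaults mirror the local development instance used with psql.
    """
    query = urlencode({"sslmode": sslmode})
    return f"postgresql://{user}@{host}:{port}/{database}?{query}"


def is_port_open(host: str = "localhost", port: int = 26257,
                 timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A URL of this form can be passed to a PostgreSQL client library, since CockroachDB speaks the PostgreSQL wire protocol. Which driver the application itself uses is not stated here.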

Python import dependencies

Python app dependencies can be set up inside a Python virtual environment using pipenv. Run the following inside the project repository to install all dependencies:

$ pipenv install

Also update the config used by the application: edit the file src/scraper/config/config.py to use development.json instead of the default config.json.
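
As a rough idea of what selecting a config file can look like, here is a minimal sketch. It is not the contents of the project's config.py: the load_config function, the CONFIG_DIR location and the choice of a plain dictionary are assumptions for illustration, and the keys inside the JSON files are not shown because they are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# Assumption: the JSON config files live in a "config" directory.
CONFIG_DIR = Path("config")


def load_config(
    name: str = "development.json",
    config_dir: Union[str, Path] = CONFIG_DIR,
) -> Dict[str, Any]:
    """Read one JSON config file and return it as a dictionary.

    Passing name="config.json" would select the default config instead.
    """
    path = Path(config_dir) / name
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Keeping the file name in a single place means switching between development.json and config.json is a one-line change.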

Once the config changes are done and the database is ready, the app can be started with

$ pipenv run ./src/scraper_app
