Web Crawler CLI

About

A CLI tool that crawls a given URL and calculates the URL ratio of the pages it visits. Once crawling is done (the maximum depth has been reached), the results are written to a file and the program exits.
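This README does not define the "URL ratio". A common definition for this kind of crawler, assumed here, is the fraction of links on a page that point to the page's own domain. The sketch below computes that with the standard library's html.parser in place of whatever parser the project actually uses; LinkExtractor and same_domain_ratio are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the absolute URL of every <a href="..."> on a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links (e.g. "/about") against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_domain_ratio(page_url: str, html: str) -> float:
    """Assumed definition: share of links whose host matches the page's host."""
    extractor = LinkExtractor(page_url)
    extractor.feed(html)
    if not extractor.links:
        return 0.0  # no links on the page: report 0 instead of dividing by zero
    host = urlparse(page_url).netloc
    same = sum(1 for link in extractor.links if urlparse(link).netloc == host)
    return same / len(extractor.links)
```

For example, a page at https://example.com/a containing one link to /b and one to https://other.org/ gives a ratio of 0.5.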

System Design

The tool is made up of two components:

WebCrawler

This module extracts the in-page links from each URL and calculates the ratio. The main logic uses an asyncio Queue to hold the URLs that still need to be crawled. The core functionality is asynchronous, so page fetches run concurrently instead of blocking one another.
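A minimal sketch of the asyncio Queue pattern described above, not the module's actual code: a pool of worker tasks pulls (url, depth) pairs from the queue, and queue.join() returns once every queued URL has been processed. The page fetcher is injected as fetch_links (a hypothetical async callable that returns a page's links) so the sketch runs without network access; in practice it would wrap an async HTTP client. The names crawl and fetch_links and the default of five workers are assumptions.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical fetcher signature: takes a URL, returns the links found on it.
FetchLinks = Callable[[str], Awaitable[list[str]]]


async def crawl(start_url: str, max_depth: int, fetch_links: FetchLinks,
                workers: int = 5) -> dict[str, int]:
    """Breadth-first crawl driven by an asyncio.Queue; returns {url: depth}."""
    queue: asyncio.Queue = asyncio.Queue()
    seen: dict[str, int] = {start_url: 1}
    await queue.put((start_url, 1))

    async def worker() -> None:
        while True:
            url, depth = await queue.get()
            try:
                # Pages at max_depth are recorded but not expanded.
                if depth < max_depth:
                    for link in await fetch_links(url):
                        if link not in seen:  # never queue a URL twice
                            seen[link] = depth + 1
                            queue.put_nowait((link, depth + 1))
            except Exception:
                pass  # illustrative choice: a failed fetch just skips that page
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # wait until every queued URL has been processed
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen
```

New links are queued before task_done() is called for the page that produced them, so queue.join() cannot return while work is still pending. Because the workers await fetch_links concurrently, a slow page only delays the worker handling it.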

FileResultGenerator

Writes the WebCrawler results to a TSV (tab-separated values) file.
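A minimal TSV writer using Python's csv module with a tab delimiter, shown as a sketch rather than the project's implementation. The function name write_tsv and the url/depth/ratio columns are hypothetical; the README does not describe the actual column layout.

```python
import csv
from pathlib import Path


def write_tsv(path: Path, rows: list[tuple[str, int, float]]) -> None:
    """Write crawl results as tab-separated values with a header row."""
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["url", "depth", "ratio"])  # hypothetical column names
        writer.writerows(rows)
```

Using csv.writer rather than joining strings by hand means a field that itself contains a tab or newline is quoted instead of corrupting the row structure.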

Technical Details

Python version

3.11

Virtual Environment Set Up

python3.11 -m venv <path_to_env>
source <path_to_env>/bin/activate # on Linux/macOS
<path_to_env>\Scripts\Activate.ps1 # on Windows (PowerShell)

python3.11 -m pip install -r requirements.txt

How To Use

Once the virtual environment is set up, run the tool with:

python ./app.py

How to Test

First install the development dependencies:

python3.11 -m pip install -r dev_requirements.txt

Then run the tests with a coverage report:

pytest crawler/tests --cov-report term-missing --cov=crawler

Author

Nal Zazi
