Crawler for Wikipedia

A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).

This list will hold downloaded links

urls_bfs = []

Download the page and save it using the name '/wiki/...' this will help us construct URLS if required in future

def download_page(url)

Function to extract the links from content block of a page and return it in a list

def get_links_bfs(url)

Function for BFS crawling:

def bfs(url)

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
README.md		README.md
crawler.py		crawler.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crawler for Wikipedia

About

Releases

Packages

Languages

zhgulden/crawler

Folders and files

Latest commit

History

Repository files navigation

Crawler for Wikipedia

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages