Web scraping is the extraction and processing of data from websites. It lets us transform unstructured data on the web (HTML) into structured data (a database or spreadsheet) that is easier to manipulate.
Scraping can be done in many ways, but I prefer Python because it is easy to use and has powerful libraries for the task.
- urllib3 (fetches the web page):
It is a Python HTTP client library used for fetching URLs. It provides functions and classes that help with URL actions such as connection pooling, redirects, and retries.
- BeautifulSoup (scrapes the data):
It is a Python library for pulling data out of HTML and XML files. It works with a parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. We can use it to extract tables, lists, and paragraphs, and we can apply filters to pull specific information out of web pages.
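As a rough illustration of how the two libraries fit together, here is a minimal sketch that parses HTML with BeautifulSoup and turns the links it finds into CSV rows. It is not taken from the scripts in this repository: the function names (`fetch_html`, `extract_links`, `links_to_csv`), the sample HTML, and the `example.com` URLs are all assumptions made for the example. The parsing step runs on an inline string so it works without network access.

```python
import csv
import io

import urllib3
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with urllib3 and return its body as text."""
    http = urllib3.PoolManager()
    response = http.request("GET", url)
    if response.status != 200:
        raise RuntimeError(f"Request failed with status {response.status}")
    return response.data.decode("utf-8", errors="replace")


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


def links_to_csv(links):
    """Serialize (text, href) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["text", "href"])
    writer.writerows(links)
    return buffer.getvalue()


# Hypothetical sample page, used so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <p>Intro paragraph</p>
  <a href="https://example.com/a">First</a>
  <a href="https://example.com/b">Second</a>
  <a>No link here</a>
</body></html>
"""

if __name__ == "__main__":
    print(links_to_csv(extract_links(SAMPLE_HTML)))
    # To scrape a live page instead (hypothetical URL):
    # print(links_to_csv(extract_links(fetch_html("https://example.com"))))
```

Keeping the fetch step separate from the parse step makes the parsing logic easy to try on saved or hand-written HTML before pointing it at a real site.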
Run the following Python files in this repository to see the results:
- Image_Scraping.py
- IMDb_Review_Scraping.py