Scrape URLs from multiple websites 2.0

This repository helps you extract all the URLs found on a website and save them in an Excel file. The code works through multiple web links provided in a CSV file, so it spares you a lot of manual work. If you are scouting multiple websites for press releases, presentations, annual reports, etc., this code will come in handy and save many man-hours.

Instructions

  • pip install -r requirements
  • Run url_extract_2.0.py
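
Based on the packages listed under Reference below, the requirements file presumably contains something like the following. This is a guess, not the repository's actual file; urllib is part of the Python standard library and needs no entry, and the Excel engine may differ.

```text
beautifulsoup4
feedparser
pandas
openpyxl   # assumed: an engine pandas can use to write .xlsx output
```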

Reference

I devised the solution from the documentation of the following packages:

  • [urllib] package that collects several modules for working with URLs
  • [beautifulsoup4] to scrape information from web pages
  • [feedparser] to parse RSS feeds in Python
  • [pandas] for data structuring
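
For orientation, here is a minimal sketch of the overall approach: read the website addresses from a CSV with pandas, fetch each page with urllib, collect the href values with beautifulsoup4, and write everything to Excel. The file names websites.csv and extracted_urls.xlsx are placeholders, not the repository's actual paths; the real input layout, output format, and any feedparser-based RSS handling live in url_extract_2.0.py and may differ from this sketch.

```python
import urllib.request

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical file names; the actual paths used by url_extract_2.0.py may differ.
INPUT_CSV = "websites.csv"          # expected: one column of website addresses
OUTPUT_XLSX = "extracted_urls.xlsx"


def extract_links(page_url):
    """Fetch a single page and return every href value found in its <a> tags."""
    request = urllib.request.Request(page_url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=30) as response:
        soup = BeautifulSoup(response.read(), "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def main():
    websites = pd.read_csv(INPUT_CSV).iloc[:, 0].dropna()
    rows = []
    for site in websites:
        try:
            for link in extract_links(site):
                rows.append({"website": site, "url": link})
        except Exception as exc:
            # Record the failure and keep going with the remaining websites.
            rows.append({"website": site, "url": f"ERROR: {exc}"})
    # Writing .xlsx output requires an Excel engine such as openpyxl.
    pd.DataFrame(rows).to_excel(OUTPUT_XLSX, index=False)


if __name__ == "__main__":
    main()
```

This sketch only walks plain HTML pages; since the repository also lists feedparser, the real script presumably handles RSS feeds as an additional source of links.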
