NewsScraper - Scrape any newspaper automatically

This is a simple Python script that automatically scrapes the most recent articles from any news site.

Add the websites you want to scrape to NewsPapers.json, and the script will scrape each site listed in the file.

This repository was originally created as part of this tutorial.

Thanks to Pål Grønås Drange for his contributions to the repository.

Installing

Download the contents of this repository, then run:

pip install -r requirements.txt

Usage

Simply run python newsscraper.py NewsPapers.json.

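The NewsPapers.json file should be a JSON object that maps a name for each site to its "link" (the site URL) and, optionally, an "rss" feed URL (the breitbart entry below has none):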

{
  "bbc": {
    "rss": "https://feeds.bbci.co.uk/news/rss.xml",
    "link": "https://www.bbc.com/"
  },
  "breitbart": {
    "link": "https://www.breitbart.com/"
  },
  "cnn": {
    "rss": "https://rss.cnn.com/rss/edition.rss",
    "link": "https://edition.cnn.com/"
  },
  "foxnews": {
    "rss": "https://feeds.foxnews.com/foxnews/latest",
    "link": "https://www.foxnews.com/"
  },
  "nytimes_frontpage": {
    "link": "https://nytimes.com/",
    "rss": "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"
  },
  "nytimes_international": {
    "link": "https://nytimes.com/",
    "rss": "https://rss.nytimes.com/services/xml/rss/nyt/World.xml"
  },
  "theguardian": {
    "rss": "https://www.theguardian.com/uk/rss",
    "link": "https://www.theguardian.com/international"
  },
  "washingtonpost": {
    "rss": "https://feeds.washingtonpost.com/rss/world",
    "link": "https://www.washingtonpost.com/"
  },
  "wsj": {
    "rss": "https://feeds.a.dj.com/rss/RSSWorldNews.xml",
    "link": "https://www.wsj.com"
  }
}
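
As a rough illustration of the format above, the following sketch loads a configuration in this shape and separates the sites that declare an "rss" feed from those that only provide a "link". The function name split_sites is hypothetical and is not part of newsscraper.py; how the script itself treats the two cases is not shown here.

```python
import json


def split_sites(config):
    """Split a NewsPapers.json-style mapping into RSS and link-only sites.

    Hypothetical helper for illustration; not taken from newsscraper.py.
    """
    with_rss, link_only = {}, {}
    for name, site in config.items():
        if "rss" in site:
            with_rss[name] = site
        else:
            link_only[name] = site
    return with_rss, link_only


# Inline sample in the same shape as NewsPapers.json.
sample = json.loads("""
{
  "bbc": {
    "rss": "https://feeds.bbci.co.uk/news/rss.xml",
    "link": "https://www.bbc.com/"
  },
  "breitbart": {
    "link": "https://www.breitbart.com/"
  }
}
""")

with_rss, link_only = split_sites(sample)
print(sorted(with_rss))   # names of sites that declare an RSS feed
print(sorted(link_only))  # names of sites that only provide a link
```

To check a real file, replace the inline sample with json.load on an opened NewsPapers.json.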

Libraries

This script uses the following libraries:

https://github.com/codelucas/newspaper

https://github.com/kurtmckee/feedparser
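
For orientation, here is a minimal sketch of how these two libraries are commonly combined: feedparser reads the feed entries, and newspaper downloads and parses each linked article. This is an assumption about typical usage, not a copy of newsscraper.py; the helper names entry_links and fetch_articles and the limit parameter are hypothetical.

```python
def entry_links(entries, limit=5):
    """Return up to `limit` article URLs from feed entries.

    Entries are treated as mappings with an optional "link" key, which is
    how feedparser exposes them. Hypothetical helper for illustration.
    """
    links = [entry.get("link") for entry in entries]
    return [link for link in links if link][:limit]


def fetch_articles(rss_url, limit=5):
    """Download and parse the first `limit` articles of one RSS feed.

    Requires network access and the two third-party libraries.
    """
    # Third-party imports are kept local so entry_links works without them.
    import feedparser
    from newspaper import Article

    feed = feedparser.parse(rss_url)
    articles = []
    for url in entry_links(feed.entries, limit):
        article = Article(url)
        article.download()
        article.parse()
        articles.append({"link": url, "title": article.title, "text": article.text})
    return articles


# Offline demonstration with hand-written entries instead of a live feed.
demo_entries = [
    {"link": "https://example.com/a"},
    {"title": "entry without a link"},
    {"link": "https://example.com/b"},
]
print(entry_links(demo_entries, limit=2))
```

A real run would call fetch_articles with one of the "rss" URLs from NewsPapers.json and would likely need error handling for articles that fail to download.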
