The New York Times articles scraper with GraphQL interface

aneksamun/nytimes-articles-server

The New York Times articles server

The service scrapes all news headlines from nytimes.com and exposes them through a GraphQL API.
The articles can be queried using the following GraphQL schema:

type News {
  title: String,
  link: String,
}

type Query {
  news: [News!]!
}
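
As an illustration of how a client might use this schema, here is a minimal sketch in plain Scala (JDK 11+ HTTP client, no extra libraries) that posts a `news` query selecting both fields. The endpoint path `/graphql`, the object name `NewsQueryExample` and its helper names are assumptions made for this example; they are not taken from the project, so check the running service for the real endpoint.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object NewsQueryExample {
  // Selects both fields declared on the News type in the schema above.
  val query: String = "{ news { title link } }"

  // Assumption: the port is the documented default, the path is a guess.
  val endpoint: String = "http://localhost:8080/graphql"

  // Wraps a GraphQL query in the conventional JSON envelope {"query": "..."}.
  def requestBody(q: String): String = {
    val escaped = q
      .replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
    s"""{"query":"$escaped"}"""
  }

  // Sends the query and returns the raw JSON response body as a string.
  def fetchNews(): String = {
    val request = HttpRequest
      .newBuilder(URI.create(endpoint))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(requestBody(query)))
      .build()
    HttpClient
      .newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .body()
  }
}
```

Calling `NewsQueryExample.fetchNews()` against a running instance should return a JSON document whose `data.news` array holds the scraped headlines. Since the schema declares `title` and `link` as nullable, a client should be prepared for either field to be missing.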

Once the service is started, it begins scraping headlines and redirects the user to the GraphQL Playground page.

Configuration

The service can be customised by changing the following settings:

| Setting | Description | Default value |
| --- | --- | --- |
| SERVER_HTTP_PORT | Server port | 8080 |
| DB_NAME | Database name | wardrobe |
| DB_HOST | Database server host | localhost |
| DB_PORT | Database server port | 5432 |
| DB_USER | Database user | user |
| DB_PASSWORD | Database user password | 1234 |
| DB_WHETHER_CREATE_SCHEMA | Whether to create the database schema when the service starts | true |
| NY_TIMES_URL | The URL of the New York Times website | https://www.nytimes.com/ |
| SCRAPE_REPEAT_INTERVAL | Interval between scrapes | every 4 hours |
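
To illustrate how the defaults and overrides behave, the sketch below resolves a few of these settings from a map of environment variables, falling back to the documented default when a variable is absent. It assumes the settings are supplied as environment variables, and it is not the project's actual loader (the project lists pureconfig for that). The `Settings` case class and `fromEnv` are hypothetical names introduced only for this example.

```scala
// Hypothetical holder for a subset of the documented settings.
final case class Settings(
  httpPort: Int,
  dbName: String,
  dbHost: String,
  dbPort: Int,
  createSchema: Boolean
)

object Settings {
  // Resolves each setting from `env`, using the documented default when unset.
  // Malformed numeric or boolean values throw; a real loader would report them.
  def fromEnv(env: Map[String, String]): Settings =
    Settings(
      httpPort     = env.get("SERVER_HTTP_PORT").map(_.toInt).getOrElse(8080),
      dbName       = env.getOrElse("DB_NAME", "wardrobe"),
      dbHost       = env.getOrElse("DB_HOST", "localhost"),
      dbPort       = env.get("DB_PORT").map(_.toInt).getOrElse(5432),
      createSchema = env.get("DB_WHETHER_CREATE_SCHEMA").map(_.toBoolean).getOrElse(true)
    )
}
```

For example, `Settings.fromEnv(sys.env)` picks up whatever is exported in the current shell, so running with `SERVER_HTTP_PORT=9090` set would change only the port and leave the other values at their defaults.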

How to build?

  • Clone the project
  • Build the project: sbt compile
  • Run the tests: sbt test

Technology stack

  • scala 2.13.6 as the main application programming language
  • http4s typeful, functional, streaming HTTP for Scala
  • sangria a GraphQL implementation for Scala
  • scala-scraper a Scala library for scraping content from HTML pages
  • quill compile-time language integrated queries for Scala
  • cats to write more functional and less boilerplate code
  • cats-effect The Haskell IO monad for Scala
  • pureconfig for loading configuration files
  • refined for type constraints avoiding unnecessary testing and boilerplate
  • circe a JSON library for Scala
  • scalatest and ScalaCheck for unit and property-based testing
  • testcontainers to run the services the system depends on during integration testing
