
This project extracts top books from Goodreads for six genres and stores the data in a SQLite database. It's useful for tracking popular books and analyzing reading trends, and can be used for building recommendation systems and conducting data analysis.


ansuff/goodreads-Scraper


goodreads Scraper

This project scrapes two lists from Goodreads for each of six genres: the top 30 books on the genre's shelves and the most read books this week. The genres are Science Fiction, Travel, Thriller, Poetry, Fantasy, and Business. The extracted data is stored in a SQLite database. The project is useful for tracking the most popular books in these genres and analyzing trends in reading habits. It can also serve as a starting point for building a recommendation system or for data analysis on book trends.
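As an illustration of the storage step described above, here is a minimal sketch using Python's standard `sqlite3` module. The table name `books`, its columns, the list names, and the helper functions are all assumptions made for this example and are not taken from the project's code; the actual schema may differ.

```python
import sqlite3

# The six genres named in this README.
GENRES = ["Science Fiction", "Travel", "Thriller", "Poetry", "Fantasy", "Business"]


def create_schema(conn):
    """Create a hypothetical `books` table (schema assumed for illustration)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS books (
            genre     TEXT    NOT NULL,
            list_name TEXT    NOT NULL,  -- e.g. 'top_shelved' or 'most_read_this_week'
            position  INTEGER NOT NULL,  -- 1-based place in the list
            title     TEXT    NOT NULL,
            author    TEXT    NOT NULL,
            PRIMARY KEY (genre, list_name, position)
        )
        """
    )


def save_books(conn, rows):
    """Insert or overwrite rows of (genre, list_name, position, title, author)."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO books "
            "(genre, list_name, position, title, author) VALUES (?, ?, ?, ?, ?)",
            rows,
        )


def top_books(conn, genre, list_name, limit=30):
    """Return (position, title, author) tuples for one genre list, best first."""
    cur = conn.execute(
        "SELECT position, title, author FROM books "
        "WHERE genre = ? AND list_name = ? ORDER BY position LIMIT ?",
        (genre, list_name, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    create_schema(conn)
    # Placeholder rows, not real scraped data.
    save_books(conn, [("Poetry", "top_shelved", 1, "Example Title", "Example Author")])
    print(top_books(conn, "Poetry", "top_shelved"))
```

Keying rows on genre, list, and position means a repeated scrape overwrites the previous entry for that slot instead of accumulating duplicates. To track trends over time, a scrape date column would need to be added to the key.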

TBD: Finalizing Scrapy scripts

Installation

To install the project and its dependencies, follow these steps:

  1. Clone the repository to your local machine.
  2. Navigate to the project directory.
  3. Run the run.sh script to install Poetry and the project dependencies.
  4. Run the run.sh script to scrape the top 30 books from Goodreads and store them in a SQLite database:
     ./run.sh scrape
  5. (Optional) Run the run.sh script to run the project tests:
     ./run.sh unittest
  6. (Optional) Run the run.sh script to lint the code:
     ./run.sh lint
