This is a data analysis project consisting of:
- Web scraping using Scrapy and Postgres.
- Exploratory data analysis.

Post items and usernames were scraped from the publicly available Hacker News API and stored in a Postgres database running in the cloud. More information about the scraper can be found in the README in the hackernews_scrapper folder.
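
For orientation, here is a minimal sketch of how single items and users can be requested from the public Hacker News API. It uses only the Python standard library and is not the project's Scrapy spider; the helper names `item_url`, `user_url`, and `fetch_json` are hypothetical and chosen for illustration.

```python
import json
from urllib.request import urlopen

# Base URL of the public Hacker News API (hosted on Firebase).
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Return the endpoint for one item (story, comment, job or poll)."""
    return f"{BASE_URL}/item/{item_id}.json"


def user_url(username: str) -> str:
    """Return the endpoint for one user profile."""
    return f"{BASE_URL}/user/{username}.json"


def fetch_json(url: str, timeout: float = 10.0):
    """Download and decode one JSON document.

    The API answers with JSON `null` for unknown ids, which decodes to None.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json(item_url(some_id))` would return a dictionary for an existing item. Storing the result in Postgres is left out here, since the schema used by the project is described in the scraper's own README.
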
In total, 3 months of data were scraped from the API, comprising:
- 1.2 million posts
- 77k users

Data cleaning was performed to tidy up the data. Exploratory analysis and Named Entity Recognition were then carried out to answer questions such as:
- What is the best day and hour to post?
- Which topics had the highest engagement during the period?
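
As an illustration of the "best day and hour" question, the sketch below groups posts by weekday and hour and compares their average score. It is a simplified stand-in for the actual analysis, under two assumptions: each post is a dictionary with the `time` (Unix seconds) and `score` fields returned by the Hacker News API, and score is used as the engagement measure. The function names `mean_score_by_slot` and `best_slot` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_score_by_slot(items):
    """Map (weekday name, hour in UTC) to the mean score of posts in that slot."""
    totals = defaultdict(lambda: [0, 0])  # slot -> [score sum, post count]
    for item in items:
        posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
        slot = (WEEKDAYS[posted.weekday()], posted.hour)
        totals[slot][0] += item.get("score", 0)  # assumption: missing score counts as 0
        totals[slot][1] += 1
    return {slot: total / count for slot, (total, count) in totals.items()}


def best_slot(items):
    """Return the (weekday, hour) slot with the highest mean score."""
    averages = mean_score_by_slot(items)
    return max(averages, key=averages.get)
```

A mean is sensitive to a few very popular posts, so a median or a minimum post count per slot may give a more robust answer on real data.
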