SYDsCorner/Amazon_Vine_Analysis

Amazon_Vine_Analysis

Challenge Overview

Purpose:

The purpose of this analysis is to use PySpark to run an ETL process on one of the Amazon review datasets, which includes reviews written by members of the paid Amazon Vine program. To accomplish this, I extracted and transformed the data, connected to an AWS RDS instance, and loaded the transformed data into pgAdmin. I then used PySpark to determine whether the dataset shows any bias towards favorable reviews from Vine members.

Resources

Results

  • The total number of Vine and non-Vine reviews

    total_number_reviews

    • The dataset contains 18,155 reviews in total, Vine and non-Vine combined.
      • Less than 1% are Vine reviews (136 reviews, about 0.75%).
      • More than 99% are non-Vine reviews (18,019 reviews).
  • The number of 5-star reviews among Vine and non-Vine reviews

    number_5star_reviews

    • 74 of the 136 Vine reviews are 5-star reviews.
    • 8,482 of the 18,019 non-Vine reviews are 5-star reviews.
  • The percentage of 5-star reviews among Vine and non-Vine reviews

    percent_5star_reviews

    • Approximately 54% of Vine reviews are 5-star reviews.
    • Approximately 47% of non-Vine reviews are 5-star reviews.
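
The percentages above follow directly from the reported counts. The snippet below is a minimal sketch in plain Python that recomputes them. The names `counts` and `five_star_percent` are illustrative assumptions and do not come from the repository's PySpark code.

```python
# Counts as reported in the results above.
counts = {
    "vine": {"total": 136, "five_star": 74},
    "non_vine": {"total": 18_019, "five_star": 8_482},
}


def five_star_percent(total: int, five_star: int) -> float:
    """Return the share of 5-star reviews as a percentage."""
    return 100.0 * five_star / total


# Total number of reviews across both groups.
total_reviews = sum(group["total"] for group in counts.values())

# Percentage of 5-star reviews within each group.
percents = {
    name: five_star_percent(group["total"], group["five_star"])
    for name, group in counts.items()
}

# Share of all reviews that belongs to each group.
shares = {
    name: 100.0 * group["total"] / total_reviews
    for name, group in counts.items()
}

for name in counts:
    print(f"{name}: {shares[name]:.2f}% of reviews, "
          f"{percents[name]:.1f}% are 5-star")
```

Keeping the raw counts next to the percentages makes it easy to see how small the Vine group is compared with the non-Vine group.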

Summary:

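From these results, we could conclude that there is a positivity bias for reviews in the Vine program in the furniture category: about 54% of Vine reviews are 5-star reviews, compared with about 47% of non-Vine reviews. However, this conclusion rests on only 136 Vine reviews from a single category. The Amazon Review data includes more than 50 datasets across different categories that we could use to test further whether there is any bias towards favorable reviews from Vine members.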

  • Additional analyses that could support this conclusion:
    • Use more of the Amazon Review datasets, covering different product categories.
    • Analyze more summary statistics of the star rating, such as the mean, median, and mode.
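
As a rough illustration of the second suggestion, the sketch below computes the mean, median, and mode of star ratings for each group with Python's standard `statistics` module. The ratings in `sample_ratings` are hypothetical values made up for this example; they are not taken from the dataset. The `summarize` helper is likewise an assumption, not part of the repository.

```python
from statistics import mean, median, mode

# HYPOTHETICAL star ratings, for illustration only.
# These values are not drawn from the Amazon review dataset.
sample_ratings = {
    "vine": [5, 4, 5, 3, 5, 4],
    "non_vine": [5, 1, 4, 5, 2, 3, 5],
}


def summarize(ratings: list[int]) -> dict[str, float]:
    """Return the mean, median, and mode of a list of star ratings."""
    return {
        "mean": mean(ratings),
        "median": median(ratings),
        "mode": mode(ratings),
    }


# One summary per group, so the two groups can be compared side by side.
summary = {group: summarize(ratings) for group, ratings in sample_ratings.items()}

for group, stats in summary.items():
    print(f"{group}: mean={stats['mean']:.2f}, "
          f"median={stats['median']}, mode={stats['mode']}")
```

With the real data, the same statistics could be computed per group on the full `star_rating` column, which would show whether Vine reviews differ across the whole rating distribution and not only in the share of 5-star reviews.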