HWNi/data512-final-project


DATA512 - A2: Bias in data

Abstract

In this project, I performed an exploratory data analysis of the Yelp data from the Yelp Dataset Challenge using correlation analysis and visualizations. The research question is whether a restaurant's overall rating correlates with various restaurant attributes. My hypothesis was verified: some aspects of a restaurant are more likely than others to determine its overall rating. However, most attributes are barely correlated with the overall rating (i.e. the correlation coefficient lies between -0.3 and 0.3). This might be due to the large number of null values in the dataset and to unbalanced data. In general, the Yelp ratings appear to be normally distributed, but we need to be cautious when interpreting them because they may suffer from bias due to stereotypes or customers' psychological suggestion.

Getting Started

You will need Python 3.X and Jupyter Notebook installed to reproduce this project by running data512-final-project.ipynb. To install Python 3, see the download page and the beginner's guide. To install Jupyter Notebook, follow the installation instructions.

Additionally, you will need the following packages:

Data Source Information and Usage

This project uses datasets from the Yelp Dataset Challenge. The datasets are available as both JSON and SQL files (see the documentation here). I mainly use the business dataset in JSON format:

  • Business.json

The business dataset contains business data including location data, attributes, and categories.
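Reading the business dataset can be sketched as follows. This is a minimal example that assumes the file stores one JSON object per line; the field names `business_id` and `stars`, the helper `iter_businesses`, and the two sample records are illustrative assumptions and are not taken from the notebook.

```python
import json
from typing import Iterable, Iterator


def iter_businesses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one business record per non-empty line of JSON text.

    Assumes a line-delimited layout (one JSON object per line).
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Two made-up records standing in for lines of the real file.
sample_lines = [
    '{"business_id": "a1", "stars": 4.5, "categories": ["Mexican", "Burgers"]}',
    '{"business_id": "b2", "stars": 3.0, "categories": ["Dentists"]}',
]

businesses = list(iter_businesses(sample_lines))
print(len(businesses), businesses[0]["stars"])
```

With the real data, an open file handle can be passed in place of `sample_lines`, since iterating over a text file yields its lines.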

For categories,

    // an array of strings of business categories
    "categories": [
        "Mexican",
        "Burgers",
        "Gastropubs"
    ],

The categories field carries a wide variety of tags. It is impossible to get all restaurants in the data by simply searching for "restaurant" in the categories. Thus, I need to explore the categories in the business data and work out the set of categories applicable to my research.
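One possible way to carry out this exploration is to count how often each tag occurs and then keep the businesses that match a hand-picked set of tags. The sketch below is hypothetical: the helpers `count_categories` and `filter_by_categories`, the sample records, and the `FOOD_CATEGORIES` set are assumptions for illustration, not the set actually used in the notebook.

```python
from collections import Counter

# Made-up records; "categories" may be missing or null in real data.
sample = [
    {"name": "A", "categories": ["Mexican", "Restaurants"]},
    {"name": "B", "categories": ["Burgers", "Gastropubs"]},
    {"name": "C", "categories": ["Dentists", "Health & Medical"]},
    {"name": "D", "categories": None},
]


def count_categories(businesses):
    """Count how many businesses carry each category tag."""
    counts = Counter()
    for b in businesses:
        counts.update(b.get("categories") or [])
    return counts


def filter_by_categories(businesses, wanted):
    """Keep businesses that have at least one tag in `wanted`."""
    wanted = set(wanted)
    return [b for b in businesses if wanted & set(b.get("categories") or [])]


# Hypothetical set of food-related tags chosen after inspecting the counts.
FOOD_CATEGORIES = {"Restaurants", "Mexican", "Burgers", "Gastropubs"}

category_counts = count_categories(sample)
restaurants = filter_by_categories(sample, FOOD_CATEGORIES)
print(category_counts.most_common(3))
print([b["name"] for b in restaurants])
```

Inspecting the most common tags first makes it easier to decide which ones belong in the final set, at the cost of some manual review.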

License

Yelp provides a guideline regarding the dataset license.

This project is licensed under the MIT License. See the LICENSE.md file for details.
