ML powered business intelligence dashboard that provides insights from Yelp reviews

TallyAI/tally-ai-ds

Tally AI

A Word-Trend Business Intelligence Dashboard That Provides Actionable Business Insights.

Business owners don’t have time to decode what people are saying about their business online - they just want to know what to improve - so our goal for Tally AI was to provide actionable suggestions to help businesses grow profit.

The app is currently piloting its functionality on hundreds of cafes and restaurants around the Phoenix, AZ area.

Tally is a one-stop snapshot for understanding your business's Yelp reviews.

 "Data analytics is not just for big corporations. 
 Your small business can stay on top of an ever changing marketplace 
 with the power of Tally."

Check the App Out

Contributors

Data Science

Wenjing Liu Lily Su Rohan Kulkarni

Web Development

Patrick Stevenson Steve Renner Rohan Kulkarni David Downes

Product Manager | UX Designer

Elizabeth Ter Sahakyan Colton Mortensen

Project Overview

Web Application UI

Example conclusions from looking at the above dashboard by business owners

I might look into training my staff on customer service etiquette 
since people are complaining about the service.
I'm relieved that my half-price bottle service is getting 
buzz from the word trend chart.
Seeing a snapshot of trending phrases from my competitors 
has made me realize that I should think about introducing a happy hour.

This is a Django app for the data science microservice. It runs locally on Windows 10 and is deployed on AWS Elastic Beanstalk.

【Tally AI Front End】 for work with Front End UI Design

【Tally AI Back End】 for additional repos regarding authentication

【Tally AI Documentation】 for technical details on our project.

【AWS EB deployment logs】 for logs of our AWS Elastic Beanstalk Deployments

【All SQLs used in this project】 for useful SQL queries we used

【A D3.js line chart】 for exploratory data visualization work prior to migrating to Recharts

Product Canvas

Deployed Front End

Tech Stack & Architecture

React, Material UI, Recharts, Python, Django, Postgres, AWS

NLP Packages Used

spaCy, PyTextRank, Scattertext

Data Sources

Release Canvas Presentation Slides 1-3

Web | Data Science Release Canvas Deliverables

Python Notebooks

Exploratory Data Analysis Yelp Dataset

NLP - BERT, word vectors, sentence vectors

Calculating Word Frequency Correlations with Ratings

NLP - Spacy Named Entity Recognition POS Tagging Exploration

Finding Context in Words Correlated with Highest and Lowest Ratings

Refactored Context in Words Correlated with Highest and Lowest Ratings

WordNet and Vader Sentiment Explorations

LDA Topic Modeling Explorations

How to Connect to the Data Science API

Web Scraped Endpoints

Returns 10 positive and 10 negative word phrases associated with a business.
https://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=0

{
viztype0: {
positive: [
      {
         term: "cool cats",
         score: 0.08981400595659608
      },
      {
         term: "rescued cats",
         score: 0.08956279306536073
      }
      ],
   negative: [
      {
         term: "just bad business",
         score: 0.0442848147595502
      },
      {
         term: "a refund",
         score: 0.03511932390225489
      }
   ]
},

Cumulative average of review star ratings for the past 8 weeks vs. the average rating per week (time span: 8 weeks). E.g.
8 weeks ago: 1,1,1,1,1, weekly_avg_rating=1, cumulative_avg_rating=1
7 weeks ago: 2,2,2,2,2, weekly_avg_rating=2, cumulative_avg_rating=1.5
6 weeks ago: 3,3,3,3,3, weekly_avg_rating=3, cumulative_avg_rating=2
https://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=0

[
   {
     date: '2020-01-10', 
     cumulative_avg_rating: 3, 
     weekly_avg_rating: 2
   },
   {
     date: 'Date 2', 
     cumulative_avg_rating: 4,
     weekly_avg_rating: 3
   }
]
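The relationship between the two averages in the 8-week example can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the function name `weekly_and_cumulative_averages`, the input shape (a list of date label and ratings pairs, oldest first), and the choice to skip weeks without reviews are all assumptions made for this sketch.

```python
def weekly_and_cumulative_averages(weeks):
    """Compute per-week and running averages of star ratings.

    `weeks` is a list of (date_label, ratings) pairs ordered oldest first.
    This input shape is hypothetical; the real service reads from its database.
    """
    rows = []
    total_stars = 0
    total_reviews = 0
    for date_label, ratings in weeks:
        if not ratings:
            continue  # assumption: weeks without reviews are left out
        total_stars += sum(ratings)
        total_reviews += len(ratings)
        rows.append({
            "date": date_label,
            # average over every review seen so far
            "cumulative_avg_rating": total_stars / total_reviews,
            # average over this week's reviews only
            "weekly_avg_rating": sum(ratings) / len(ratings),
        })
    return rows


sample = [
    ("8 weeks ago", [1, 1, 1, 1, 1]),
    ("7 weeks ago", [2, 2, 2, 2, 2]),
    ("6 weeks ago", [3, 3, 3, 3, 3]),
]

for row in weekly_and_cumulative_averages(sample):
    print(row)
```

With the sample input, the weekly averages are 1, 2, and 3, and the cumulative averages are 1, 1.5, and 2, matching the example in the endpoint description.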

Endpoints Looking Through Yelp Dataset

Returns “Trending” word phrases and their comparative fluctuations over segments of time.
https://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=1

[
   {
       date: 'string with date',
       data: [ { phrase: "phrase 1", rank: 1}, 
               { phrase: "phrase 2", rank: 1}, 
               { phrase: "phrase 3", rank: 1} ]
   },
   {
       date: 'string with date',
       data: [ { phrase: "phrase 1", rank: 2}, 
               { phrase: "phrase 2", rank: 2}, 
               { phrase: "phrase 3", rank: 1.5} ]
   },
   {
       date: 'string with date',
       data: [ { phrase: "phrase 1", rank: 2}, 
               { phrase: "phrase 2", rank: 4}, 
               { phrase: "phrase 3", rank: 2} ]
   },
]
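A line chart usually needs one series per phrase rather than one record per time segment. The sketch below shows one way a client could pivot the response shape above; the function name `rank_series_by_phrase` and the sample values are hypothetical and only mirror the documented `date` / `data` / `phrase` / `rank` fields.

```python
from collections import defaultdict


def rank_series_by_phrase(response):
    """Pivot [{date, data: [{phrase, rank}]}] into {phrase: [(date, rank), ...]}."""
    series = defaultdict(list)
    for segment in response:
        for item in segment["data"]:
            series[item["phrase"]].append((segment["date"], item["rank"]))
    return dict(series)


# Hypothetical sample in the documented response shape.
sample_response = [
    {"date": "segment 1", "data": [{"phrase": "phrase 1", "rank": 1},
                                   {"phrase": "phrase 2", "rank": 1}]},
    {"date": "segment 2", "data": [{"phrase": "phrase 1", "rank": 2},
                                   {"phrase": "phrase 2", "rank": 4}]},
]

for phrase, points in rank_series_by_phrase(sample_response).items():
    print(phrase, points)
```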

Review frequency: shows the change in the number of reviews over time.
https://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com/yelp/jga_2HO_j4I7tSYf5cCEnQ?viztype=2

[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2}, 
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2}, 
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4}, 
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1}, 
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1}, 
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}]
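The endpoints above share one URL pattern and differ only in the `viztype` query parameter. A client could build the URLs with a small helper like the one below. This is a sketch under stated assumptions: `build_yelp_url`, `VIZTYPES`, and the decision to reject unknown values are inventions of this example, and the base URL is the development host shown above.

```python
from urllib.parse import urlencode

# Development host taken from the example URLs above.
BASE_URL = "https://django-tally-dev.n9ntucwqks.us-west-2.elasticbeanstalk.com"

# Short descriptions of what each documented viztype returns.
VIZTYPES = {
    0: "positive/negative phrases and rating averages",
    1: "trending phrases over time",
    2: "review counts over time",
}


def build_yelp_url(business_id, viztype=0, base_url=BASE_URL):
    """Return the endpoint URL for a Yelp business ID and a viztype."""
    if viztype not in VIZTYPES:
        raise ValueError(f"unknown viztype: {viztype!r}")
    query = urlencode({"viztype": viztype})
    return f"{base_url}/yelp/{business_id}?{query}"


print(build_yelp_url("jga_2HO_j4I7tSYf5cCEnQ", viztype=2))
```

The resulting URL can be passed to any HTTP client; the helper itself makes no network request.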

【Testing URLs】
【Testing data documents】
【Testing script Colab】

Activate Virtual Environment

Miniconda3 or Anaconda3 Python 3.7 【Logs】
(If you are using Python 3.6 or manage your environments in some other way, skip this step.)

$ conda create -n python3.6 python=3.6
$ conda activate python3.6
$ pip install pipenv

(base) PS D:\github\django-tally>

$ pipenv install
$ pipenv shell

Install dependencies:
(If you have downloaded the repo, you can skip this step.)

$ pipenv install django psycopg2-binary djangorestframework pyyaml lxml "spacy>=2.0.0,<3.0.0" https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz pytextrank "apscheduler>=3.6.3" django-apscheduler gensim scikit-learn

Generate requirements.txt

$ pip freeze > requirements.txt

Or, in Windows PowerShell:
$ pip freeze | Out-File -Encoding UTF8 requirements.txt
In the requirements.txt file, remove the entries for spacy and en_core_web_sm, and add the following lines.

spacy>=2.0.0,<3.0.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz

Frequently Used Django Commands

$ python manage.py runserver
$ python manage.py makemigrations  
$ python manage.py migrate  
$ python manage.py test --keepdb
$ python manage.py inspectdb > models.py
$ python manage.py collectstatic
$ python -m django --version

Deploy to AWS Elastic Beanstalk

During the deployment, you may need the following EB CLI and AWS CLI commands.

$ eb init -p python-3.6 django-tally
$ eb create django-tally
$ eb status
$ eb deploy
$ eb open
$ eb logs
$ eb config
$ eb terminate django-tally
$ aws elasticbeanstalk restart-app-server --environment-name django-tally

【Logs】

(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> python --version
Python 3.7.4
(base) PS C:\Users\guido> aws --version
File association not found for extension .py
aws-cli/1.17.5 Python/3.7.4 Windows/10 botocore/1.13.50
(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> eb --version
EB CLI 3.17.0 (Python 3.7.4)
(django-tally-QTYVOJb0) (python3.6) D:\github\django-tally>python manage.py collectstatic
163 static files copied to 'D:\github\django-tally\static'.

【AWS Elastic Beanstalk Configuration】
All Applications -> django-tally -> Configuration -> Software -> Change:
Set WSGIPath = tally/wsgi.py
Set system environment variables here too

Testing URLs

http://127.0.0.1:8000/admin
http://127.0.0.1:8000/admin/django_apscheduler/
The links below are for 【testing】.
http://127.0.0.1:8000/yelp/index
https://www.yelp.com/biz/aunt-jakes-new-york
http://127.0.0.1:8000/yelp/aunt-jakes-new-york (by business alias)
http://127.0.0.1:8000/yelp/I2lgw_7DUnwD92ND4PN-Ow?viztype=0 (by business ID)
http://127.0.0.1:8000/yelp/DR22QPe3A52diajwPuooVA?viztype=0
https://www.yelp.com/biz/Iq7NqQD-sESu3vr9iEGuTA (Butters Pancakes & Café)
http://127.0.0.1:8000/yelp/Iq7NqQD-sESu3vr9iEGuTA?viztype=1
https://www.yelp.com/biz/y0GZCNHDbFYr6Rjk3OzgYg (Jarrod's Coffee, Tea & Gallery)
http://127.0.0.1:8000/yelp/y0GZCNHDbFYr6Rjk3OzgYg?viztype=1
You should get trending phrases such as "beautiful art", "art gallery", "downtown mesa", etc.
http://127.0.0.1:8000/jobs/logs/jga_2HO_j4I7tSYf5cCEnQ?num=20 (view job logs by business ID)
The links below are 【examples】.
http://127.0.0.1:8000/yelp/y0GZCNHDbFYr6Rjk3OzgYg?viztype=2
You should get monthly review counts like the ones below.

[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2}, 
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2}, 
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4}, 
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1}, 
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1}, 
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}]
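Monthly counts of this shape can be produced by grouping review dates by year and month. The following is a rough sketch, not the project's `yelpReviewCountMonthly` code: the function name `monthly_review_counts`, the input (a list of `datetime.date` objects), and the month-end, non-zero-padded date label are assumptions inferred from the sample output above.

```python
import calendar
from collections import Counter
from datetime import date


def monthly_review_counts(review_dates):
    """Count reviews per calendar month, labelled by the month's last day."""
    counts = Counter((d.year, d.month) for d in review_dates)
    rows = []
    for (year, month), n in sorted(counts.items()):
        last_day = calendar.monthrange(year, month)[1]
        # Months without reviews never enter the Counter, so they are omitted.
        rows.append({"date": f"{year}-{month}-{last_day}", "reviews": n})
    return rows


# Hypothetical review dates for illustration.
sample_dates = [date(2018, 2, 3), date(2018, 2, 20),
                date(2018, 4, 1), date(2017, 8, 15)]

print(monthly_review_counts(sample_dates))
```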

http://127.0.0.1:8000/bucketlists (create)
http://127.0.0.1:8000/bucketlists/1 (get, put, delete)
http://127.0.0.1:8000/jobs/example (APScheduler background job)

Create A Project

【Example】

$ cd C:\Users\guido\.virtualenvs\django-tally-QTYVOJb0\Scripts\
$ python django-admin.py startproject tally D:\github\django-tally

project name: tally
project created in directory: D:\github\django-tally

Run Django app

$ cd path/to/django-tally
$ python manage.py runserver

【Logs】

Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).

You have 17 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
January 07, 2020 - 01:05:29
Django version 3.0.2, using settings 'tally.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CTRL-BREAK.
[07/Jan/2020 01:05:55] "GET / HTTP/1.1" 200 16351
[07/Jan/2020 01:05:55] "GET /static/admin/css/fonts.css HTTP/1.1" 200 423
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Light-webfont.woff HTTP/1.1" 200 85692
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Bold-webfont.woff HTTP/1.1" 200 86184
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Regular-webfont.woff HTTP/1.1" 200 85876

Configure settings.py

(If you have downloaded the repo, you can skip this step.)

# Internationalization
# https://docs.djangoproject.com/en/3.0/topics/i18n/
LANGUAGE_CODE = 'en-us'
TIME_ZONE = 'US/Central' # 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

Database configuration

In the tally/settings.py file, edit the database connection configuration.
(If you have downloaded the repo, you can skip this step.)

# Database 
# https://docs.djangoproject.com/en/3.0/ref/settings/#databases
import os
if 'RDS_HOSTNAME' in os.environ:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql',  # the 'postgresql_psycopg2' alias was removed in Django 3.0
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
            'OPTIONS': {
                'options': '-c search_path=django'
            },
            'TEST': {
                'ENGINE': 'django.db.backends.sqlite3',
            },
        }
    }

【Local Environment】
Add system environment variables in the Python virtual environment (NO quotation marks around the values).
You can add a .env file to the django-tally folder and put the following lines in it (replace * with your credentials). pipenv loads this file each time you start the virtual environment, so the variables are set automatically. (Make sure .env is listed in the .gitignore file; otherwise you will expose the credentials on the Internet.)

RDS_DB_NAME=*
RDS_USERNAME=*
RDS_PASSWORD=*
RDS_HOSTNAME=*
RDS_PORT=*

【Manually】
Or you can set the variables manually each time you start the virtual environment.
For Windows Command Prompt, use set VARNAME=value.
For Windows PowerShell, use $env:VARNAME = "value".
For macOS/Linux, use export VARNAME=value.

(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_DB_NAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_USERNAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PASSWORD=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_HOSTNAME=*.*.us-east-2.rds.amazonaws.com
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PORT=*

【Verification】
To make sure the variables are properly created, type python then print out os.environ[<varname>].

(django-tally-QTYVOJb0) (base) D:\github\django-tally>python
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Warning:
This Python interpreter is in a conda environment, but the environment has
not been activated.  Libraries may fail to load.  To activate this environment
please see https://conda.io/activation
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['RDS_DB_NAME']
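
Checking the five variables one at a time is tedious, so a small script can report every missing one at once. This is a minimal sketch: the function name `missing_rds_variables` is hypothetical, and it treats empty values as missing, which is an assumption of this example rather than something `settings.py` enforces.

```python
import os

# The variables read by the database configuration in tally/settings.py.
REQUIRED_VARS = ("RDS_DB_NAME", "RDS_USERNAME", "RDS_PASSWORD",
                 "RDS_HOSTNAME", "RDS_PORT")


def missing_rds_variables(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_rds_variables()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All RDS environment variables are set.")
```

Run it inside the activated virtual environment so that it sees the same variables as `manage.py`.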

【Deployment】
To configure the instance deployed on AWS Elastic Beanstalk, go to the application Configuration page and choose Software.

Add the system environment variables there.

Migration

If you have downloaded this repo, you can skip this step.

$ cd path/to/django-tally
$ python manage.py migrate

【Logs】

Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying sessions.0001_initial... OK

Django migration will create tables automatically in the database.

Create Django Admin User

$ cd path/to/django-tally
$ python manage.py createsuperuser

【Logs】

Username (leave blank to use 'guido'): ***
Email address: ***
Password:
Password (again):
This password is too short. It must contain at least 8 characters.
This password is too common.
This password is entirely numeric.
Bypass password validation and create user anyway? [y/N]: n
Password:
Password (again):
Superuser created successfully.

Use Django REST Framework for APIs

(If you have downloaded the repo, you can skip this step.)

PS D:\github\django-tally>

# D:\github\django-tally\tally\settings.py
...
# Application definition
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',             # Add this line; the name must be exactly 'rest_framework'
    'example',                    # Add this line; you can use an app name other than "example"
    'yelp',                       # Add this app as well for this project
]

Create an app called "example".

$ python manage.py startapp example

Setting up URL patterns
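E.g. a regular expression that matches a UUID-style primary key, (?P<pk>[0-9a-f-]+):

from django.urls import re_path
# The view classes are assumed to live in the app's own views module.
from .views import YelpYelpScrapingCreateView, YelpYelpScrapingDetailsView

# urlpatterns must be a list (or tuple), not a set
urlpatterns = [
    re_path(r'^yelp/$',
            YelpYelpScrapingCreateView.as_view(), name="create"),
    re_path(r'^yelp/(?P<pk>[0-9a-f-]+)/$',
            YelpYelpScrapingDetailsView.as_view(), name="details"),
]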
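
To see what the primary-key pattern accepts, it can be tried outside Django with the standard `re` module. This is only an illustration: `extract_pk` is a hypothetical helper, and the sample paths are made up.

```python
import re

# Same pattern as in the URL configuration above.
PK_PATTERN = re.compile(r'^yelp/(?P<pk>[0-9a-f-]+)/$')


def extract_pk(path):
    """Return the captured primary key, or None if the path does not match."""
    match = PK_PATTERN.match(path)
    return match.group("pk") if match else None


print(extract_pk("yelp/123e4567-e89b-12d3-a456-426614174000/"))  # a lowercase UUID matches
print(extract_pk("yelp/jga_2HO_j4I7tSYf5cCEnQ/"))  # a Yelp business ID does not match
print(extract_pk("yelp/---/"))  # the pattern is loose: any run of hex digits and hyphens matches
```

The character class only allows lowercase hex digits and hyphens, so it does not validate the UUID layout, and it rejects Yelp business IDs, which contain other letters and underscores. That is consistent with the query-string example using the `slug` path converter instead.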

E.g. query strings

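import json
from django.http import HttpResponse
from django.urls import path
# yelpTrendyPhrases, yelpReviewCountMonthly, and getDataViztype0 are this
# project's helper functions; import them from wherever they are defined.

def home(request, business_id):
    # Dispatch on the ?viztype= query parameter; any other value falls back to viztype 0.
    viztype = request.GET.get('viztype')
    if viztype == '1':
        result = json.dumps(yelpTrendyPhrases(business_id))
    elif viztype == '2':
        result = json.dumps(yelpReviewCountMonthly(business_id))
    else:
        result = json.dumps(getDataViztype0(business_id))
    return HttpResponse(result)

# home must be defined before it is referenced; urlpatterns must be a list, not a set.
urlpatterns = [path('<slug:business_id>', home, name='home')]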

Follow this tutorial to build a REST API.

Django Auto-Generate Data Models from Database Tables

$ python manage.py inspectdb > models.py

After running this command, modify the class names in the models.py file.
Add the app name as a prefix to every class name. E.g.
For app "example", change class Bucketlist -> class ExampleBucketlist
For app "yelp", change class Business -> class YelpBusiness
Follow the instructions in the models.py file and make sure the model definitions are correct.
Then move the models.py file to the corresponding app folder.
This way, every app has its own models that do not conflict with those of other apps.
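The renaming step can be scripted instead of done by hand. The sketch below is one possible approach, not part of the project: `prefix_model_classes` is a hypothetical helper that only rewrites `class X(models.Model):` lines, so any references to the old class names (for example in `ForeignKey` fields) still need to be updated manually.

```python
import re

# Matches class definitions as generated by inspectdb.
CLASS_DEF = re.compile(r'^class (\w+)\(models\.Model\):', flags=re.MULTILINE)


def prefix_model_classes(source, app_name):
    """Prefix every model class name in `source` with the capitalized app name."""
    prefix = app_name.capitalize()

    def rename(match):
        name = match.group(1)
        if name.startswith(prefix):
            return match.group(0)  # already prefixed, leave unchanged
        return f"class {prefix}{name}(models.Model):"

    return CLASS_DEF.sub(rename, source)


generated = "class Business(models.Model):\n    name = models.TextField()\n"
print(prefix_model_classes(generated, "yelp"))
```

Running the function twice on the same source gives the same result, so it is safe to re-run after regenerating only part of the file.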
This is an example of the Django data models created.
https://github.com/Nov05/django-tally/blob/master/example/models.py
You can query with or without Django data models. E.g.
https://github.com/Nov05/django-tally/blob/master/tallylib/sql.py
【Debug】
Issue: Django “ValueError: source code string cannot contain null bytes”
Solution: You can simply create a new .py file, copy and paste the models.py content to it, then replace the models.py file with it.

spaCy

spaCy models
https://spacy.io/usage/models
How to install models
https://pypi.org/project/spacy/
Download spaCy model manually (Not in use)
https://github.com/explosion/spacy-models/releases

You can install spaCy models just like installing a Python package.
pipenv install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz
Then import the models in your code.

import en_core_web_sm
nlp = en_core_web_sm.load()   

or

import spacy
nlp = spacy.load("en_core_web_sm") 

【Deployment】 Make sure the following 2 lines are in the requirements.txt.

spacy>=2.0.0,<3.0.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.5/en_core_web_sm-2.2.5.tar.gz

Make sure to remove spacy==2.2.3 and en_core_web_sm==2.2.5 from the file, or you will get an error when deploying that says "Could not find a version that satisfies the requirement en-core-web-sm==2.2.5".
【Manually】 Alternatively, put the model folder (including its __init__.py) in the repo at the same level as manage.py, then load it by path:
spacy.load("en_core_web_sm/en_core_web_sm-2.2.5")
CAUTION: This works, but deploying from Windows 10 to AWS Elastic Beanstalk might raise a UnicodeDecodeError when loading the model, whereas launching the server locally on Windows 10 or deploying from macOS seems fine.

Background Job Scheduling

**Advanced Python Scheduler**

  • [APScheduler official document](https://apscheduler.readthedocs.io/en/stable/index.html)
  • [Django-apscheduler Github repo](https://github.com/jarekwg/django-apscheduler)
  • [An important tutorial](https://medium.com/@mrgrantanderson/replacing-cron-and-running-background-tasks-in-django-using-apscheduler-and-django-apscheduler-d562646c062e)
  • [A simple example](https://github.com/agronholm/apscheduler/blob/master/examples/schedulers/background.py) of setting up a background job by using `apscheduler.schedulers.background.BackgroundScheduler`.
  • [【My example code】](https://github.com/Nov05/django-tally/blob/master/jobs/examples.py), [【Logs】](https://github.com/Nov05/yelp-dataset-challenge/blob/master/apscheduler/2020-01-17%20backgroud%20job%20example.md)

```
$ pipenv install apscheduler
$ pipenv install django-apscheduler
```

Reference

Contributing

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Please note we have a code of conduct. Please follow it in all your interactions with the project.

Issue/Bug Request

If you are having an issue with the existing project code, please submit a bug report under the following guidelines:

  • Check first to see if your issue has already been reported.
  • Check to see if the issue has recently been fixed by attempting to reproduce the issue using the latest master branch in the repository.
  • Create a live example of the problem.
  • Submit a detailed bug report including your environment & browser, steps to reproduce the issue, actual and expected outcomes, where you believe the issue is originating from, and any potential solutions you have considered.

Feature Requests

We would love to hear from you about new features which would improve this app and further the aims of our project. Please provide as much detail and information as possible to show us why you think your new feature should be implemented.

Pull Requests

If you have developed a patch, bug fix, or new feature that would improve this app, please submit a pull request. It is best to communicate your ideas with the developers first before investing a great deal of time into a pull request to ensure that it will mesh smoothly with the project.

Remember that this project is licensed under the MIT license, and by submitting a pull request, you agree that your work will be, too.

Pull Request Guidelines

  • Ensure any install or build dependencies are removed before the end of the layer when doing a build.
  • Update the README.md with details of changes to the interface, including new plist variables, exposed ports, useful file locations and container parameters.
  • Ensure that your code conforms to our existing code conventions and test coverage.
  • Include the relevant issue number, if applicable.
  • You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.

Attribution

These contribution guidelines have been adapted from this good-Contributing.md-template.

Documentation

See Project Documentation for technical details on our project.

MIT
