
An Analysis of NYC 311 Service Requests

Predicting the Responding Government Agency

Author: Avonlea Fisher

Blog Post: https://towardsdatascience.com/analyzing-and-modelling-nyc-311-service-requests-eb6a9c9adc7c

Dashboard: https://nyc-311.herokuapp.com/

Interactive visualizations that cannot be rendered in this repository are available in the Heroku dashboard linked above. Note: the dashboard may take some time to load.

The contents of this repository detail an analysis of NYC 311 service requests and the community districts in which they were recorded.

Abstract:

Data pertaining to the time, location, and content of thousands of 311 calls in New York City is recorded every day. By studying trends in this data, government agencies can respond more effectively to non-emergency requests and issues raised by the populations they serve. Using public data on 311 calls and community districts in NYC, this project explores which types of calls are the most common, how daily call volume varies across districts, and how calls are distributed among the responding government agencies. Using natural language processing and the Keras library, the project develops a neural network that classifies the government agency that responded to a call, given the call's description as input. The best-performing model correctly classified 73% of calls in a random subset of the data. The agency variable was heavily imbalanced: the New York Police Department (NYPD) responded to just over 50% of all 311 calls, and calls in the dataset were assigned to 14 government agencies in total, which makes training a perfectly accurate classifier difficult. Currently, most non-emergency service requests are handled through a phone call. If developed further, this type of classifier could facilitate the automatic assignment of non-emergency requests to the appropriate agency in an online context where requests generate text descriptions.

Data

The data used in this project was obtained from two sources:

  • NYC Open Data, 311 Service Requests—This dataset contains information about the location, time, complaint type, and status of more than 24 million 311 service requests made in New York City within the past decade. This project uses a subset of the data from 2020 that was accessed with the Socrata Open Data (SODA) API (a sketch of this kind of query appears below the list).
  • Community District Profiles—After navigating to any profile on the Community District Profiles website, the Indicators Data can be obtained under "Download the Data." This dataset contains development and population information for each Community District in New York City. Community board names, which correspond to community districts, can also be found in the 311 dataset.
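As a hedged illustration, the snippet below shows how a 2020 subset like this can be requested with the sodapy client; the dataset identifier, date filter, and row limit are assumptions for the sketch and may not match the exact query used in the notebooks.

```python
# Sketch: pulling a 2020 subset of 311 requests via the SODA API with sodapy.
# The dataset identifier and filter below are assumptions, not the project's exact query.
import pandas as pd
from sodapy import Socrata

# Unauthenticated clients are rate-limited; pass an app token for heavier use.
client = Socrata("data.cityofnewyork.us", None)

# "erm2-nwe9" is the identifier commonly associated with the 311 Service Requests
# dataset on NYC Open Data; verify it on the dataset's API page before relying on it.
results = client.get(
    "erm2-nwe9",
    where="created_date >= '2020-01-01T00:00:00'",
    limit=50000,
)

# The API returns a list of dictionaries, one per request.
df = pd.DataFrame.from_records(results)
print(df.shape)
```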

Methods and Repository Structure

The notebooks for this project were organized based on the OSEMN process, and should be followed in this order:

  • Obtaining_and_Scrubbing_the_Data.ipynb—Accessed data through the SODA API and Community District profiles. Dealt with missing and unnecessary data, reformatted values for consistency, and merged the two datasets.
  • Exploring_the_Data—Generated summary statistics and interactive visualizations of the data. Noise complaints were the most common complaint type, and the New York Police Department (NYPD) was the agency that responded to the vast majority of calls. Due to the size of the data represented in the visuals, this section is broken up into five separate notebooks:
    • Bar Charts
    • Line and Area Charts
    • Mapbox Density Heatmaps
    • Correlation Heatmap and Wordcloud
    • Scatterplots
  • Modeling_and_Interpreting.ipynb—Preprocessed the data for modelling and, using the Keras library, trained and evaluated classification models with various parameter grids. Performance was evaluated on 1) test data from the resampled subset and 2) another random subset of the data that was not resampled.
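To make the modelling step concrete, the sketch below shows one minimal way to wire up this kind of text classifier in Keras: call descriptions are tokenized and padded, then fed to a small embedding network that predicts the responding agency. The column names, vocabulary size, and architecture are assumptions for illustration and do not reproduce the notebook's exact configuration; df is assumed to be a DataFrame of 311 requests such as the one built in the Data sketch above.

```python
# Sketch: a small Keras text classifier for the responding agency.
# Column names, vocabulary size, and architecture are illustrative assumptions.
import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

descriptions = df["descriptor"].astype(str).values      # free-text call descriptions
agencies = LabelEncoder().fit_transform(df["agency"])   # target: responding agency

# Tokenize the descriptions and pad them to a fixed length.
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(descriptions)
X = pad_sequences(tokenizer.texts_to_sequences(descriptions), maxlen=30)

# A simple embedding + dense network with a softmax output over agencies.
model = models.Sequential([
    layers.Embedding(input_dim=10000, output_dim=32),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(len(np.unique(agencies)), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, agencies, epochs=10, validation_split=0.2)
```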

The Dashboard folder contains the files that were used to create the NYC 311 Heroku app:

  • Various CSV files with data used in app
  • main.py—Python code for app
  • requirements.txt—Dependencies
  • Procfile—Instructions to Heroku for how to start the app

A .gitignore file, which lists files that should be excluded from the repository (and therefore from the Heroku deployment), was also used. More information about deploying Python apps on Heroku can be found at the Heroku Dev Center.
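As a rough illustration of how these pieces fit together, the sketch below shows a minimal Plotly Dash main.py that exposes a WSGI server object for gunicorn, which is what a Procfile line such as `web: gunicorn main:server` would point at. The CSV name, layout, and figure are hypothetical placeholders, not the actual dashboard code.

```python
# main.py — illustrative skeleton of a Dash app deployable to Heroku.
# The data file, layout, and figure below are hypothetical placeholders.
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html

df = pd.read_csv("complaints_by_type.csv")  # hypothetical CSV from the Dashboard folder

app = Dash(__name__)
server = app.server  # exposed for gunicorn (e.g. Procfile: web: gunicorn main:server)

app.layout = html.Div([
    html.H1("NYC 311 Service Requests"),
    dcc.Graph(figure=px.bar(df, x="complaint_type", y="count")),
])

if __name__ == "__main__":
    app.run_server(debug=False)
```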

Results

The best-performing model had 92.6% accuracy on the test data and 72.8% accuracy on the random subset. It performed most successfully on calls assigned to the NYPD, HBD and DPR.

Loss and Accuracy Curves

[Figure: training loss and accuracy curves for the best-performing model]

Accuracy and loss curves show that the model began to learn at a mostly steady rate after about 50 epochs.

Mapbox Density Heatmap Example

[Animation: daily 311 call density across NYC, August 2020]

This animation depicts the day-to-day changes in call volume throughout NYC in August 2020. Yellow areas on the map indicate high call volume.
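For readers who want to reproduce a visual like this, the snippet below sketches an animated density heatmap with Plotly Express, using one animation frame per day; the column names and map styling are assumptions rather than the notebook's exact code.

```python
# Sketch: animated daily density heatmap of 311 calls with Plotly Express.
# Column names and styling are assumptions; df is a DataFrame of 311 requests.
import pandas as pd
import plotly.express as px

aug = df[pd.to_datetime(df["created_date"]).dt.month == 8].copy()  # keep August requests
aug["latitude"] = aug["latitude"].astype(float)     # SODA API returns coordinates as strings
aug["longitude"] = aug["longitude"].astype(float)
aug["day"] = pd.to_datetime(aug["created_date"]).dt.date.astype(str)

fig = px.density_mapbox(
    aug,
    lat="latitude",
    lon="longitude",
    radius=3,
    animation_frame="day",                          # one frame per day
    center={"lat": 40.7128, "lon": -74.0060},       # roughly centered on NYC
    zoom=9,
    mapbox_style="open-street-map",                 # no Mapbox token required
)
fig.show()
```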

Most Frequent Words in Call Descriptions