Disaster Response Pipeline Project

Project Overview

This project applies natural language processing and machine learning to classify disaster messages written in English. It also provides an API where you can enter new messages and automatically receive a classification for them. The model distinguishes between 36 categories, e.g. “weather”, “water”, and “food”.

Installation

The code was tested with Python 3.9. Install the other required libraries from requirements.txt:

pip install -r requirements.txt

Instructions:

Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:
  python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it as a pickle file:
  1. If you wish to tune the parameters (GridSearchCV):
     python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl True
  2. Otherwise, the model is trained with the already optimized parameters:
     python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl False

Run the following command in the app's directory to start the web app:
python run.py

Then go to http://0.0.0.0:3001/

There you can find several visualizations for a better understanding of the data set, and you can enter your own message (in English) to receive its classification:

1. Visualization

2. Classification

File Descriptions

data

a) /process_data.py: ETL pipeline that cleans and preprocesses the data and stores it in an SQLite database

b) /disaster_messages.csv: real messages that were sent during disaster events

c) /disaster_categories.csv: the 36 possible categories

models

a) /train_classifier.py: loads the data from the database, creates, trains, and tunes the classifier, and finally saves the trained classifier to a pickle file

b) /best_params.pkl: contains the tuned parameters

app

a) /run.py: runs the REST API for visualizations and message classification

b) /wrangling_script/wrangle_data.py: data wrangling and data visualization

c) /templates: all files needed for the frontend
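
As a rough illustration of the ETL step described for process_data.py, the sketch below merges messages with their categories, drops duplicates, and writes the result to SQLite. It is not the project's actual code: the column names (id, message, categories), the "name-0/1" category encoding, the table name messages, and the helper names parse_categories and run_etl are all assumptions made for this example, and the tiny in-memory CSV strings stand in for the real files.

```python
import csv
import io
import sqlite3

# Hypothetical miniature inputs; the real CSV layouts may differ.
MESSAGES_CSV = "id,message\n1,We need water\n2,Storm is coming\n2,Storm is coming\n"
CATEGORIES_CSV = "id,categories\n1,water-1;weather-0\n2,water-0;weather-1\n"


def parse_categories(raw):
    """Turn "water-1;weather-0" into {"water": 1, "weather": 0}."""
    pairs = (item.rsplit("-", 1) for item in raw.split(";"))
    return {name: int(value) for name, value in pairs}


def run_etl(messages_text, categories_text, conn):
    """Merge messages with their categories, drop duplicates, store in SQLite."""
    categories = {
        row["id"]: parse_categories(row["categories"])
        for row in csv.DictReader(io.StringIO(categories_text))
    }
    rows, seen = [], set()
    for row in csv.DictReader(io.StringIO(messages_text)):
        if row["id"] in seen:  # skip duplicated messages
            continue
        seen.add(row["id"])
        flags = categories[row["id"]]
        rows.append((int(row["id"]), row["message"], *flags.values()))

    # One integer column per category, taken from the first parsed record
    # (assumes every record lists the categories in the same order).
    names = list(next(iter(categories.values())))
    columns = ", ".join(f'"{name}" INTEGER' for name in names)
    conn.execute(f"CREATE TABLE messages (id INTEGER, message TEXT, {columns})")
    placeholders = ", ".join(["?"] * (2 + len(names)))
    conn.executemany(f"INSERT INTO messages VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
print(run_etl(MESSAGES_CSV, CATEGORIES_CSV, conn))  # number of rows stored
```

An in-memory database keeps the example self-contained; pointing sqlite3.connect at a file path would persist the table instead.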

Discussion

Because the “Disaster Response Project” is a supervised learning problem with 36 predefined categories, we are dealing with a classification task. This project uses K-nearest neighbors as the classification algorithm.

K-nearest neighbors is a lazy learner: instead of learning a discriminative function from the training set, the algorithm memorizes the training data. The major advantage of this approach is that it adapts immediately to new data points. On the other hand, classifying new samples is computationally expensive (with a brute-force search, the cost grows linearly with the number of samples in the data set). In addition, K-nearest neighbors requires a lot of storage space, since the training samples cannot be discarded (there is no separate training step).

The main idea of K-nearest neighbors is:

1. Define k and a distance metric
2. Find the k nearest neighbors of the new sample
3. Classify the sample by majority vote
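
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the function name knn_predict, the 2-D points, and the labels are hypothetical, and the Euclidean distance is used as the metric.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: apply the distance metric (Euclidean here) to every stored sample.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 2: keep the k closest samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 3: majority vote over their labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three "water" samples and two "food" samples.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
labels = ["water", "water", "water", "food", "food"]
print(knn_predict(points, labels, (0.15, 0.15), k=3))  # prints "water"
```

Note that there is no training step: the function scans all stored samples for every query, which is the lazy-learner behavior described earlier.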

Two parameters are therefore crucial for balancing overfitting and underfitting: the number of neighbors k and the distance metric. These are the two parameters that are tuned.
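
Tuning these two parameters with a grid search might look like the sketch below. It assumes scikit-learn's KNeighborsClassifier and GridSearchCV; the toy feature vectors, the candidate values, and the number of folds are hypothetical, and the grid actually used in train_classifier.py may differ. The parameter p selects the Minkowski order, where 1 corresponds to the Manhattan distance and 2 to the Euclidean distance.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated clusters of 2-D points,
# standing in for numeric feature vectors of messages.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1],
     [2.0, 2.1], [2.2, 2.0], [2.1, 2.3], [2.3, 2.2], [2.2, 2.4], [2.4, 2.1]]
y = [0] * 6 + [1] * 6

param_grid = {
    "n_neighbors": [1, 3, 5],  # candidate values for k
    "p": [1, 2],               # Minkowski order: 1 = Manhattan, 2 = Euclidean
}

# 3-fold cross-validation over every combination in the grid.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)          # best combination found on this toy data
print(search.predict([[0.1, 0.1]])) # prediction with the refitted best model
```

A small k tends to overfit to individual samples, while a large k smooths the decision boundary and can underfit, which is why k is searched together with the metric.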

The most popular distance metrics are:

Euclidean Distance: the length of the straight line between two data points in Euclidean space

$\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

Manhattan Distance: the distance between two points or vectors A and B is defined as the sum of the absolute differences of their individual coordinates

$\sum_{i}|A_i-B_i|$

Minkowski Distance: a generalization of the Euclidean and Manhattan distances (p = 2 gives the Euclidean distance, p = 1 the Manhattan distance)

$\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{\frac{1}{p}}$
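
To make the relationship between the three metrics concrete, the following sketch implements them directly from the formulas and checks that the Minkowski distance reduces to the other two for p = 1 and p = 2. The function names and the sample vectors are chosen for illustration only.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


a, b = (0.0, 0.0), (3.0, 4.0)
print(manhattan(a, b), minkowski(a, b, 1))  # 7.0 7.0
print(euclidean(a, b), minkowski(a, b, 2))  # 5.0 5.0
```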

Licensing, Authors, Acknowledgements

Credit goes to Figure Eight, which provided the dataset with real messages and labels. Many thanks to Udacity for their contribution during the process.

TO THE FULLEST EXTENT PERMITTED UNDER APPLICABLE LAW, THE CODE COMPONENTS ARE PROVIDED BY THE AUTHORS, COPYRIGHT HOLDERS, CONTRIBUTORS, LICENSORS, “AS IS”.

DISCLAIMED ARE ANY REPRESENTATIONS OR WARRANTIES OF ANY KIND, WHETHER ORAL OR WRITTEN, WHETHER EXPRESS, IMPLIED, OR ARISING BY STATUTE, CUSTOM, COURSE OF DEALING, OR TRADE USAGE, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.

IN NO EVENT WILL THE COPYRIGHT OWNER, CONTRIBUTORS, LICENSORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THE CODE COMPONENTS, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Copyright (C) 2021 June
