⛩️ Detecting Fraudulent Financial Transactions with ZenML

This repository contains Two's Solution to the ZenML Month of MLOps Competition.

The aim of this project is to develop a production-ready ML application for fraud detection using the ZenML MLOps framework. To train our fraud detection model, we use the "Synthetic data from a financial payment system" dataset available on Kaggle.

📝 Solution Overview

This repository contains an end-to-end ML solution built with ZenML that covers the following responsibilities:

  • Importing the Dataset
  • Cleaning the data & engineering informative features
  • Detecting data drift in new data
  • Training a model to detect fraud on a transactional level
  • Evaluating the performance of the model
  • Deploying the model to a REST API endpoint
  • Providing an interface for users to interact with the model

To address these requirements, we built a Training Pipeline, which we used for experimentation, and a Continuous Deployment Pipeline, which extends the Training Pipeline to detect data drift in new data, train a model on all available data, and evaluate its performance before deploying it to an API endpoint.
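
The pipeline and step implementations live under src/pipelines and src/steps. As a rough illustration of how such a training pipeline is wired together with ZenML's decorator-based API, consider the sketch below; the step names, model choice, and column names are illustrative rather than the ones used in this repository, and the exact imports and invocation depend on the ZenML version you have installed.

    # Minimal illustrative ZenML training pipeline -- not the repository's actual code.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from zenml import pipeline, step


    @step
    def load_data() -> pd.DataFrame:
        # Placeholder: the real step imports the Kaggle payments dataset.
        return pd.read_csv("data/transactions.csv")


    @step
    def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
        # Cleaning and feature engineering happen here in the real pipeline.
        return df.dropna()


    @step
    def train_model(df: pd.DataFrame) -> RandomForestClassifier:
        X, y = df.drop(columns=["fraud"]), df["fraud"]
        return RandomForestClassifier().fit(X, y)


    @step
    def evaluate_model(model: RandomForestClassifier, df: pd.DataFrame) -> float:
        # The real pipeline evaluates on a held-out split, not the training data.
        X, y = df.drop(columns=["fraud"]), df["fraud"]
        return f1_score(y, model.predict(X))


    @pipeline
    def train_pipeline():
        df = load_data()
        features = engineer_features(df)
        model = train_model(features)
        evaluate_model(model, features)


    if __name__ == "__main__":
        train_pipeline()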

To enable the aforementioned pipelines, we made use of the following ZenML Stack:

  • Artifact Store: Google Cloud Storage
  • Container Registry: Google Cloud Container Registry
  • Data Validator: Evidently AI
  • Experiment Tracker: MLflow
  • Orchestrator: Google Kubernetes Engine
  • Model Deployer: Seldon
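
Registering an equivalent stack is done with the ZenML CLI. The commands below are only an indicative outline: the component names are made up, the flavors assume the ZenML gcp, mlflow, evidently, and seldon integrations are installed, and several components (notably the Kubernetes orchestrator and the Seldon model deployer) require additional configuration flags for your GCP project, cluster context, and Seldon installation.

~ $ zenml integration install gcp mlflow evidently seldon -y
~ $ zenml artifact-store register gcs_store --flavor=gcp --path=gs://<your-bucket>
~ $ zenml container-registry register gcp_registry --flavor=gcp --uri=gcr.io/<your-project>
~ $ zenml experiment-tracker register mlflow_tracker --flavor=mlflow
~ $ zenml data-validator register evidently_validator --flavor=evidently
~ $ zenml orchestrator register gke_orchestrator --flavor=kubernetes
~ $ zenml model-deployer register seldon_deployer --flavor=seldon
~ $ zenml stack register fraud_detection_stack -a gcs_store -c gcp_registry -e mlflow_tracker -dv evidently_validator -o gke_orchestrator -d seldon_deployer --set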

🔧 Usage

There are several ways to interact with the code in this repository:

  1. Executing the Training & Continuous Deployment Pipelines
  2. Running the Streamlit App
  3. Running the Tests

Executing the Training & Continuous Deployment Pipelines

  1. Ensure you have Python 3.9 installed on your machine

  2. Install the development requirements:

~ $ pip install -r test-requirements.txt

  3. Deploy and register the ZenML stack described in the Solution Overview

  4. Create an .env file from the .env.example template

  5. Execute the training pipeline (a minimal sketch of such an entry script is shown after this list):

~ $ python src/run_train_pipeline.py

  6. Execute the deployment pipeline:

~ $ python src/run_deployment_pipeline.py
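
For reference, a pipeline entry script can be as small as the sketch below. This is not the repository's actual code: the import path, the use of python-dotenv, and the exact way the pipeline run is triggered depend on the project layout and the pinned ZenML version.

    # Illustrative sketch of a pipeline entry script -- not the actual src/run_train_pipeline.py.
    from dotenv import load_dotenv  # assumption: python-dotenv loads the .env file

    from pipelines.training_pipeline import training_pipeline  # hypothetical import path


    def main() -> None:
        load_dotenv()          # expose the .env settings as environment variables
        training_pipeline()    # submit the run to the active ZenML stack


    if __name__ == "__main__":
        main()
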
Running the Streamlit App

The Streamlit application entrypoint is the app.py file at the root of the repository. We have deployed this app to Streamlit Cloud.

To recreate the app on your local machine, you must:

  1. Ensure you have Python 3.9 installed on your machine

  2. Install the Streamlit requirements:

~ $ pip install -r requirements.txt

  3. Create an .env file according to the .env.example template

  4. Launch the Streamlit application:

~ $ streamlit run app.py
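
The app collects transaction details from the user and sends them to the model endpoint for scoring. The snippet below is a heavily simplified sketch of that flow: the input fields, the PREDICTION_ENDPOINT environment variable, and the payload shape are placeholders rather than the repository's actual values (the payload shown follows the Seldon Core v1 prediction protocol).

    # Simplified sketch of a Streamlit front end -- not the repository's actual app.py.
    import os

    import requests
    import streamlit as st

    st.title("Fraud Detection")

    amount = st.number_input("Transaction amount", min_value=0.0)
    age = st.number_input("Customer age", min_value=0, step=1)

    if st.button("Predict"):
        # Hypothetical endpoint taken from the .env file; the real app builds the
        # full feature vector expected by the deployed model.
        endpoint = os.environ["PREDICTION_ENDPOINT"]
        payload = {"data": {"ndarray": [[amount, age]]}}
        response = requests.post(endpoint, json=payload, timeout=10)
        st.write(response.json())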

Running the Tests

  1. Ensure you have Python 3.9 installed on your machine

  2. Install the test requirements:

~ $ pip install -r test-requirements.txt

  3. Execute the tests with pytest:

~ $ pytest
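
Pytest's standard selection flags apply if you only want to run part of the suite or want more verbose output, for example:

~ $ pytest tests/util -v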

📁 Repository Structure

├── .github				<- CI Pipeline Definition
├── src
│   ├── pipelines			<- Pipeline Definition
│   │   ├── ...
│   ├── steps		  		<- Step Definitions
│   │   ├── ...
│   ├── util		 		<- Utility Definitions
│   │   ├── ...
│   ├── data_exploration.ipynb		<- Data Exploration Notebook
│   ├── feature_engineering.ipynb	<- Feature Engineering Experimentation Notebook
│   ├── run_deployment_pipeline.py	<- Deployment Pipeline Execution script
│   ├── run_train_pipeline.py		<- Training Pipeline Execution Script
├── tests
│   ├── util				<- Utility Function Tests
│   │   ├── ...
├── app.py 	   			<- Streamlit App
├── docker-requirements.txt 		<- Step Container Dependencies
├── notebook-requirements.txt 		<- Notebook Dependencies
├── requirements.txt   			<- Streamlit App Dependencies
├── test-requirements.txt 		<- Development Dependencies

🧑‍💻 Competition Participants
