A plug-and-play algorithm that needs as little human intervention as possible to assess data quality.
Data guides today's decision-making process and sits at the center of modern institutions. But, as the saying GIGO (Garbage In, Garbage Out) warns, bad data may have detrimental consequences for the company that uses it. It is therefore crucial that data be of the best possible quality. However, the process of cleaning data usually relies on deterministic rules, which makes it hard, tedious, and time-consuming. Thus AMIES, along with the company Foyer, proposed a challenge about automating this process. These plug-and-play algorithms are the result of our work during the challenge. As we are among the winners of the challenge, we decided to publish the code and develop it in future work.
Assess data quality is an open-source Python project which currently includes:

- A plug-and-play algorithm with several strategies to detect "bad data" in a given data set.
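The detection strategies themselves are not detailed here; as an illustration only, a minimal sketch of what one such check might look like (the function name, thresholds, and IQR-based strategy below are hypothetical, not the project's actual API):

```python
import pandas as pd

def flag_bad_rows(df: pd.DataFrame) -> pd.Series:
    """Flag rows containing a missing value or an IQR outlier in any
    numeric column. Hypothetical strategy, not the project's API."""
    bad = df.isna().any(axis=1)          # rows with at least one missing value
    numeric = df.select_dtypes("number")
    if not numeric.empty:
        q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
        iqr = q3 - q1
        # Tukey's rule: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
        outlier = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
        bad |= outlier.any(axis=1)
    return bad

df = pd.DataFrame({"price": [100.0, 102.0, 98.0, 101.0, 10_000.0],
                   "units": [1, 2, None, 1, 2]})
print(flag_bad_rows(df).tolist())   # rows 2 (missing) and 4 (outlier) flagged
```

A real plug-and-play implementation would combine several such strategies and expose them behind a single entry point, so the user only supplies the data set.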
Two different paths are suggested:

- Fork the repository: this will allow you to interact with the original repository (raise issues, get updates, propose pull requests, etc.), since you will share a common history.

- Clone the repository: make sure you have git installed on your computer, then run

      cd <directory-of-your-choice>
      git clone https://github.com/<github-user-name>/assess_data_quality.git
A script `test.py` that showcases the result of the algorithm applied to the test dataset `data.csv` is available in the `./test` folder.
The dataset `data.csv` (available in the `./test` folder) is a subset of a dataset of agricultural machinery sales.
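Since the exact columns of `data.csv` are not described here, the snippet below uses an invented stand-in frame (the column names are hypothetical) just to show the kind of cheap, first-pass quality signal such a dataset invites, namely the fraction of missing values per column:

```python
import pandas as pd

# Toy stand-in for data.csv: column names are invented for illustration.
sales = pd.DataFrame({
    "machine_type": ["tractor", "harvester", None, "tractor"],
    "price": [25_000, None, 18_500, 22_000],
})

# Fraction of missing values per column: a first, cheap quality signal.
missing_ratio = sales.isna().mean()
print(missing_ratio.to_dict())   # {'machine_type': 0.25, 'price': 0.25}
```

On the real file one would replace the toy frame with `pd.read_csv("./test/data.csv")` and read the same summary.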