
dhawat/assess_data_quality


Assess data quality

A plug-and-play algorithm that needs as little human intervention as possible to assess data quality.

Introduction

Data guides today's decision-making and sits at the center of modern institutions. But, as the saying GIGO (Garbage In, Garbage Out) goes, bad data can have detrimental consequences for the company that uses it. It is therefore crucial that data be of the best possible quality. However, data cleaning usually relies on deterministic rules, which makes it hard, tedious, and time-consuming. AMIES, together with the company Foyer, therefore proposed a challenge on automating this process. These plug-and-play algorithms are the result of our work during the challenge. Since we were among the winners, we decided to publish the code and to develop it further in future work.

Assess data quality is an open-source Python project that currently includes:

  • A plug-and-play algorithm with several strategies to detect "bad data" in a given data set.

Installation

Two different paths are suggested:

  • Fork the repository:

    This lets you interact with the original repository (raise issues, get updates, propose pull requests, etc.), since your fork shares a common history with it.

  • Clone the repository:

    Make sure Git is installed on your computer, then run:

    cd <directory-of-your-choice>
    git clone https://github.com/<github-user-name>/assess_data_quality.git

Getting started

The ./test folder contains test.py, a script that showcases the results of the algorithm applied to the test dataset data.csv. The dataset data.csv, also located in ./test, is a subset of a dataset of agricultural machinery sales.
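Before running test.py, it can help to get a rough feel for what "bad data" means in a CSV file such as data.csv. The sketch below is only an illustration under stated assumptions: it assumes a comma-separated file whose first row is a header, and the function name basic_quality_report is hypothetical. Its two deterministic checks (rows with empty cells and exact duplicate rows) are the kind of hand-written rules mentioned in the introduction, not the strategies implemented in this repository.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def basic_quality_report(lines: Iterable[str]) -> dict:
    """Count rows with empty cells and exact duplicate rows in CSV text.

    Hypothetical helper for illustration only: these two deterministic
    checks are NOT the detection strategies of assess_data_quality.
    """
    reader = csv.reader(lines)
    header = next(reader)  # assumes the first row is a header
    rows = [tuple(r) for r in reader if r]  # skip blank lines
    # Rows where at least one cell is empty or whitespace-only
    rows_with_missing = sum(1 for r in rows if any(not c.strip() for c in r))
    # Exact duplicates: count every extra copy beyond the first occurrence
    counts = Counter(rows)
    duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)
    return {
        "columns": len(header),
        "rows": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
    }


if __name__ == "__main__":
    # Hypothetical usage on the bundled dataset, run from the repository root:
    # with open("test/data.csv", newline="") as f:
    #     print(basic_quality_report(f))
    sample = io.StringIO("id,brand,price\n1,A,100\n2,,250\n1,A,100\n")
    print(basic_quality_report(sample))
```

On the in-memory sample, the second row is flagged for its empty cell and the third row counts as a duplicate of the first. Rules like these must be written and maintained by hand for each dataset, which is the limitation the project aims to reduce.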
