
Assignment for Data & information quality course attended @ Polimi. Analyzed performances of different imputation techniques.


VladMarianCimpeanu/DIQ-assignment


DIQ-assignment

This repository contains our assignment for the Data & information quality course held at Politecnico di Milano (academic year 2022/2023).

In this assignment we try different imputation approaches for dealing with missing values in a dataset. We are provided with two complete datasets, and we inject missing values through this script, generating five new versions of each original dataset (at 50%, 60%, 70%, 80% and 90% completeness).
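The injection step can be sketched as follows. This is only an illustration, not the repository's script: the function name `inject_missing`, the toy `original` table, and the choice of removing cells uniformly at random over the whole table are all assumptions.

```python
import random

def inject_missing(rows, completeness, seed=0):
    """Return a copy of `rows` in which a fraction (1 - completeness) of cells is None.

    Assumption: cells are removed uniformly at random over the whole table.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(rows)) for c in range(len(rows[0]))]
    n_missing = round(len(cells) * (1 - completeness))
    dirty = [list(row) for row in rows]  # copy, so the original stays complete
    for r, c in rng.sample(cells, n_missing):
        dirty[r][c] = None
    return dirty

# Hypothetical toy dataset: 50 rows, 2 columns.
original = [[i, f"cat{i % 3}"] for i in range(50)]

# One dirty version per completeness level used in the assignment.
versions = {level: inject_missing(original, level / 100) for level in (50, 60, 70, 80, 90)}
```

Fixing the seed keeps the dirty versions reproducible, so every imputation method is compared on the same missing cells.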

For the evaluation, we consider how accurately each imputation method reconstructs the original values. First we compute the accuracy with an exact-matching approach; then, since a similarity measure exists for some features, we also compute the reconstruction accuracy with a similarity-based approach.
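A minimal sketch of the two accuracy computations is given here. The similarity measures are assumptions chosen for illustration (a `difflib` string ratio and a relative numeric distance); the measures actually used in the assignment may differ, and all function names are hypothetical.

```python
from difflib import SequenceMatcher

def exact_match(original_value, imputed_value):
    """1.0 if the imputed value equals the original, else 0.0."""
    return float(original_value == imputed_value)

def similarity(original_value, imputed_value):
    """Assumed similarity in [0, 1]: string ratio or relative numeric closeness."""
    a, b = original_value, imputed_value
    if isinstance(a, str) and isinstance(b, str):
        return SequenceMatcher(None, a, b).ratio()
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        scale = max(abs(a), abs(b))
        return 1.0 if scale == 0 else max(0.0, 1 - abs(a - b) / scale)
    return exact_match(a, b)  # no similarity measure: fall back to exact matching

def reconstruction_accuracy(original, dirty, imputed, score):
    """Average `score` over the cells that were missing (None) in `dirty`."""
    scores = [
        score(o, i)
        for o_row, d_row, i_row in zip(original, dirty, imputed)
        for o, d, i in zip(o_row, d_row, i_row)
        if d is None
    ]
    if not scores:
        raise ValueError("no missing cells to evaluate")
    return sum(scores) / len(scores)

# Hypothetical example: two cells were missing and both were imputed imperfectly.
original = [["Milano", 10.0], ["Torino", 20.0]]
dirty = [[None, 10.0], ["Torino", None]]
imputed = [["Milan", 10.0], ["Torino", 18.0]]

exact = reconstruction_accuracy(original, dirty, imputed, exact_match)
similar = reconstruction_accuracy(original, dirty, imputed, similarity)
```

In this example exact matching gives no credit to either imputed cell, while the similarity-based score rewards values that are close to the originals.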

As a final evaluation, we select two machine learning algorithms and perform a classification task on the different datasets. This lets us see how the imputation method and the completeness level affect machine learning performance.
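The comparison can be sketched as below. Both the imputer (most frequent value per column) and the classifier (1-nearest-neighbour with Hamming distance) are stand-ins chosen to keep the example dependency-free; the text here does not name the algorithms used in the assignment, and the toy data is hypothetical.

```python
from collections import Counter

def mode_impute(rows):
    """Fill each None with the most frequent observed value of its column.

    Assumption: every column has at least one observed value.
    """
    modes = [
        Counter(v for v in col if v is not None).most_common(1)[0][0]
        for col in zip(*rows)
    ]
    return [[m if v is None else v for v, m in zip(row, modes)] for row in rows]

def knn1_predict(train_X, train_y, x):
    """Label of the training row closest to `x` in Hamming distance."""
    dists = [sum(a != b for a, b in zip(t, x)) for t in train_X]
    return train_y[dists.index(min(dists))]

def holdout_accuracy(X, y, n_test):
    """Train on all but the last `n_test` rows, test on the last `n_test` rows."""
    train_X, train_y = X[:-n_test], y[:-n_test]
    test_X, test_y = X[-n_test:], y[-n_test:]
    hits = sum(knn1_predict(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
    return hits / n_test

# Hypothetical toy data: the label depends only on the first feature.
X = [[("a", "b")[i % 2], ("x", "y", "z")[i % 3]] for i in range(30)]
y = [row[0] == "a" for row in X]

# Remove the first feature in every fourth row, then impute it.
dirty = [[None if i % 4 == 0 else row[0], row[1]] for i, row in enumerate(X)]

acc_complete = holdout_accuracy(X, y, n_test=6)
acc_imputed = holdout_accuracy(mode_impute(dirty), y, n_test=6)
```

Repeating the same comparison for each imputation method and each completeness level gives one accuracy value per combination, which is what makes their impact comparable.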
