
Handling missing numerical data in weather data using interpolation, spline interpolation, a simple imputer, etc.


RJ-NPN/Data-Engineering


  1. Principal component analysis (PCA) for dimensionality reduction.
  2. MissingData

PCA:

Principal Component Analysis (PCA) is a technique for reducing the dimensionality of data while preserving as much of its variance as possible. It does this by transforming the original variables into a new set of uncorrelated variables called principal components. Working with fewer dimensions reduces execution time and resource usage, although it may cost some accuracy (a trade-off). The results are as follows:

Without PCA reduction:

CPU times: user 48.1 s, sys: 9.73 s, total: 57.8 s
Wall time: 17 s
KNN score: 0.9705

With PCA reduction:

CPU times: user 1.1 s, sys: 1.47 s, total: 2.56 s
Wall time: 664 ms
KNN score: 0.9246
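The comparison above can be reproduced in outline with scikit-learn. The following is a minimal sketch, not the repository's actual notebook: the dataset (`load_digits`), the 95% variance threshold, `n_neighbors=5`, and the helper `fit_and_score` are all assumptions chosen for illustration, so the timings and scores will differ from the figures reported here.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed stand-in dataset: 8x8 digit images flattened to 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def fit_and_score(model):
    """Hypothetical helper: fit, score on the test split, and time both steps."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    score = model.score(X_test, y_test)
    return score, time.perf_counter() - start


# Baseline: KNN on all (scaled) features.
baseline = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Reduced: keep just enough principal components to explain 95% of the variance.
reduced = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    KNeighborsClassifier(n_neighbors=5),
)

score_full, secs_full = fit_and_score(baseline)
score_pca, secs_pca = fit_and_score(reduced)
n_kept = reduced.named_steps["pca"].n_components_

print(f"Without PCA: {X.shape[1]} features, score={score_full:.4f}, {secs_full:.3f} s")
print(f"With PCA:    {n_kept} components, score={score_pca:.4f}, {secs_pca:.3f} s")
```

Fitting PCA inside the pipeline means the components are learned from the training split only, which avoids leaking test data into the projection. On a dataset this small the time saving is likely to be negligible; the speed-up becomes visible mainly with many features or samples.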

MissingData:

Handling missing numerical data using interpolation, spline interpolation, a simple imputer, etc.

Below is a comparison of the filled NaN values using interpolation and spline interpolation (kind: curve and linear, respectively):

[Figure: comparison of filled NaN values using interpolation and spline interpolation]
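The three approaches named in this section can be sketched side by side on a small series. This is an illustrative example under stated assumptions: the temperature values are made up, and the repository's actual calls are not shown here, so pandas `Series.interpolate` and scikit-learn's `SimpleImputer` are used as plausible stand-ins. The spline method requires SciPy to be installed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical temperature readings with gaps (NaN marks a missing value).
temps = pd.Series(
    [21.0, np.nan, 24.5, np.nan, np.nan, 19.0, 18.2, np.nan, 22.1, 23.0],
    name="temperature",
)

# Linear interpolation: fill each gap along the straight line between neighbours.
linear = temps.interpolate(method="linear")

# Spline interpolation: fill gaps from a fitted quadratic spline (needs SciPy).
spline = temps.interpolate(method="spline", order=2)

# Simple imputer: replace every gap with the mean of the observed values.
imputer = SimpleImputer(strategy="mean")
mean_filled = imputer.fit_transform(temps.to_frame()).ravel()

comparison = pd.DataFrame(
    {
        "original": temps,
        "linear": linear,
        "spline": spline,
        "mean": mean_filled,
    }
)
print(comparison.round(2))
```

Interpolation uses the neighbouring readings, so it tends to suit ordered data such as weather time series, while mean imputation ignores ordering and fills every gap with the same value.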
