MAHA

MAHA is an in-progress ETL package that uses machine learning to clean your dataset with a single command. Its features include:

Drop all index columns
Drop columns with too many missing values
Use regression to predict the missing values in the data and replace them
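The three cleaning steps above can be sketched with pandas and scikit-learn as follows. This is only an illustration of the idea, not MAHA's actual API: the function names, the index-column heuristic (a unique run of consecutive integers), the 50% missing-value threshold and the choice of linear regression are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def drop_index_columns(df):
    """Drop integer columns that look like a row index (hypothetical heuristic)."""
    to_drop = [
        col for col in df.columns
        if pd.api.types.is_integer_dtype(df[col])
        and df[col].is_unique
        and df[col].max() - df[col].min() == len(df) - 1
    ]
    return df.drop(columns=to_drop)


def drop_sparse_columns(df, max_missing=0.5):
    """Drop columns whose fraction of missing values exceeds max_missing (assumed threshold)."""
    keep = df.columns[df.isna().mean() <= max_missing]
    return df[keep]


def impute_with_regression(df, target):
    """Fill the gaps in `target` with predictions from a linear model on the other columns."""
    out = df.copy()
    missing = out[target].isna()
    if not missing.any() or missing.all():
        return out  # nothing to fill, or nothing to learn from
    # Predictors are all other columns; their own gaps are filled with column means
    features = out.drop(columns=[target])
    features = features.fillna(features.mean())
    model = LinearRegression().fit(features[~missing], out.loc[~missing, target])
    out.loc[missing, target] = model.predict(features[missing])
    return out


# Small made-up dataset: y = 2 * x + 1 with one gap
df = pd.DataFrame({
    "id": range(1, 7),
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [3.0, 5.0, np.nan, 9.0, 11.0, 13.0],
    "mostly_empty": [np.nan, np.nan, np.nan, np.nan, 1.0, np.nan],
})

cleaned = drop_index_columns(df)
cleaned = drop_sparse_columns(cleaned, max_missing=0.5)
cleaned = impute_with_regression(cleaned, "y")
print(cleaned)
```

In this toy dataset the "id" and "mostly_empty" columns are dropped, and the gap in "y" is filled from its linear relationship with "x".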

Prerequisites

Data is in pandas DataFrame format
All the categorical variables are label encoded
All the columns already have the data type desired in the output
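As a rough sketch of how a raw table could be brought into this shape, the example below label-encodes a text column with pandas and converts a numeric column stored as text. The column names and values are made up for illustration, and keeping missing categories as NaN (rather than the -1 code pandas assigns) is an assumption about what is most useful for later imputation.

```python
import pandas as pd

# Made-up raw data: a text category with a gap and numbers stored as strings
raw = pd.DataFrame({
    "city": ["Pune", "Mumbai", None, "Pune"],
    "age": ["31", "45", "28", "39"],
})

prepared = raw.copy()

# Label-encode the categorical column; pandas assigns codes in sorted category order
codes = prepared["city"].astype("category").cat.codes
# cat.codes marks missing values as -1; turn them back into NaN so they stay "missing"
prepared["city"] = codes.astype("float64").where(codes >= 0)

# Convert numeric text to a numeric dtype
prepared["age"] = pd.to_numeric(prepared["age"])

print(prepared)
print(prepared.dtypes)
```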

You can also:

Find the mean and mode of every column
Fill the NA values with the mean or mode of each column, depending on its data type
Find a model for every column with all other columns being the independent variables
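The optional steps above might look like the following in plain pandas and scikit-learn. This is a sketch under assumptions, not MAHA's own interface: the function names are hypothetical, the caller states which columns are categorical (mode) while all others are treated as numeric (mean), and linear regression stands in for whatever model the package actually fits.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def fill_with_mean_or_mode(df, categorical=()):
    """Fill gaps with the mode for categorical columns and the mean for the rest."""
    out = df.copy()
    for col in out.columns:
        if col in categorical:
            fill = out[col].mode().iloc[0]  # most frequent value, NaN ignored
        else:
            fill = out[col].mean()  # NaN is skipped by default
        out[col] = out[col].fillna(fill)
    return out


def fit_model_per_column(df):
    """Fit one linear model per column, with all other columns as predictors.

    Expects a DataFrame without missing values.
    """
    return {
        col: LinearRegression().fit(df.drop(columns=[col]), df[col])
        for col in df.columns
    }


# Made-up data: one numeric column and one label-encoded column, each with a gap
df = pd.DataFrame({
    "height": [150.0, 160.0, np.nan, 170.0],
    "group": [0.0, 1.0, 1.0, np.nan],
})

means = df.mean()           # mean of every column
modes = df.mode().iloc[0]   # first mode of every column

filled = fill_with_mean_or_mode(df, categorical=["group"])
models = fit_model_per_column(filled)
print(filled)
```

Passing the categorical columns explicitly avoids guessing from the dtype, since a label-encoded column that contains NaN is stored as float in pandas.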

Dependencies

MAHA uses a number of open source projects to work properly:

NumPy - NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
Pandas - Pandas is a software library written for the Python programming language for data manipulation and analysis.
scikit-learn (sklearn) - A machine learning library that includes various classification, regression and clustering algorithms.

Installation

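MAHA requires pandas, NumPy and scikit-learn.

Use pip to install the packages. Note that scikit-learn is imported as sklearn but is published on PyPI under the name scikit-learn.

$ pip3 install pandas

$ pip3 install numpy

$ pip3 install scikit-learn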

If you have not installed pip, first download the installer script:

$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Then run the following command from the directory where you downloaded get-pip.py:

$ python get-pip.py

Development

Developed by: Mithesh R, Arth Akhouri, Heetansh Jhaveri, Ayaan Khan

Want to contribute? See the MAHA GitHub repository (FlintyTub49/MAHA) for more information.

License

MIT
