This repository contains my work for the ESDA NILM (Non-Intrusive Load Monitoring) 2022 Kaggle competition, where the objective was to classify appliances based on energy consumption data. The project combines data cleaning, exploratory analysis, feature engineering, and machine learning with XGBoost. My final submission achieved a Kaggle score of 0.96194, which is above the official first-place score of 0.91 and would potentially have placed first in the competition.
Link to the Kaggle Competition: https://www.kaggle.com/competitions/esda-nilm-2022
- environment.yml: Defines the project's Python environment and dependencies.
- eda.py: Script for exploratory data analysis and initial data cleaning.
- build_features.py & build_features_v2.py: Scripts for feature engineering, including handling multicollinearity, outlier removal, and temporal data utilization.
- train_predict_model.py & train_predict_model_v2.py: Model training and prediction scripts using XGBoost, with different iterations and parameter tuning.
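As a rough illustration of the kind of feature engineering the build_features scripts describe, here is a minimal sketch that derives calendar features from a timestamp column and drops one column from each highly correlated pair. It is not the code from those scripts: the column names (timestamp, power, power_kw, voltage), the 0.95 correlation threshold, and the helper names add_time_features and drop_correlated are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def add_time_features(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Derive simple calendar features from a datetime column (hypothetical helper)."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col])
    out["hour"] = ts.dt.hour
    out["dayofweek"] = ts.dt.dayofweek  # Monday is 0, Sunday is 6
    out["is_weekend"] = (ts.dt.dayofweek >= 5).astype(int)
    # The raw timestamp is dropped once its information is encoded.
    return out.drop(columns=[ts_col])


def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop the later column of each pair whose absolute Pearson correlation
    exceeds `threshold` (the threshold value is an assumption)."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


# Synthetic stand-in data; the real competition columns may differ.
rng = np.random.default_rng(0)
n = 200
power = rng.normal(100.0, 20.0, n)
demo = pd.DataFrame({
    "timestamp": pd.date_range("2022-01-01", periods=n, freq=pd.Timedelta(hours=1)),
    "power": power,
    "power_kw": power / 1000.0,  # rescaled copy, perfectly correlated with power
    "voltage": rng.normal(230.0, 2.0, n),
})

features = drop_correlated(add_time_features(demo))
print(features.columns.tolist())
```

In this toy data, power_kw is a rescaled copy of power, so it is the column that gets removed. A correlation filter like this is only one way to reduce multicollinearity; variance inflation factors or domain knowledge are common alternatives.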
- Rigorous exploratory analysis and cleaning were pivotal in understanding the dataset.
- Feature engineering focused on reducing multicollinearity and leveraging timestamp data.
- Tuning the XGBoost model played a crucial role in raising the Kaggle score from 0.82 to 0.96.
To replicate the analysis or use the scripts, clone the repository and set up the environment using the provided environment.yml file. Each Python script is documented for ease of understanding and modification.
For detailed analysis and results, please refer to the individual scripts and the comments within them.