This repository contains the code and methodology used for forecasting sales of product families at Favorita stores located in Ecuador. Our team participated in a Kaggle competition and achieved 9th place using two ensembles of LightGBM (LGBM) models and the darts library for time series forecasting.
The dataset used in this project is provided by Corporación Favorita through Kaggle. It includes daily sales data for a variety of product families across multiple stores in Ecuador.
We employed the darts library and LightGBM (LGBM) models for time series forecasting. The key steps in our methodology include:
- Data Preprocessing: Handling missing values, outlier detection, and feature engineering.
- Model Selection: Using two ensembles of four LGBMs each:
  - One ensemble trained on the full dataset.
  - Another ensemble trained on data from 2015 onwards.
- Training and Validation: Splitting the data into training and validation sets, and tuning the models for optimal performance.
- Evaluation: Using root mean squared logarithmic error (RMSLE) to evaluate model performance.
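
The evaluation step can be illustrated with a small NumPy sketch of RMSLE. This is not the repository's code: the function names `rmsle` and `average_forecasts` are hypothetical, the numbers are toy values, and the plain mean used to blend the two forecasts is an assumption, since the steps above do not say how the ensembles are combined.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Sales cannot be negative; clip predictions so log1p stays defined.
    y_pred = np.clip(y_pred, 0.0, None)
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    return float(np.sqrt(np.mean(log_diff ** 2)))


def average_forecasts(forecasts):
    """Element-wise mean of several forecasts (assumed blending rule)."""
    stacked = np.stack([np.asarray(f, dtype=float) for f in forecasts])
    return stacked.mean(axis=0)


# Toy numbers for illustration only, not competition data.
actual = np.array([0.0, 3.0, 10.0])
full_history_pred = np.array([0.0, 2.0, 12.0])    # stands in for the full-dataset ensemble
recent_history_pred = np.array([0.0, 4.0, 8.0])   # stands in for the 2015-onwards ensemble

blended = average_forecasts([full_history_pred, recent_history_pred])
score = rmsle(actual, blended)
```

Because RMSLE compares `log(1 + y)` values, it penalizes relative rather than absolute errors, which suits product families whose daily sales differ by orders of magnitude.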