ICMS Prediction for Rio de Janeiro

This repository contains the code and data used in the article "Estudo de Modelos para a Previsão de Arrecadação do ICMS do Rio de Janeiro" by João Pedro Verçosa. The study explores various machine learning models to forecast the ICMS revenue in Rio de Janeiro, using data from 2004 to 2022.

Introduction

The purpose of this study is to analyze and predict the ICMS revenue in Rio de Janeiro using advanced machine learning techniques. The models used include Random Forest, XGBoost, and Long Short-Term Memory (LSTM) neural networks. The study aims to provide more accurate forecasting to support government planning and decision-making.

Data

The dataset consists of time series of various economic and social indicators, which were compared with the ICMS time series using Dynamic Time Warping (DTW; see the Dynamic Time Warping notebook). The series were collected from open data sources such as the Portal de Dados Abertos¹, SGS-Sistema Gerenciador de Séries Temporais², and the Empresa de Pesquisa Energética³. The data used in this study are available in the data directory of this repository.
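As a rough illustration of how DTW can compare a candidate indicator with the ICMS series, here is a minimal pure-Python sketch. It is not the code from the repository's notebook: the function names (`zscore`, `dtw_distance`, `rank_by_dtw`), the z-score normalization, and the toy series are all assumptions made for this example.

```python
import math


def zscore(series):
    """Standardize a series so that differences in scale do not dominate DTW."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1] for another point of a
                cost[i][j - 1],      # repeat a[i-1] for another point of b
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


def rank_by_dtw(target, candidates):
    """Return (name, distance) pairs sorted from most to least similar."""
    target_z = zscore(target)
    scored = [(name, dtw_distance(target_z, zscore(values)))
              for name, values in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy data (hypothetical, not taken from the repository).
icms = [1, 2, 3, 4, 3, 2]
indicators = {
    "shifted": [1, 1, 2, 3, 4, 3, 2],  # same shape, delayed by one step
    "flat": [2, 2, 2, 2, 2, 2],        # carries no shape information
}
ranking = rank_by_dtw(icms, indicators)
```

A smaller distance means the candidate series follows a shape closer to the target, even when the two are shifted in time.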

Models

The models explored in this study are:

Random Forest
XGBoost
LSTM Neural Networks

Each model is trained and evaluated using a set of parameters optimized through Grid Search. Details on the parameter values and optimization process are provided in the article and the accompanying Jupyter notebooks in this repository.
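The idea behind Grid Search can be sketched in a few lines of plain Python: enumerate every combination of candidate values and keep the one with the lowest validation error. The helper `grid_search`, the parameter names, and the stand-in scoring function below are hypothetical; the actual grids and models are the ones described in the article and notebooks.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination; return (best_params, best_score).

    Lower scores are treated as better (for example a validation error).
    """
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; these values are placeholders, not the study's settings.
param_grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7],
}


def toy_validation_error(params):
    # Stand-in for "train the model with params and measure validation error".
    # In a real run this would fit a model on the training period and score
    # it on a later, held-out period.
    return (params["n_estimators"] - 200) ** 2 + (params["max_depth"] - 5) ** 2


best_params, best_score = grid_search(param_grid, toy_validation_error)
```

For time series, the validation period should come after the training period so that the search does not reward models that peek at future values.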

Usage

To run the code, you need to have Python installed with the required libraries. You can install the dependencies using the provided requirements.txt file.

pip install -r requirements.txt

Running the Models

Each model has its own Jupyter notebook in the notebooks directory.

You can open and run these notebooks to reproduce the results of the study. The notebooks cover every step, from data preprocessing through model training to evaluation.

Data Preprocessing

The data preprocessing steps are included in the preprocessing.ipynb notebook, which normalizes the data and prepares it for model training. Run preprocessing.ipynb before running any of the model notebooks.
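The following sketch shows one common way to normalize a series and turn it into supervised training examples. It assumes min-max scaling and a sliding window of lagged values; the notebook may use a different scaler or framing, and the helper names `min_max_scale`, `inverse_scale`, and `make_windows` are invented for this example.

```python
def min_max_scale(series):
    """Scale values to [0, 1]; also return the bounds needed to undo it."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    return [(x - lo) / span for x in series], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    span = (hi - lo) or 1.0
    return [x * span + lo for x in scaled]


def make_windows(series, n_lags):
    """Build (X, y) pairs where each target is predicted from n_lags past values."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


# Toy monthly values (hypothetical).
raw = [10.0, 20.0, 30.0, 20.0, 40.0]
scaled, lo, hi = min_max_scale(raw)
X, y = make_windows(scaled, n_lags=2)
```

When evaluating forecasts, the scaling bounds are normally computed on the training period only and then reused for the test period, so that no information from the future leaks into training.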

Results

The study found that the Random Forest and XGBoost models performed better than the LSTM model in terms of predictive accuracy. The best performance was achieved using a multivariate approach with ICMS and total oil production series, yielding a Mean Absolute Percentage Error (MAPE) of 10.01% over a 12-month forecast horizon.
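For reference, MAPE is the average of the absolute errors expressed as a percentage of the actual values. A minimal sketch of the calculation follows; the function name `mape` and the numbers are illustrative only and are not results from the study.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Illustrative numbers only: both forecasts are off by 10% of the actual value.
example_error = mape([100.0, 200.0], [110.0, 180.0])
```

Over a 12-month horizon, the two lists would each hold 12 monthly values: the observed revenue and the corresponding forecast.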

For more detailed information, please refer to the full article (in Portuguese): Estudo de Modelos para a Previsão de Arrecadação do ICMS do Rio de Janeiro.
