Contains versions of my ongoing experimental work with RNNs on the M5 competition.
- 2-layer encoder-decoder model with GRU units.
- Uses embedding layers for categorical features.
- Pipeline to incorporate local and global categorical conditioning (see the architecture sketch after this list).
- Incorporates teacher forcing decay during training to help convergence and test performance (see the decay-schedule sketch below).
- Incorporating all categorical features as encoder inputs leads to overfitting, compared with using some amount of global conditioning.
- Modeling seems challenging for the RNN because sporadic count series are mixed in with continuous count series.
- Training batch data is skewed towards low-velocity items (sporadic, low-magnitude sales).
- Model struggles to adjust to the different output scales of different items (see the scaling sketch below).
- Batch training of NNs helps tackle the internal-memory and feature-count constraints seen with GBMs.
- Making series stationary by subtracting a fitted trendline doesn't work well because of the intermittent magnitude of most items.
- Context vector (final hidden state of the encoder) doesn't capture week-over-week seasonality effects.
- Differencing the time series helped prevent overfitting and gave better validation performance (see the differencing sketch below).
- Attention predictions are more stable during training but converge to the same point.
- Attention weights with MSE loss are reasonable but zero out with Poisson loss (see the loss comparison below).
- Add lag features (see the lag-feature sketch below).
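
Sketches for the notes above follow. First, a minimal version of the 2-layer GRU encoder-decoder with categorical conditioning. PyTorch, the layer sizes, and the split of embeddings into local (concatenated per encoder step) versus global (initializing the hidden state) conditioning are assumptions for illustration, not the repo's exact implementation.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, n_items, n_stores, emb_dim=8, hidden=64, horizon=28):
        super().__init__()
        self.horizon = horizon
        # Local conditioning: item embedding concatenated to every encoder step.
        self.item_emb = nn.Embedding(n_items, emb_dim)
        # Global conditioning: store embedding projected into the initial hidden state.
        self.store_emb = nn.Embedding(n_stores, emb_dim)
        self.init_proj = nn.Linear(emb_dim, 2 * hidden)  # one vector per GRU layer
        self.encoder = nn.GRU(1 + emb_dim, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.GRU(1, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, sales, item_id, store_id, targets=None, tf_ratio=0.0):
        # sales: (B, T, 1) past unit sales; targets: (B, horizon, 1) or None
        B, T, _ = sales.shape
        local = self.item_emb(item_id).unsqueeze(1).expand(B, T, -1)
        h0 = self.init_proj(self.store_emb(store_id))        # (B, 2*hidden)
        h0 = h0.view(B, 2, -1).transpose(0, 1).contiguous()  # (layers, B, hidden)
        _, h = self.encoder(torch.cat([sales, local], dim=-1), h0)
        y, preds = sales[:, -1:, :], []                      # seed with last observation
        for t in range(self.horizon):
            dec_out, h = self.decoder(y, h)
            y_hat = self.head(dec_out)
            preds.append(y_hat)
            # Teacher forcing: with probability tf_ratio feed the ground truth next.
            forced = targets is not None and torch.rand(()).item() < tf_ratio
            y = targets[:, t:t + 1, :] if forced else y_hat.detach()
        return torch.cat(preds, dim=1)                       # (B, horizon, 1)

model = Seq2Seq(n_items=3049, n_stores=10)  # M5 has 3049 items across 10 stores
out = model(torch.zeros(4, 56, 1), torch.randint(0, 3049, (4,)), torch.randint(0, 10, (4,)))
print(out.shape)  # torch.Size([4, 28, 1])
```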
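The teacher-forcing decay could follow any of the usual schedules; the notes don't say which, so both an exponential and an inverse-sigmoid decay (as in Bengio et al.'s scheduled sampling) are sketched here with made-up constants.

```python
import math

def exponential_tf(epoch: int, k: float = 0.95) -> float:
    """P(feed ground truth to the decoder) decays geometrically per epoch."""
    return k ** epoch

def inverse_sigmoid_tf(epoch: int, k: float = 10.0) -> float:
    """Stays near 1 early, then drops off sharply (scheduled sampling)."""
    return k / (k + math.exp(epoch / k))

# The ratio is passed to the model each epoch,
# e.g. model(..., tf_ratio=exponential_tf(epoch)).
for epoch in (0, 10, 20, 40):
    print(epoch, round(exponential_tf(epoch), 3), round(inverse_sigmoid_tf(epoch), 3))
```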
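One common response to the per-item scale problem is mean scaling each series before training and rescaling the forecasts afterwards; this is a generic sketch, not necessarily what this repo does.

```python
import numpy as np

def mean_scale(series: np.ndarray, eps: float = 1.0):
    """Divide a series by its mean absolute level; eps guards all-zero items."""
    scale = np.abs(series).mean() + eps
    return series / scale, scale

fast = np.array([40.0, 55.0, 60.0, 48.0])  # high-velocity item
slow = np.array([0.0, 1.0, 0.0, 2.0])      # intermittent item
fast_scaled, fast_s = mean_scale(fast)
slow_scaled, slow_s = mean_scale(slow)
# Both items now feed the network at comparable magnitudes;
# predictions are multiplied back by the stored scale.
print(fast_scaled.round(2), slow_scaled.round(2))
```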
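The differencing transform and its inverse, so forecasts made in difference space can be mapped back to sales levels; a generic numpy sketch.

```python
import numpy as np

def difference(series: np.ndarray) -> np.ndarray:
    """First differences; the model trains on these instead of raw levels."""
    return np.diff(series)

def undifference(pred_diffs: np.ndarray, last_level: float) -> np.ndarray:
    """Cumulatively sum predicted differences onto the last observed level."""
    return last_level + np.cumsum(pred_diffs)

sales = np.array([0.0, 2.0, 1.0, 1.0, 4.0, 3.0])
diffs = difference(sales)
# Round trip: rebuilding from the first observation recovers the series.
assert np.allclose(undifference(diffs, sales[0]), sales[1:])
```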
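The two losses compared in the attention note, using PyTorch's built-in criteria; with `log_input=False` the network's output is treated directly as the (positive) Poisson rate. Values are toy numbers.

```python
import torch
import torch.nn as nn

rates = torch.tensor([0.5, 2.0, 0.1])    # predicted rates; must be positive
targets = torch.tensor([0.0, 3.0, 0.0])  # observed unit sales

mse = nn.MSELoss()(rates, targets)
poisson = nn.PoissonNLLLoss(log_input=False)(rates, targets)
print(f"MSE: {mse.item():.3f}  Poisson NLL: {poisson.item():.3f}")
```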
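A pandas sketch of the lag features on the TODO list; the column names mirror the M5 layout but are assumptions about this repo's dataframes.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    ["A"] * 9 + ["B"] * 9,
    "sales": [0, 2, 1, 4, 0, 0, 3, 1, 2, 5, 5, 6, 7, 4, 6, 5, 8, 7],
})
# Shift within each item so one item's history never leaks into another's.
for lag in (1, 7):  # previous day and same weekday last week; 28 would follow
    df[f"lag_{lag}"] = df.groupby("id")["sales"].shift(lag)
print(df.head(10))
```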