
Adversarial Training Attentive LSTM

This model can be used for Binary Classification tasks and will be used below as a stock movement classifier (UP/DOWN).

Table of contents

  • Description
  • Dependencies
  • How to use the Adv-ALSTM model?
  • Documentation
  • Model description

Description

This project reproduces a working version of the Adversarial Attention-Based LSTM for TensorFlow 2. The new implementation is available in AdvALSTM.py as the AdvLSTM class; more details on this version are given below.

I also updated the original author's code so that it runs with TensorFlow 2.x, in order to compare my results with theirs. The updated code is available in the folder original_code_updated.

│   AdvALSTM.py
│   preprocessing.py
│   replicate_result.py
├───data
│   └───stocknet-dataset
│       └───price
│           │   trading_dates.csv
│           ├───ourpped/
│           └───raw/
└───original_code_updated
        evaluator.py
        load.py
        pred_lstm.py
        __init__.py


Dependencies

  • TensorFlow : 2.9.2

How to use the Adv-ALSTM model?

Installation

Download the AdvALSTM.py file and place it in your project folder.

from AdvALSTM import AdvLSTM

Use

To create an AdvLSTM model, use :

model = AdvLSTM(
  units, 
  epsilon, 
  beta, 
  learning_rate = 1E-2, 
  dropout = None, 
  l2 = None, 
  attention = True, 
  hinge = True, 
  adversarial_training = True, 
  random_perturbations = False)

The AdvLSTM object is a subclass of tf.keras.Model, so you can train it as you would any TensorFlow 2 model :

model.fit(
  X_train, y_train, 
  validation_data = (X_validation, y_validation),
  epochs = 200, 
  batch_size = 1024
  )

The model only accepts:

  • y : binary class labels (0 or 1), even when using the Hinge loss! Shape : (nb_sequences, )

  • x : sequences of length T, with n features. Shape : (nb_sequences, T, n)
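
As a quick smoke test, here is a minimal sketch with synthetic data. The dimensions (500 sequences, T = 10, n = 11) and hyperparameters are arbitrary placeholders; only the shapes and the 0/1 label encoding matter:

import numpy as np
from AdvALSTM import AdvLSTM

# Arbitrary example dimensions: 500 sequences, T = 10 time steps, n = 11 features.
X = np.random.rand(500, 10, 11).astype("float32")  # shape (nb_sequences, T, n)
y = np.random.randint(0, 2, size=(500,))           # shape (nb_sequences,), binary 0/1 labels

model = AdvLSTM(units=32, epsilon=1e-3, beta=5e-2)
model.fit(X, y, epochs=5, batch_size=64)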

Documentation

class AdvALSTM.AdvLSTM(**params):
__init__(self, units, epsilon = 1E-3, beta = 5E-2, learning_rate = 1E-2, dropout = None, l2 = None, attention = True, hinge = True, adversarial_training = True, random_perturbations = False)
  • units : int (required)

Specifies the number of units in each layer (Dense, LSTM and Temporal Attention) of the model.

  • epsilon : float (optional, default : 1E-3). Used when adversarial_training = True : epsilon defines the L2 norm of the perturbation used to generate the adversarial examples :

    $e_{adv}^{s} = e^{s} + r_{adv}^{s}, \qquad r_{adv}^{s} = \epsilon \, \frac{g^{s}}{\lVert g^{s} \rVert_{2}}$

  • beta : float (optional, default : 5E-2). Used when adversarial_training = True : beta weights the adversarial loss generated from the adversarial examples in the overall loss :

    $\mathcal{L} = \sum_{s} \ell\left(y^{s}, \hat{y}^{s}\right) + \beta \sum_{s} \ell\left(y^{s}, \hat{y}_{adv}^{s}\right)$
  • learning_rate : float (optional, default : 1E-2). Defines the learning rate of the Adam optimizer used for training.

  • dropout : float (optional, default : None). Defines the dropout rate used in the model.

  • l2 : float (optional, default : None). Defines the L2 regularization parameter.

  • attention : boolean (optional, default : True). If True, the model uses the TemporalAttention layer after the LSTM layer to generate the latent space representation. If False, the model takes the last hidden state of the LSTM (with return_sequences = False).

  • hinge : boolean (optional, default : True). If True, the model uses the Hinge loss for training. If False, it uses the Binary Cross-Entropy loss.

  • adversarial_training : boolean (optional, default : True). If True, the model generates an adversarial loss from the adversarial examples and adds it to the overall loss. If False, the model is trained without adversarial examples or adversarial loss.

  • random_perturbations : boolean (optional, default : False). Defines how the perturbations are created. If False (default), the perturbations are generated following the paper's guidelines, with $g^{s} = \frac{\partial \ell\left(y^{s}, \hat{y}^{s}\right)}{\partial e^{s}}$ computed as tape.gradient(loss(y, y_pred), e). If True, the perturbations are generated randomly instead of being gradient-oriented, with g computed as tf.random.normal(...). A sketch of both modes is given after this list.
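
For illustration, here is a minimal sketch of the two perturbation modes in plain TensorFlow. The names e, classifier and loss_fn are hypothetical placeholders, not the repository's internal names:

import tensorflow as tf

def make_perturbation(e, y, classifier, loss_fn, epsilon, random_perturbations=False):
    """Return r_adv, an L2-normalized perturbation of the latent vector e, scaled by epsilon."""
    if random_perturbations:
        # Random mode: the direction g is drawn from a normal distribution.
        g = tf.random.normal(tf.shape(e))
    else:
        # Paper mode: the direction g is the gradient of the loss w.r.t. e.
        with tf.GradientTape() as tape:
            tape.watch(e)
            loss = loss_fn(y, classifier(e))
        g = tape.gradient(loss, e)
    return epsilon * tf.math.l2_normalize(g, axis=-1)  # r_adv = epsilon * g / ||g||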

Model description

Attention Model schema

The Adversarial Attentive LSTM is based on an Attentive LSTM, which generates a latent space vector used as a 1D representation of a 2D input sequence (here, the last T technical indicators of a given stock).

This Attentive LSTM uses a Temporal Attention layer that "summarizes" the hidden states of the LSTM according to the temporal importance detected by the neural network. The layer also keeps the last hidden state and appends it to the attentive output, as sketched below.
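
The following is one plausible reading of that description (score each hidden state, sum them with softmax weights, then concatenate the last hidden state); it is a hedged sketch, not necessarily the exact implementation in AdvALSTM.py:

import tensorflow as tf

class TemporalAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.proj = tf.keras.layers.Dense(units, activation="tanh")
        self.score = tf.keras.layers.Dense(1, use_bias=False)

    def call(self, hidden_states):                     # (batch, T, units)
        scores = self.score(self.proj(hidden_states))  # (batch, T, 1)
        weights = tf.nn.softmax(scores, axis=1)        # temporal importance per step
        context = tf.reduce_sum(weights * hidden_states, axis=1)  # weighted "summary"
        last = hidden_states[:, -1, :]                 # last hidden state
        return tf.concat([context, last], axis=-1)     # appended to the attentive output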

Adv-ALSTM schema

Following the Attentive LSTM, we get $e^{s}$, the latent space representation of the input sequence.

We pass it through the classifier to get $\hat{y}^{s}$, which is then used to calculate the first loss.

This first loss is differentiated with respect to $e^{s}$. The gradient gives the "direction" to follow to maximize the loss by adding perturbations. We use it to compute $e_{adv}^{s}$, the adversarial example :

$e_{adv}^{s} = e^{s} + r_{adv}^{s}$

$r_{adv}^{s} = \epsilon \, \frac{g^{s}}{\lVert g^{s} \rVert_{2}}, \qquad g^{s} = \frac{\partial \ell\left(y^{s}, \hat{y}^{s}\right)}{\partial e^{s}}$

This adversarial example is then passed to the classifier to obtain a second loss (the adversarial loss), combined as below :

$\mathcal{L} = \sum_{s} \ell\left(y^{s}, \hat{y}^{s}\right) + \beta \sum_{s} \ell\left(y^{s}, \hat{y}_{adv}^{s}\right)$

With $\beta$ used to weight the adversarial loss.
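
Putting the pieces together, a schematic training step under these formulas might look as follows. Here latent, classifier and loss_fn are hypothetical stand-ins for the model's sub-networks and loss; the real AdvLSTM class wires all of this up internally:

import tensorflow as tf

def train_step(x, y, latent, classifier, loss_fn, optimizer, epsilon, beta):
    with tf.GradientTape() as model_tape:
        # First pass: latent representation e^s and the clean loss.
        with tf.GradientTape() as adv_tape:
            e = latent(x)
            adv_tape.watch(e)
            clean_loss = loss_fn(y, classifier(e))
        # e_adv = e + epsilon * g / ||g||; the perturbation itself is not differentiated.
        g = adv_tape.gradient(clean_loss, e)
        r_adv = tf.stop_gradient(epsilon * tf.math.l2_normalize(g, axis=-1))
        adv_loss = loss_fn(y, classifier(e + r_adv))
        # Overall loss: clean loss + beta * adversarial loss.
        total_loss = clean_loss + beta * adv_loss
    variables = latent.trainable_variables + classifier.trainable_variables
    grads = model_tape.gradient(total_loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return total_loss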