Better RNN Components

TensorFlow RNN components implemented so that the mathematical logic is immediately apparent.

Contents

- RNN Cells
- RNN Presets
- Attention Mechanisms
- Self Critical Loss Function
- Papers

RNN Cells

A collection of different variants of RNN, LSTM, and GRU cells.

RNN Cell

The simplest of the RNN cells; it features a single sigmoid layer at each time step.
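
As a rough sketch of the idea (not the repo's exact code), such a cell can be written in the same TF 1.x style as the LSTM cell below; the function and argument names here are illustrative:

import tensorflow as tf

def rnn_cell_sketch(x, h, units, scope='rnn_cell',
    w_init=tf.random_normal_initializer(stddev=0.02),
    b_init=tf.constant_initializer(0)):
    # x: [batch, in_dim] input at the current step, h: [batch, units] previous state
    with tf.variable_scope(scope):
        w_dim = x.shape[-1].value + h.shape[-1].value
        w = tf.get_variable("w", [w_dim, units], initializer=w_init)
        b = tf.get_variable("b", [units], initializer=b_init)
        # single sigmoid layer over the concatenated input and previous state
        h = tf.nn.sigmoid(tf.matmul(tf.concat([x, h], 1), w) + b)
        return h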

LSTM Cell

Basic LSTM cell as described in Long Short-Term Memory, with optional dropout. For simplicity, all of the mathematical logic is kept clearly visible and implemented in a minimalistic fashion:

def lstm_cell(x, c, h, units, scope='lstm_cell', 
    w_init=tf.random_normal_initializer(stddev=0.02), 
    b_init=tf.constant_initializer(0),
    f_b=1.0, i_kp=1.0, o_kp=1.0):
    # x: input at the current time step, c: previous cell state, h: previous hidden state
    # f_b: forget-gate bias, i_kp / o_kp: input / output dropout keep probabilities
    # shape_list and _rnn_dropout are small helpers defined elsewhere in the repo
    with tf.variable_scope(scope):
        w_dim = shape_list(x)[1] + shape_list(h)[1]
        w = tf.get_variable("w", [w_dim, units * 4], initializer=w_init)
        b = tf.get_variable("b", [units * 4], initializer=b_init)
        x = _rnn_dropout(x, i_kp)
        # one matmul computes all four gate pre-activations, split into
        # input (i), candidate (j), forget (f), and output (o)
        z = tf.matmul(tf.concat([x, h], 1), w) + b
        i, j, f, o = tf.split(z, 4, 1)
        c = tf.nn.sigmoid(f + f_b) * c + tf.nn.sigmoid(i) * tf.tanh(j)
        h = tf.nn.sigmoid(o) * tf.tanh(c)
        h = _rnn_dropout(h, o_kp)
        return h, c
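
The cell handles a single time step. A hypothetical unrolling over a [batch, time, features] input (not one of the repo's presets; tf.AUTO_REUSE is used so the cell's weights are shared across steps) might look like:

def lstm_encode(inputs, units):
    # inputs: [batch, time, features]; returns per-step outputs and the final (h, c) state
    batch = tf.shape(inputs)[0]
    h = tf.zeros([batch, units])
    c = tf.zeros([batch, units])
    outputs = []
    with tf.variable_scope('lstm_encode', reuse=tf.AUTO_REUSE):
        # static unroll: the same lstm_cell variables are reused at every step
        for t in range(inputs.shape[1].value):
            h, c = lstm_cell(inputs[:, t, :], c, h, units)
            outputs.append(h)
    return tf.stack(outputs, axis=1), (h, c)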

LSTM Cell with Peepholes

LSTM cell similar to the one above but with added peepholes as described in Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
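
A hedged sketch of how peepholes change the gate computation relative to lstm_cell above (p_i, p_f, p_o are my names for the diagonal peephole weights, not necessarily the repo's; shape_list is the repo helper used above):

def peephole_lstm_cell_sketch(x, c, h, units, scope='peephole_lstm_cell',
    w_init=tf.random_normal_initializer(stddev=0.02),
    b_init=tf.constant_initializer(0), f_b=1.0):
    with tf.variable_scope(scope):
        w_dim = shape_list(x)[1] + shape_list(h)[1]
        w = tf.get_variable("w", [w_dim, units * 4], initializer=w_init)
        b = tf.get_variable("b", [units * 4], initializer=b_init)
        # diagonal peephole weights let each gate "peek" at the cell state
        p_i = tf.get_variable("p_i", [units], initializer=b_init)
        p_f = tf.get_variable("p_f", [units], initializer=b_init)
        p_o = tf.get_variable("p_o", [units], initializer=b_init)
        z = tf.matmul(tf.concat([x, h], 1), w) + b
        i, j, f, o = tf.split(z, 4, 1)
        # input and forget gates see the previous cell state
        c = tf.nn.sigmoid(f + f_b + p_f * c) * c + tf.nn.sigmoid(i + p_i * c) * tf.tanh(j)
        # the output gate sees the updated cell state
        h = tf.nn.sigmoid(o + p_o * c) * tf.tanh(c)
        return h, c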

mLSTM Cell

Multiplicative variant of LSTM as described in Multiplicative LSTM For Sequence Modeling. mLSTM is able to use different recurrent transition functions for every possible input, allowing it to be more expressive for autoregressive sequence modeling. mLSTM outperforms standard LSTM and its deeper variants in many sequence modeling tasks.
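
The key idea, sketched in the same style (only a sketch, with my own variable names): an intermediate state m, the elementwise product of projections of the input and of the previous hidden state, replaces h in the gate computations:

def mlstm_cell_sketch(x, c, h, units, scope='mlstm_cell',
    w_init=tf.random_normal_initializer(stddev=0.02),
    b_init=tf.constant_initializer(0), f_b=1.0):
    with tf.variable_scope(scope):
        x_dim, h_dim = shape_list(x)[1], shape_list(h)[1]
        w_mx = tf.get_variable("w_mx", [x_dim, h_dim], initializer=w_init)
        w_mh = tf.get_variable("w_mh", [h_dim, h_dim], initializer=w_init)
        # multiplicative intermediate state: a different effective transition for each input
        m = tf.matmul(x, w_mx) * tf.matmul(h, w_mh)
        w = tf.get_variable("w", [x_dim + h_dim, units * 4], initializer=w_init)
        b = tf.get_variable("b", [units * 4], initializer=b_init)
        # m takes the place of h in the gate computations
        z = tf.matmul(tf.concat([x, m], 1), w) + b
        i, j, f, o = tf.split(z, 4, 1)
        c = tf.nn.sigmoid(f + f_b) * c + tf.nn.sigmoid(i) * tf.tanh(j)
        h = tf.nn.sigmoid(o) * tf.tanh(c)
        return h, c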

mLSTM Cell with Peepholes

Multiplicative LSTM with peepholes; it combines the concepts of the two cells above.

mLSTM Cell with L2 regularization

Multiplicative LSTM cell with L2 regularization as described in L2 Regularization for Learning Kernels.
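
One common TF 1.x way to wire this up (a sketch of the general idea, not necessarily the repo's exact mechanism): register an L2 penalty on the cell's weight matrices and add the collected penalties to the training loss:

def add_l2_penalty(w, scale=0.001):
    # register an L2 penalty on a weight tensor; the scale value is illustrative
    tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, scale * tf.nn.l2_loss(w))

def loss_with_l2(task_loss):
    # add all registered penalties to the task loss
    return task_loss + tf.add_n(tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))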

GRU Cell

GRU cell as described in Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. GRU cells have shown similar performance to the more popular LSTM cells despite having fewer trainable parameters.
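
A sketch of the cell in the same style as lstm_cell above (a sketch only, reusing the repo's shape_list helper; variable names are mine):

def gru_cell_sketch(x, h, units, scope='gru_cell',
    w_init=tf.random_normal_initializer(stddev=0.02),
    b_init=tf.constant_initializer(0)):
    with tf.variable_scope(scope):
        w_dim = shape_list(x)[1] + shape_list(h)[1]
        # update (z) and reset (r) gates share one matmul
        w_g = tf.get_variable("w_g", [w_dim, units * 2], initializer=w_init)
        b_g = tf.get_variable("b_g", [units * 2], initializer=b_init)
        z, r = tf.split(tf.matmul(tf.concat([x, h], 1), w_g) + b_g, 2, 1)
        z, r = tf.nn.sigmoid(z), tf.nn.sigmoid(r)
        # candidate state uses the reset-gated previous state
        w_c = tf.get_variable("w_c", [w_dim, units], initializer=w_init)
        b_c = tf.get_variable("b_c", [units], initializer=b_init)
        h_tilde = tf.tanh(tf.matmul(tf.concat([x, r * h], 1), w_c) + b_c)
        # interpolate between the previous state and the candidate
        h = z * h + (1 - z) * h_tilde
        return h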

RNN Presets

Several preset RNNs to use in your code or as a reference.

Bidirectional LSTM

Encodes inputs in forward and reverse time order using an LSTM cell and then concatenates the resulting outputs and states. The bidirectional LSTM has shown impressive results when used as the first layer in Google's GNMT translation model described in Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.
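
Conceptually (a sketch building on the hypothetical lstm_encode unroll above, so the same caveats apply; padded batches would normally use tf.reverse_sequence with explicit lengths):

def bidirectional_lstm_sketch(inputs, units):
    with tf.variable_scope('fw'):
        fw_out, fw_state = lstm_encode(inputs, units)
    with tf.variable_scope('bw'):
        # encode the time-reversed inputs, then flip the outputs back to forward order
        bw_out, bw_state = lstm_encode(tf.reverse(inputs, axis=[1]), units)
        bw_out = tf.reverse(bw_out, axis=[1])
    # concatenate per-step outputs and final (h, c) states along the feature axis
    outputs = tf.concat([fw_out, bw_out], axis=-1)
    state = (tf.concat([fw_state[0], bw_state[0]], -1),
             tf.concat([fw_state[1], bw_state[1]], -1))
    return outputs, state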

Bidirectional GRU

Encodes inputs in forward and reverse time order using a GRU cell and then concatenates the resulting outputs and states.

Stacked LSTM

An RNN that encodes its input using a stack of LSTM cells, with optional residual connections after a specified depth. Google's GNMT translation model features two stacked LSTMs: one further encodes the output of the bidirectional LSTM, and a second decodes the output of the stacked encoder.
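
A sketch of the stacking logic (again built on the hypothetical lstm_encode above; the residual_start parameter name is illustrative):

def stacked_lstm_sketch(inputs, units, num_layers, residual_start=2):
    outputs, state = inputs, None
    for layer in range(num_layers):
        with tf.variable_scope('layer_%d' % layer):
            layer_out, state = lstm_encode(outputs, units)
            if layer >= residual_start:
                # residual connection: add the layer's input to its output
                # (requires the input and output widths to match)
                layer_out += outputs
            outputs = layer_out
    return outputs, state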

Stacked GRU

An RNN that encodes its input using a stack of GRU cells, with optional residual connections after a specified depth.

Attention Mechanisms

Typical attention scores, where h[t] is the decoder state at time t and h[s] is the s'th encoder output:

dot:     score(h[t], h[s]) = h[t] . h[s]
general: score(h[t], h[s]) = h[t] . W . h[s]
concat:  score(h[t], h[s]) = v . tanh(W . concat(h[t], h[s]))

Luong Attention Mechanism

Luong attention function as described in Effective Approaches to Attention-based Neural Machine Translation. At every decoding step, the attention mechanism produces a probability distribution over the encoder outputs, allowing the decoder to focus on specific parts of them with varying levels of "attention" or emphasis. Given the "query" h[t] (the decoder cell output at time t) and h[s] (the s'th encoder output), the Luong score for h[s] is computed with the equation below (the "general" score above), after which all of the scores are normalized with a softmax.

score(h[t], h[s]) = h[t] . W . h[s]
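
A sketch of the full mechanism with the general score (names and shapes are my assumptions: query is h[t] with shape [batch, units], enc_outputs is the stack of h[s] with shape [batch, src_len, units]):

def luong_attention_sketch(query, enc_outputs, scope='luong_attention',
    w_init=tf.random_normal_initializer(stddev=0.02)):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        units = enc_outputs.shape[-1].value
        w = tf.get_variable("w", [query.shape[-1].value, units], initializer=w_init)
        # general score h[t] . W . h[s], computed for every source position at once
        scores = tf.einsum('bsu,bu->bs', enc_outputs, tf.matmul(query, w))
        # normalize the scores into an attention distribution over source positions
        alpha = tf.nn.softmax(scores)
        # context vector: attention-weighted sum of the encoder outputs
        context = tf.reduce_sum(tf.expand_dims(alpha, -1) * enc_outputs, axis=1)
        return context, alpha

In the Luong paper, the context vector is then combined with h[t] (concatenated and passed through a tanh layer) to form the attentional hidden state.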

Bahdanau Attention Mechanism

Bahdanau attention function as described in Neural Machine Translation by Jointly Learning to Align and Translate; it uses the "concat" score above:

score(h[t], h[s]) = v . tanh(W . concat(h[t], h[s]))
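
A sketch of the concat score (splitting W into separate projections of h[t] and h[s] is just the usual way to compute W . concat(h[t], h[s]) for all source positions at once; names are illustrative):

def bahdanau_score_sketch(query, enc_outputs, units, scope='bahdanau_score',
    w_init=tf.random_normal_initializer(stddev=0.02)):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        w_q = tf.get_variable("w_q", [query.shape[-1].value, units], initializer=w_init)
        w_e = tf.get_variable("w_e", [enc_outputs.shape[-1].value, units], initializer=w_init)
        v = tf.get_variable("v", [units], initializer=w_init)
        # W . concat(h[t], h[s]) == W_q . h[t] + W_e . h[s]
        hidden = tf.tanh(tf.expand_dims(tf.matmul(query, w_q), 1)
                         + tf.einsum('bsu,uk->bsk', enc_outputs, w_e))
        # dot each position with v to get one score per source position
        return tf.reduce_sum(v * hidden, axis=-1)  # [batch, src_len]

The resulting scores are normalized with a softmax and used to weight the encoder outputs exactly as in the Luong sketch above.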

Temporal Attention Mechanism

Temporal attention mechanism as described in A Deep Reinforced Model For Abstractive Summarization. This form of attention has shown impressive results at the task of abstractive summarization, as it decreases the probabilities over portions of the encoder output that received high probabilities in previous decoding steps, thereby reducing excessive repetition in generated sequences.
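
A rough sketch of the temporal normalization step (numerically naive: a real implementation would subtract the maximum score before exponentiating; names are illustrative):

def temporal_attention_step_sketch(scores, past_exp_sum):
    # scores: [batch, src_len] raw attention scores at the current decoding step
    # past_exp_sum: running sum of exp(scores) over previous steps, or None at the first step
    exp_scores = tf.exp(scores)
    # penalize source positions that already received large scores at earlier steps
    penalized = exp_scores if past_exp_sum is None else exp_scores / past_exp_sum
    alpha = penalized / tf.reduce_sum(penalized, axis=-1, keepdims=True)
    new_exp_sum = exp_scores if past_exp_sum is None else past_exp_sum + exp_scores
    return alpha, new_exp_sum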

Decoder Attention Mechanism

Intra-decoder attention mechanism as described in A Deep Reinforced Model For Abstractive Summarization. Reduces repetition in machine generated output sequences.

Self Critical Loss Function

Self-critical loss function as described in A Deep Reinforced Model for Abstractive Summarization. It adds a reinforcement-learning reward objective to the typical cross-entropy loss used for seq2seq machine learning tasks. Pseudocode for the loss below:

ml_losses = cross_entropy(logits, targets)
rl_losses = (metric(sampled_outputs, targets) - metric(greedy_outputs, targets)) * cross_entropy(logits, sampled_outputs)
losses = gamma * rl_losses + (1 - gamma) * ml_losses
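
A hedged TF sketch of the same computation (the rewards, e.g. ROUGE scores, are assumed to be computed outside the graph and fed in per example; all names are illustrative):

def self_critical_loss_sketch(ml_losses, sampled_log_probs, sampled_reward, greedy_reward, gamma):
    # ml_losses: per-example cross-entropy against the ground-truth targets
    # sampled_log_probs: per-example sum of log-probabilities of the sampled sequence
    # sampled_reward / greedy_reward: per-example metric scores for the two decodes
    advantage = tf.stop_gradient(sampled_reward - greedy_reward)
    # -advantage * log p(sampled) == (metric(sampled) - metric(greedy)) * cross_entropy
    rl_losses = -advantage * sampled_log_probs
    return gamma * rl_losses + (1 - gamma) * ml_losses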

Papers

Main papers referenced:

- Long Short-Term Memory
- Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling
- Multiplicative LSTM for Sequence Modeling
- L2 Regularization for Learning Kernels
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate
- A Deep Reinforced Model for Abstractive Summarization
