
[IDEA] Improve the "Stateful RNN" section in chapter 16 #431

Closed

YRFT opened this issue May 21, 2021 · 1 comment

YRFT commented May 21, 2021

In the "Stateful RNN" section in chapter 16, the code steps_per_epoch = train_size // batch_size // n_steps seems not appropriate.

When we created the datasets, we used different values for size and shift when calling window method (with drop_remainder=True). As a result, the actual value of steps per ephoch is not always equal to train_size // batch_size // n_steps. For example:

import numpy as np
import tensorflow as tf

n_steps = 100
window_length = n_steps + 1

dataset = tf.data.Dataset.from_tensor_slices(np.arange(200))
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))

print(len(list(dataset)), 200 // n_steps)

will output:

1 2

So, with repeat() and this steps_per_epoch value, an "epoch" no longer coincides with one pass over the text, and the state-resetting operations might not execute where expected.

I have created the datasets without calling the repeat() method, and commented out steps_per_epoch=steps_per_epoch in the fit() method:

batch_size = 32
encoded_parts = np.array_split(encoded[:train_size], batch_size)
datasets = []
for encoded_part in encoded_parts:
    dataset = tf.data.Dataset.from_tensor_slices(encoded_part)
    dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_length))
    datasets.append(dataset)
# Zip the 32 per-part datasets so each batch holds one consecutive window per part
dataset = tf.data.Dataset.zip(tuple(datasets)).map(lambda *windows: tf.stack(windows))
# dataset = dataset.repeat().map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# steps_per_epoch = train_size // batch_size // n_steps
# history = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=50,
#                     callbacks=[ResetStatesCallback()])
history = model.fit(dataset, epochs=50, callbacks=[ResetStatesCallback()])
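With no repeat() call the dataset is finite, so (as far as I understand) Keras treats one full pass over it as one epoch, and the ResetStatesCallback from the notebook, which resets the model's states at the start of each epoch, then fires exactly at the beginning of the text, which is what a stateful RNN needs.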
ageron closed this as completed in 361ebf5 on May 26, 2021
ageron (Owner) commented May 26, 2021

Hi @YRFT,

Thanks for your great feedback. Indeed, I wasn't very rigorous with the steps_per_epoch calculation. In general, it should have been steps_per_epoch = math.ceil(train_size / batch_size), which can also be computed as steps_per_epoch = (train_size - 1) // batch_size + 1. But in this particular case, the correct value is steps_per_epoch = (train_size // batch_size - 1) // n_steps. However, rather than fixing the steps_per_epoch, it's simpler to just remove repeat() and stop using steps_per_epoch altogether, as you suggested.
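
For example, here is a quick sketch plugging the numbers from your toy snippet above into both formulas (treating that unbatched toy case as batch_size = 1, which is just for illustration):

# Toy numbers from the snippet above: 200 characters, n_steps = 100.
# batch_size = 1 is assumed here, since that snippet does not batch the parts.
train_size, batch_size, n_steps = 200, 1, 100

naive = train_size // batch_size // n_steps           # 2 (what the book used)
correct = (train_size // batch_size - 1) // n_steps   # 1 (matches len(list(dataset)))

print(naive, correct)  # 2 1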

I initially used repeat() + steps_per_epoch because there were bugs in TensorFlow which prevented using model.fit(dataset) without specifying steps_per_epoch. And once that was fixed, there was a bug in the shuffling algorithm when using model.fit(dataset, epochs=n_epochs) without steps_per_epoch: the dataset would be shuffled the same way at each epoch. So it was necessary to handle the shuffling directly in the dataset, making repeat() compulsory. But the good news is that these bugs are now fixed, so we can enjoy nice and simple code! :)

Thanks again!
