
[IDEA] Improve the "Stateful RNN" section in chapter 16 #431

Closed

YRFT opened this issue May 21, 2021 · 1 comment

YRFT commented May 21, 2021

In the "Stateful RNN" section in chapter 16, the code steps_per_epoch = train_size // batch_size // n_steps seems not appropriate.

When we created the datasets, we used different values for size and shift when calling window method (with drop_remainder=True). As a result, the actual value of steps per ephoch is not always equal to train_size // batch_size // n_steps. For example:

import numpy as np
import tensorflow as tf

n_steps = 100
window_length = n_steps + 1

dataset = tf.data.Dataset.from_tensor_slices(np.arange(200))
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))

print(len(list(dataset)), 200 // n_steps)

will output:

1 2

So, with repeat() and this steps_per_epoch value, an "epoch" no longer coincides with one pass over the text, and the state-resetting operations might not execute where expected.

I have created the datasets without calling the repeat() method, and commented out steps_per_epoch=steps_per_epoch in the fit() method:

batch_size = 32
encoded_parts = np.array_split(encoded[:train_size], batch_size)
datasets = []
for encoded_part in encoded_parts:
    dataset = tf.data.Dataset.from_tensor_slices(encoded_part)
    dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
    dataset = dataset.flat_map(lambda window: window.batch(window_length))
    datasets.append(dataset)
# Zip the 32 per-part datasets so each batch holds one consecutive window per part
dataset = tf.data.Dataset.zip(tuple(datasets)).map(lambda *windows: tf.stack(windows))
# dataset = dataset.repeat().map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(
    lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
# steps_per_epoch = train_size // batch_size // n_steps
# history = model.fit(dataset, steps_per_epoch=steps_per_epoch, epochs=50,
#                     callbacks=[ResetStatesCallback()])
history = model.fit(dataset, epochs=50, callbacks=[ResetStatesCallback()])
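With no repeat() call the dataset is finite, so (as far as I understand) Keras treats one full pass over it as one epoch, and the ResetStatesCallback from the notebook, which resets the model's states at the start of each epoch, then fires exactly at the beginning of the text, which is what a stateful RNN needs.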
ageron closed this as completed in 361ebf5 on May 26, 2021
ageron (Owner) commented May 26, 2021

Hi @YRFT,

Thanks for your great feedback. Indeed, I wasn't very rigorous with the steps_per_epoch calculation. In general, it should have been steps_per_epoch = math.ceil(train_size / batch_size), which can also be computed as steps_per_epoch = (train_size - 1) // batch_size + 1. But in this particular case, the correct value is steps_per_epoch = (train_size // batch_size - 1) // n_steps. However, rather than fixing the steps_per_epoch, it's simpler to just remove repeat() and stop using steps_per_epoch altogether, as you suggested.
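
For example, here is a quick sketch plugging the numbers from your toy snippet above into both formulas (treating that unbatched toy case as batch_size = 1, which is just for illustration):

# Toy numbers from the snippet above: 200 characters, n_steps = 100.
# batch_size = 1 is assumed here, since that snippet does not batch the parts.
train_size, batch_size, n_steps = 200, 1, 100

naive = train_size // batch_size // n_steps           # 2 (what the book used)
correct = (train_size // batch_size - 1) // n_steps   # 1 (matches len(list(dataset)))

print(naive, correct)  # 2 1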

I initially used repeat() + steps_per_epoch because there were bugs in TensorFlow which prevented using model.fit(dataset) without specifying steps_per_epoch. And once that was fixed, there was a bug in the shuffling algorithm when using model.fit(dataset, epochs=n_epochs) without steps_per_epoch: the dataset would be shuffled the same way at each epoch. So it was necessary to handle the shuffling directly in the dataset, making repeat() compulsory. But the good news is that these bugs are now fixed, so we can enjoy nice and simple code! :)

Thanks again!
