
Training problem #15

Open
DrYangLiu opened this issue Aug 5, 2019 · 1 comment

Comments

@DrYangLiu
@ConnorJL Thanks for the great work.

Unfortunately, I found that my training on OpenWebTextCorpus is too slow, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly for the first 10k steps, but after that it stays around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss in model_fns is not shifted. Shouldn't it be `loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])`?
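
For reference, here is a minimal sketch of the shifted next-token loss I have in mind, assuming logits of shape [batch, seq_len, vocab] and integer token ids of shape [batch, seq_len]; the names are illustrative, not taken from the repo:

```python
import tensorflow as tf

def shifted_lm_loss(logits, tokens):
    # logits: [batch, seq_len, vocab_size]; tokens: [batch, seq_len] int ids.
    # Position t predicts token t + 1, so drop the last logit and the first token
    # before computing the per-token cross entropy.
    loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits[:, :-1], labels=tokens[:, 1:])
    return tf.reduce_mean(loss_batch)
```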

@ConnorJL (Owner) commented Aug 5, 2019

Unfortunately, this is a known phenomenon, and I haven't been able to fix it. I perform the shifting of the labels in the input function (it's done in an ugly way, and I'd do it differently now, but the effect should be the same). If the labels weren't shifted, the model would converge to zero loss very rapidly, since it would just be copying the input. I'm very open to any other ideas about what may be causing this problem. Maybe it is the dataset after all?
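
For context, shifting the labels inside the input pipeline might look roughly like the sketch below; this is only an illustration with hypothetical names, not the actual input function in this repo:

```python
import tensorflow as tf

def to_features_and_labels(tokens):
    # tokens: a fixed-length sequence of seq_len + 1 token ids.
    # Inputs are tokens[:-1]; the label at each position is the next token.
    return tokens[:-1], tokens[1:]

def input_fn(token_dataset, batch_size=64):
    # token_dataset yields int sequences of length seq_len + 1.
    ds = token_dataset.map(to_features_and_labels)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds.repeat()
```

With the shift done here, the model function can compute the cross entropy directly on the (features, labels) pairs it receives, without slicing the logits.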
