GPT-2 pretrain loss. #313
Hello, thanks for your great work. I want to know how to calculate the loss given the raw text. For example: I have a sample in the training data: "I want to go to school". When I input this string into the GPT-2 model, the logits at every output position have a loss value. Is the total loss the sum of all these per-position losses?

No, the sum of the per-position losses is not the overall loss. GPT-2 (like other autoregressive language models) is trained with a cross-entropy loss: at each position, the model's output logits are compared against the actual next token in the text, which gives one per-token loss value. The overall training loss is then the mean of these per-token cross-entropy values over the sequence (and over the batch), not their sum. Averaging keeps the loss comparable across sequences of different lengths.
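To make the computation concrete, here is a minimal sketch in plain Python. The vocabulary size, logits, and target token ids below are made-up toy values, not real GPT-2 outputs; the point is only to show that each position contributes one cross-entropy term and the overall loss is the mean of those terms:

```python
import math

def cross_entropy(logits, target):
    """Per-token cross-entropy: -log softmax(logits)[target].

    Computed via the log-sum-exp trick for numerical stability.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Toy example: a vocabulary of 4 tokens, logits predicted at 3 positions.
logits_per_position = [
    [2.0, 0.5, -1.0, 0.1],
    [0.2, 3.0, -0.5, 1.0],
    [1.5, -0.3, 0.8, 2.2],
]
targets = [0, 1, 3]  # the actual "next token" id at each position

per_token = [cross_entropy(l, t) for l, t in zip(logits_per_position, targets)]
loss = sum(per_token) / len(per_token)  # mean over positions, not the sum
print(loss)
```

In practice (e.g. with a framework's cross-entropy function), the same mean reduction is applied for you; summing instead of averaging would make longer sequences appear to have disproportionately larger losses.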