Wrong rotary embedding result between transformers structure and Megatron structure #873
Comments
@GGGGGGXY Why do you think that their way of doing this is more desirable than ours? Ours is the first implementation to support mixed precision training and so arguably is "canonical"... does it introduce NaNs in fp16 or something?
Sorry, I think the way in EleutherAI/gpt-neox is more desirable. I just wanted to show the difference between transformers and Megatron. Most users run transformers for downstream tasks, and this difference can produce completely different results in some cases.
Describe the bug
In FP16 inference:
![image](https://user-images.githubusercontent.com/15215819/229959689-36211114-9c68-4fe6-825b-e3a2e750163a.png)
In transformers
cos_cached and sin_cached are FloatTensors from the very beginning, and they are never converted to HalfTensors.
But in Megatron
![image](https://user-images.githubusercontent.com/15215819/229959907-40955d57-49b0-4271-98db-311c9c32d479.png)
cos and sin are recomputed as HalfTensors, which introduces a bias between transformers and Megatron. Sometimes we get completely different logits.
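
For illustration, here is a minimal self-contained sketch of the discrepancy described above. It is not taken from either codebase; the tensor sizes and the angle computation are assumptions. It simply compares the same rotary cos table kept in fp32 (as in the transformers cache) against the same table cast to fp16 (as it ends up in the Megatron-style path):

```python
import torch

# Illustrative only: standard rotary-embedding angles with assumed sizes.
dim, seq_len, base = 128, 2048, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
freqs = torch.einsum("i,j->ij", torch.arange(seq_len).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)

cos_fp32 = emb.cos()          # transformers-style: the cached table stays float32
cos_fp16 = emb.cos().half()   # Megatron-style: cos ends up as a HalfTensor

# The per-element rounding gap is small (on the order of 1e-4), but it is
# applied to q and k in every layer and can compound into visibly different
# logits at the output.
print((cos_fp32 - cos_fp16.float()).abs().max())
```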