
Wrong rotary embedding result between transformers structure and Megatron structure #873

Closed
GGGGGGXY opened this issue Apr 5, 2023 · 2 comments

GGGGGGXY commented Apr 5, 2023

Describe the bug

In FP16 inference:
In transformers:
[screenshot of the transformers rotary embedding code]
cos_cached and sin_cached are FloatTensors from the very beginning, and they are never converted to HalfTensor.

But in Megatron:
[screenshot of the Megatron rotary embedding code]
cos and sin are recomputed as HalfTensors. This introduces a numerical discrepancy between transformers and Megatron; sometimes we get totally different logits.
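To make the discrepancy concrete, here is a minimal, self-contained sketch of the two cache strategies. It is not taken from either codebase; `rope_cache`, `rotate_half`, and `apply_rope` are illustrative names. It applies the same rotary embedding with an fp32 cache versus an fp16 cache and prints the maximum deviation:

```python
import torch

def rope_cache(seq_len, dim, base=10000, dtype=torch.float32):
    # Illustrative rotary cache; not copied from transformers or Megatron.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)          # [seq_len, dim/2]
    emb = torch.cat((freqs, freqs), dim=-1)   # [seq_len, dim]
    return emb.cos().to(dtype), emb.sin().to(dtype)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(x, cos, sin):
    # Any rounding in cos/sin propagates directly into the rotated activations.
    return x * cos + rotate_half(x) * sin

torch.manual_seed(0)
q = torch.randn(128, 64, dtype=torch.float16)  # fp16 activations, as in fp16 inference

# transformers-style: cache kept in fp32
cos32, sin32 = rope_cache(128, 64, dtype=torch.float32)
out_fp32_cache = apply_rope(q.float(), cos32, sin32)

# Megatron-style: cache cast to fp16 before use
cos16, sin16 = rope_cache(128, 64, dtype=torch.float16)
out_fp16_cache = apply_rope(q, cos16, sin16)

print((out_fp32_cache - out_fp16_cache.float()).abs().max())
```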

GGGGGGXY added the bug label on Apr 5, 2023
StellaAthena (Member) commented

@GGGGGGXY Why do you think that their way of doing this is more desirable than ours? Ours is the first implementation to support mixed precision training and so arguably is "canonical"... does it introduce NaNs in fp16 or something?

GGGGGGXY (Author) commented Apr 18, 2023

> @GGGGGGXY Why do you think that their way of doing this is more desirable than ours? Ours is the first implementation to support mixed precision training and so arguably is "canonical"... does it introduce NaNs in fp16 or something?

Sorry, I think the way it is done in EleutherAI/gpt-neox is more desirable. I just wanted to show the difference between transformers and Megatron. Most users use transformers for downstream tasks, and this difference can produce completely different results in some cases.
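As a rough illustration of why this can matter downstream (a sketch, not a measurement from either codebase): the fp16 cache agrees with the fp32 one to within fp16 precision per element, but reductions such as the attention dot product sum those rounding errors, and they compound further across layers.

```python
import torch

torch.manual_seed(0)
seq, dim = 2048, 64

# Stand-in for the two cache strategies: fp32 angles vs the same values rounded to fp16.
inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
cos32 = torch.outer(torch.arange(seq).float(), inv_freq).cos()
cos16 = cos32.half()

q = torch.randn(seq, dim // 2)
# A single element barely differs ...
print((q * cos16.float() - q * cos32).abs().max())
# ... but a reduction over the head dimension (as in q·k) sums those rounding errors,
# and dozens of attention layers amplify them further, which is how logits can drift apart.
print(((q * cos16.float()).sum(-1) - (q * cos32).sum(-1)).abs().max())
```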
