Wrong rotary embedding result between transformers structure and Megatron structure #873
Comments
@GGGGGGXY Why do you think that their way of doing this is more desirable than ours? Ours is the first implementation to support mixed precision training and so arguably is "canonical"... does it introduce NaNs in fp16 or something?
Sorry, I think the way in EleutherAI/gpt-neox is more desirable. I just wanted to show the difference between transformers and Megatron. Most users run transformers for downstream tasks, and this difference can produce completely different results in some cases.
Describe the bug
In FP16 inference:
![image](https://user-images.githubusercontent.com/15215819/229959689-36211114-9c68-4fe6-825b-e3a2e750163a.png)
In transformers
cos_cached and sin_cached are FloatTensors from the very beginning, and they are never converted to HalfTensors.
But in Megatron
![image](https://user-images.githubusercontent.com/15215819/229959907-40955d57-49b0-4271-98db-311c9c32d479.png)
cos and sin are recomputed as HalfTensors, which introduces a bias between transformers and Megatron. Sometimes we get completely different logits.
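
For illustration, here is a minimal self-contained sketch of the discrepancy described above. It is not taken from either codebase; the tensor sizes and the angle computation are assumptions. It simply compares the same rotary cos table kept in fp32 (as in the transformers cache) against the same table cast to fp16 (as it ends up in the Megatron-style path):

```python
import torch

# Illustrative only: standard rotary-embedding angles with assumed sizes.
dim, seq_len, base = 128, 2048, 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
freqs = torch.einsum("i,j->ij", torch.arange(seq_len).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)

cos_fp32 = emb.cos()          # transformers-style: the cached table stays float32
cos_fp16 = emb.cos().half()   # Megatron-style: cos ends up as a HalfTensor

# The per-element rounding gap is small (on the order of 1e-4), but it is
# applied to q and k in every layer and can compound into visibly different
# logits at the output.
print((cos_fp32 - cos_fp16.float()).abs().max())
```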