Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YaRN : correction to GPT-NeoX implementation #4093

Closed
wants to merge 1 commit into from

Conversation

cebtenzzre
Copy link
Collaborator

At one point I was struggling to understand what the Metal kernel was doing with GPT-NeoX RoPE, and I think I got it wrong. I got halfway there - the comment makes it fairly obvious what is going on. But the rotation amount should be an integer and should not be multiplied by inv_ndims - inv_ndims should only be part of theta.

@jquesnelle does this seem like the right thing to do?

I learned from my mistakes, this is running on ggml-ci so I don't have to worry about error-prone manual testing across several machines.

@ggerganov
Copy link
Owner

Is there some way to compare the results with a reference implementation. I'm confused as well at this point

@maddes8cht
Copy link
Contributor

As far as i can see, the only reference implementation of gpt-neox is the pre-gguf implemementation in
https://github.com/ggerganov/ggml/tree/master/examples
Is this still correct?

@cebtenzzre
Copy link
Collaborator Author

As far as i can see, the only reference implementation of gpt-neox is the pre-gguf implemementation in

I believe he means a reference implementation of YaRN for GPT-NeoX... but there is none. There is only one for LLaMA.

@ggerganov
Copy link
Owner

Is this still relevant? I think RoPE is computed correctly across all backends now

@cebtenzzre
Copy link
Collaborator Author

Is this still relevant? I think RoPE is computed correctly across all backends now

The changes in this PR only affect RoPE when using the YaRN scaling options. I believe one should see a perplexity difference between this PR and master while using YaRN with Falcon or any other model using GPT-NeoX RoPE.

@ggerganov
Copy link
Owner

superseded by #7617

@ggerganov ggerganov closed this May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants