
mpt : fix n_ctx #165

Closed · wants to merge 6 commits

Conversation

@marella (Contributor) commented May 18, 2023

Also, max_seq_len doesn't seem to be used anywhere. Is that expected?

@ggerganov (Owner)

Hm, I'm confused. It looks like max_seq_len corresponds to n_ctx.
Looking at the HF repo, it seems that it should be 2048, but here we have it defined as 4096:

https://huggingface.co/mosaicml/mpt-7b/blob/main/config.json#L39

Need to download the model and test it.

@marella (Contributor, Author) commented May 20, 2023

Yeah, I also think max_seq_len should be n_ctx.
I also mentioned it to the author who created it in #168 (comment).

@marella (Contributor, Author) commented May 20, 2023

Should I change max_seq_len to n_ctx? The Replit example uses max_seq_len as n_ctx everywhere.

@lukasmoellerch (Contributor)

The main reason was that we always allocate the maximum sequence length, which might require too much RAM for the KV cache on low-RAM machines.

@ggerganov (Owner)

Ah yes, I remember now. Let me think about this some more then.

ggerganov self-requested a review on May 20, 2023 at 15:48
@ggerganov (Owner) commented May 20, 2023

> The main reason was that we always allocate the maximum sequence length, which might require too much RAM for the KV cache on low-RAM machines.

Ok, but the way it is implemented now, we allocate with n_ctx = 4096:

`int n_ctx = 4096;`

This is 2x more than max_seq_len.
What is the reasoning for line 21?
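For context, a back-of-the-envelope sketch of why this matters for RAM: the KV cache scales linearly with n_ctx. The MPT-7B shapes below (n_layer = 32, n_embd = 4096, f16 storage) are assumptions for illustration, not values taken from this PR:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Assumed MPT-7B shapes; illustrative only.
    const int64_t n_layer = 32;
    const int64_t n_embd  = 4096;
    const int64_t elt_sz  = 2;  // f16 element size in bytes

    // K and V each hold n_layer * n_ctx * n_embd elements.
    const int64_t ctxs[] = {2048, 4096};
    for (const int64_t n_ctx : ctxs) {
        const int64_t bytes = 2 * n_layer * n_ctx * n_embd * elt_sz;
        std::printf("n_ctx = %4lld -> KV cache ~ %.1f GiB\n",
                    (long long) n_ctx, bytes / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```

Under these assumptions, n_ctx = 4096 works out to roughly 2 GiB of KV cache versus 1 GiB at max_seq_len = 2048, the 2x difference noted above.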

@lukasmoellerch (Contributor)

> Ok, but the way it is implemented now, we allocate with n_ctx = 4096: `int n_ctx = 4096;` This is 2x more than max_seq_len. What is the reasoning for line 21?

It was mainly intended as a fix for running the StoryWriter model, which has a much higher sequence length.

@marella (Contributor, Author) commented May 21, 2023

Should we do something like `n_ctx = min(max_seq_len, 4096)`?
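A minimal sketch of what that clamp could look like at hparams-load time; the struct and field names here are hypothetical, for illustration only:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical hparams struct; field names are illustrative,
// not the actual ones in the MPT example.
struct mpt_hparams {
    int32_t max_seq_len = 0; // read from the model file
    int32_t n_ctx       = 0; // context size we actually allocate
};

// Cap the allocated context at 4096 so models like StoryWriter,
// whose max_seq_len is much larger, don't exhaust RAM on the KV cache.
static void mpt_set_n_ctx(mpt_hparams & hparams) {
    hparams.n_ctx = std::min(hparams.max_seq_len, (int32_t) 4096);
}
```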

@klosax (Contributor) commented May 21, 2023

I suggest adding a command-line parameter for setting n_ctx. The default value could be 512, as in llama. The maximum value could be max_seq_len.
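One possible shape for that flag, sketched below; the `-c`/`--ctx-size` spelling mirrors llama.cpp's convention and is an assumption, not something decided in this PR:

```cpp
#include <algorithm>
#include <cstdlib>
#include <cstring>

// Parse an optional context-size flag, defaulting to 512 and
// capping at the model's max_seq_len. Flag names are illustrative.
static int parse_n_ctx(int argc, char ** argv, int max_seq_len) {
    int n_ctx = 512; // default, as in llama
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "-c") == 0 ||
            std::strcmp(argv[i], "--ctx-size") == 0) {
            n_ctx = std::atoi(argv[i + 1]);
        }
    }
    // clamp to [1, max_seq_len]
    return std::max(1, std::min(n_ctx, max_seq_len));
}
```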

@ggerganov (Owner)

Yes, we can do both of these suggestions.

marella changed the title from "mpt : move global variable n_ctx to mpt_hparams" to "mpt : fix n_ctx" on May 22, 2023
@marella (Contributor, Author) commented May 22, 2023

Added the `min()` condition.

I can add the command-line parameter later in a separate PR. Can it be added to all models, not just MPT?

@lukasmoellerch (Contributor)

> I suggest adding a command-line parameter for setting n_ctx. The default value could be 512, as in llama. The maximum value could be max_seq_len.

Yes, I like the suggestion; many new models don't have an inherent max sequence length either, so setting n_ctx to max_seq_len was a bit arbitrary in the first place.
