mpt : fix n_ctx #165
Conversation
Hm, I'm confused. It looks like the relevant value is set here: https://huggingface.co/mosaicml/mpt-7b/blob/main/config.json#L39. Need to download the model and test it.
Yeah, I also think so.
Should I change it to use `max_seq_len`?
The main reason was that we always allocated the maximum sequence length, which might require too much RAM for the KV cache on low-RAM machines.
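To make the RAM concern concrete, here is a rough back-of-the-envelope sketch (not the actual ggml allocation code) of how the KV cache footprint scales with the context size it is allocated for. The dimensions below are illustrative MPT-7B-like values and the storywriter-sized context is an assumption for the sake of the example:

```cpp
// Sketch only: the KV cache stores one key and one value vector per layer per
// position, so its size grows linearly with the context size used for allocation.
#include <cstdint>
#include <cstdio>

int main() {
    // Illustrative MPT-7B-like dimensions (assumed, not read from a model file).
    const int64_t n_layer        = 32;
    const int64_t n_embd         = 4096;
    const int64_t n_ctx          = 65536;  // e.g. a storywriter-sized context
    const int64_t bytes_per_elem = 2;      // f16

    // keys + values, one element per (layer, position, embedding dimension)
    const int64_t kv_bytes = 2 * n_layer * n_ctx * n_embd * bytes_per_elem;
    printf("KV cache: %.2f GiB\n", kv_bytes / (1024.0 * 1024.0 * 1024.0));
    return 0;
}
```

With these numbers the cache alone is 32 GiB, which is why allocating for the model's full maximum sequence length is problematic on low-RAM machines.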
Ah yes, I remember now. Let me think about this some more then.
Ok, but the way it is implemented now, we allocate with the value from line 21 in c2fab8a. This is 2x more than `max_seq_len`.
It was mainly intended as a fix for running the storywriter model, which has a much higher seq length.
Should we do something like clamping it to `max_seq_len` then?
I suggest adding a command line parameter for setting `n_ctx`. The default value could be 512, as in llama. The maximum value could be `max_seq_len`.
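A minimal sketch of what that flag could look like, assuming a params struct and a hand-rolled parse loop in the style of the other ggml examples. The flag names, struct, and helper below are illustrative, not the actual API:

```cpp
// Sketch: a --ctx-size / -c flag with a 512 default, later capped at the
// model's trained maximum. Names are hypothetical.
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>

struct cli_params {
    int32_t n_ctx = 512;  // default, as in the llama example
};

static void parse_args(int argc, char ** argv, cli_params & params) {
    for (int i = 1; i < argc; ++i) {
        if ((std::strcmp(argv[i], "-c") == 0 ||
             std::strcmp(argv[i], "--ctx-size") == 0) && i + 1 < argc) {
            params.n_ctx = std::atoi(argv[++i]);
        }
    }
}

// After loading the model, cap the requested context at the model's maximum.
static int32_t effective_n_ctx(int32_t requested, int32_t max_seq_len) {
    return std::min(requested, max_seq_len);
}
```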
Yes, we can do both of these suggestions.
Added `n_ctx` to `mpt_hparams`. I can add the command line parameter later in a separate PR. Can it be added to all models, not just MPT?
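For reference, a sketch of what carrying `n_ctx` alongside the existing MPT hyperparameters could look like. The field names other than `n_ctx` only loosely follow the existing `mpt_hparams` layout and should be treated as assumptions:

```cpp
// Sketch of an mpt_hparams-style struct with a separate n_ctx field; the exact
// fields and defaults in the real example may differ.
#include <cstdint>

struct mpt_hparams {
    int32_t d_model        = 0;
    int32_t max_seq_len    = 0;   // model's trained maximum, from the model file
    int32_t n_heads        = 0;
    int32_t n_layers       = 0;
    int32_t n_vocab        = 0;
    float   alibi_bias_max = 0.0f;
    float   clip_qkv       = 0.0f;
    int32_t ftype          = 0;
    int32_t n_ctx          = 0;   // context size actually used for the KV cache
};
```

Keeping `n_ctx` separate from `max_seq_len` lets the KV cache be sized for the context the user actually wants rather than the model's maximum.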
Yes, I like the suggestion. Many new models don't have an inherent max seq length either, so setting it to that was a bit arbitrary in the first place.
Also, `max_seq_len` doesn't seem to be used anywhere. Is that expected?