
allow different user and assistant end-token #375

Merged 5 commits into huggingface:main on Aug 2, 2023

Conversation

@AndreasMadsen (Contributor) commented Jul 28, 2023

For models like Llama2, the end token is not the same for a userMessage and an assistantMessage. This implements userMessageEndToken and assistantMessageEndToken, which override the messageEndToken behavior.

This PR also allows empty strings as userMessageToken and assistantMessageToken and makes the empty string the default. This adds flexibility that is required in the case of Llama2, where the first userMessage is effectively different because of the system message.

Note that because userMessageEndToken and assistantMessageToken are nearly always concatenated, it is almost redundant to have both. The exception is generateQuery for websearch, which has several consecutive user messages.
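
For illustration, a minimal sketch in TypeScript of how these fields compose into a prompt; the field names follow this PR, but the buildPrompt helper and Message shape below are hypothetical, not chat-ui's actual implementation.

// Hypothetical sketch: each message is wrapped in its role's start/end token,
// and the model is then prompted to continue as the assistant.
interface PromptTokens {
  preprompt: string;
  userMessageToken: string;
  userMessageEndToken: string;
  assistantMessageToken: string;
  assistantMessageEndToken: string;
}

interface Message {
  from: "user" | "assistant";
  content: string;
}

function buildPrompt(messages: Message[], t: PromptTokens): string {
  const body = messages
    .map((m) =>
      m.from === "user"
        ? t.userMessageToken + m.content + t.userMessageEndToken
        : t.assistantMessageToken + m.content + t.assistantMessageEndToken
    )
    .join("");
  return t.preprompt + body + t.assistantMessageToken;
}

With the Llama2 config further below, this is why userMessageEndToken (" [/INST] ") and assistantMessageEndToken (" </s><s>[INST] ") carry all of the structure while the start tokens stay empty.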


Documentation for the correct prompting:

With the existing config provided in #361 (comment)

{
    "userMessageToken": "[INST]",
    "assistantMessageToken": "[/INST]",
    "messageEndToken": "</s>",
    "preprompt": "[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>\n\n[/INST]"
}

the prompt has incorrect formatting:

[INST]<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[/INST][INST]{{user message 1}}</s>[/INST] {{assistant message 1}}</s>[INST]{{user message 2}}</s>[/INST]

However, with this PR, the following config is possible:

  {
    "userMessageToken": "",
    "userMessageEndToken": " [/INST] ",
    "assistantMessageToken": "",
    "assistantMessageEndToken": " </s><s>[INST] ",
    "preprompt": "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n"
}

the prompt has the correct formatting:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{{user message 1}} [/INST] {{assistant message 1}} </s><s>[INST] {{user message 2}} [/INST] {{assistant message 2}} </s><s>[INST] {{user message 3}} [/INST]

@nsarrazin (Collaborator)

Thanks for this PR! I'll check it with our llama 2 endpoint and see if it improves things a bit. This could be super useful 😄

@nsarrazin nsarrazin self-requested a review August 2, 2023 12:20
flozi00 and others added 4 commits August 2, 2023 14:31
* rm open assistant branding

* Update SettingsModal.svelte

* make settings work with a dynamic list of models

* fixed types

---------

Co-authored-by: Nathan Sarrazin <[email protected]>
The chat generation removes parameters.stop and <|endoftext|> from the generated text, and additionally trims trailing whitespace.

This PR copies that behavior to the summarize functionality when the summary is produced by the chat model.
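
A minimal sketch of that post-processing, based only on the behavior described above; the function name and exact semantics are illustrative, not chat-ui's actual code.

// Illustrative only: strip the configured stop sequences and <|endoftext|>
// from the generated text, then trim trailing whitespace.
function cleanGeneration(text: string, stopSequences: string[]): string {
  let out = text;
  for (const stop of [...stopSequences, "<|endoftext|>"]) {
    out = out.split(stop).join("");
  }
  return out.trimEnd();
}
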
@nsarrazin (Collaborator)

Seems to work great with llama 2 on prod! I added a fallback, and I'm going to merge this PR. Thanks again for the great work.

@nsarrazin nsarrazin merged commit f209301 into huggingface:main Aug 2, 2023
2 checks passed
@AndreasMadsen (Contributor, Author)

@nsarrazin Thanks for merging this. Please note that the fallback was already implemented in the config parsing:

https://github.com/huggingface/chat-ui/pull/375/files#diff-ac854d5672af10370323e57d41c389221362fd685d219fc5b74c6ff0d23fc6e8R57-R58

Having spent some time on this code, I think the project would generally benefit from moving more logic into the config parsing. Another case is the stop tokens, which have OpenAssistant-specific logic spread throughout the code; this could be consolidated as a default in the config parsing.
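
As a sketch of that idea, assuming a zod-based schema for the MODELS entries (field list abbreviated, names hypothetical); the end-token fallback mirrors what this PR already does at parse time.

import { z } from "zod";

// Abbreviated, hypothetical schema: the per-role end tokens fall back to
// messageEndToken during parsing, so the rest of the code needs no
// model-specific conditionals or scattered defaults.
const modelSchema = z
  .object({
    name: z.string(),
    userMessageToken: z.string().default(""),
    assistantMessageToken: z.string().default(""),
    messageEndToken: z.string().default(""),
    userMessageEndToken: z.string().default(""),
    assistantMessageEndToken: z.string().default(""),
  })
  .transform((m) => ({
    ...m,
    userMessageEndToken: m.userMessageEndToken || m.messageEndToken,
    assistantMessageEndToken: m.assistantMessageEndToken || m.messageEndToken,
  }));

// Usage sketch: const models = z.array(modelSchema).parse(JSON.parse(MODELS));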

@nsarrazin (Collaborator)

Oh my bad, good catch! I agree overall; I think the logic in some places is a bit all over the place... I also don't really like the MODELS environment variable, because I think it contains too much information and I would like to break it down. If you have any suggestions on that, feel free to let me know 😁

@AndreasMadsen (Contributor, Author) commented Aug 3, 2023

@nsarrazin I think the MODELS structure itself is fine, but .env should only be used for actual secrets that one would set with environment variables. Storing a JSON document in an environment variable is not so elegant, and except for the endpoints, all the information MODELS contains is already exposed to the public.

My immediate suggestion would be to store all non-secrets that are reasonable to check in with git in a separate JSON file. You could have an optional environment variable that points to that file. Because some properties contain newlines, my preference would be a YAML file instead; that would also allow documenting each property with comments and using some of the DRY features YAML offers (like anchors). A developer can then copy that default config file and override the properties they wish.
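
A minimal sketch of the loading side under that suggestion, assuming js-yaml and a hypothetical MODELS_CONFIG_PATH variable (neither is an existing chat-ui feature):

import { readFileSync } from "node:fs";
import { load } from "js-yaml";

// Hypothetical: an optional env var points at a YAML file checked into git;
// only real secrets (API keys, endpoint credentials) stay in .env.
const configPath = process.env.MODELS_CONFIG_PATH ?? "models.yaml";
const models = load(readFileSync(configPath, "utf8"));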

My second piece of advice would be to avoid clever defaults. If you are going to make breaking changes anyway, then, for example, don't make <|startoftext|> a default stop token, and remove the messageEndToken fallback. Such things are very easy to specify explicitly.

Finally, a disclaimer: while I've done Node.js for a fair number of years, I'm really old school. I haven't been following the "new" reactive isomorphic trends, so if there is some Vite or Svelte system for managing this, I for sure wouldn't know about it. 😅

@loganlebanoff

I think "userMessageEndToken": " [/INST] " is incorrect? It has a trailing space, so won't that mess up the llama model when it generates text after that?

@loganlebanoff

Unless chat-ui removes the trailing space behind the scenes

@AndreasMadsen (Contributor, Author)

@loganlebanoff It looks like you are partially correct. There should be a space after [/INST] for past assistant messages (https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/llama/generation.py#L247C65-L247C74). However, as you mention, there should not be one for the prompted assistant message (https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/llama/generation.py#L262C1-L263C1).

This requires some conditional logic. #400 adds support for that.
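
For reference, a sketch of the conditional in question (illustrative only, not the code from #400, with the system prompt omitted): past assistant turns keep the space after [/INST], while the final prompted turn ends with [/INST] and no trailing space, matching the reference llama generation code.

interface Turn {
  user: string;
  assistant?: string; // undefined for the turn the model is being prompted on
}

function llama2Dialog(turns: Turn[]): string {
  return turns
    .map((t) =>
      t.assistant !== undefined
        ? `<s>[INST] ${t.user} [/INST] ${t.assistant} </s>`
        : `<s>[INST] ${t.user} [/INST]`
    )
    .join("");
}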
