
allow different user and assistant end-token #375

Merged 5 commits into huggingface:main on Aug 2, 2023

Conversation

@AndreasMadsen (Contributor) commented Jul 28, 2023

For models like Llama2, the end token is not the same for a userMessage and an assistantMessage. This implements userMessageEndToken and assistantMessageEndToken, which override the messageEndToken behavior.

This PR also allows empty strings as userMessageToken and assistantMessageToken and makes the empty string the default. This adds flexibility that is required in the case of Llama2, where the first userMessage is effectively different because of the system message.

Note that because userMessageEndToken and assistantMessageToken are nearly always concatenated, it is almost redundant to have both. The exception is generateQuery for websearch, which has several consecutive user messages.
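
For illustration, a minimal sketch in TypeScript of how these fields compose into a prompt; the field names follow this PR, but the buildPrompt helper and Message shape below are hypothetical, not chat-ui's actual implementation.

// Hypothetical sketch: each message is wrapped in its role's start/end token,
// and the model is then prompted to continue as the assistant.
interface PromptTokens {
  preprompt: string;
  userMessageToken: string;
  userMessageEndToken: string;
  assistantMessageToken: string;
  assistantMessageEndToken: string;
}

interface Message {
  from: "user" | "assistant";
  content: string;
}

function buildPrompt(messages: Message[], t: PromptTokens): string {
  const body = messages
    .map((m) =>
      m.from === "user"
        ? t.userMessageToken + m.content + t.userMessageEndToken
        : t.assistantMessageToken + m.content + t.assistantMessageEndToken
    )
    .join("");
  return t.preprompt + body + t.assistantMessageToken;
}

With the Llama2 config further below, this is why userMessageEndToken (" [/INST] ") and assistantMessageEndToken (" </s><s>[INST] ") carry all of the structure while the start tokens stay empty.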


Documentation for the correct prompting:

With the existing config provided in #361 (comment)

{
    "userMessageToken": "[INST]",
    "assistantMessageToken": "[/INST]",
    "messageEndToken": "</s>",
    "preprompt": "[INST]<<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>\n\n[/INST]"
}

the prompt has incorrect formatting:

[INST]<<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>>

[/INST][INST]{{user message 1}}</s>[/INST] {{assistant message 1}}</s>[INST]{{user message 2}}</s>[/INST]

However, with this PR, the following config is possible:

  {
    "userMessageToken": "",
    "userMessageEndToken": " [/INST] ",
    "assistantMessageToken": "",
    "assistantMessageEndToken": " </s><s>[INST] ",
    "preprompt": "<s>[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\n\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n"
}

the prompt has the correct formatting:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{{user message 1}} [/INST] {{assistant message 1}} </s><s>[INST] {{user message 2}} [/INST] {{assistant message 2}} </s><s>[INST] {{user message 3}} [/INST]

@nsarrazin (Collaborator)

Thanks for this PR! I'll check it with our llama 2 endpoint and see if it improves things a bit. This could be super useful 😄

@nsarrazin nsarrazin self-requested a review August 2, 2023 12:20
flozi00 and others added 4 commits August 2, 2023 14:31
* rm open assistant branding

* Update SettingsModal.svelte

* make settings work with a dynamic list of models

* fixed types

---------

Co-authored-by: Nathan Sarrazin <[email protected]>
The chat generation removes parameters.stop and <|endoftext|> from the generated text, and additionally trims trailing whitespace.

This PR copies that behavior to the summarize functionality when the summary is produced by the chat model.
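
A minimal sketch of that post-processing, based only on the behavior described above; the function name and exact semantics are illustrative, not chat-ui's actual code.

// Illustrative only: strip the configured stop sequences and <|endoftext|>
// from the generated text, then trim trailing whitespace.
function cleanGeneration(text: string, stopSequences: string[]): string {
  let out = text;
  for (const stop of [...stopSequences, "<|endoftext|>"]) {
    out = out.split(stop).join("");
  }
  return out.trimEnd();
}
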
@nsarrazin (Collaborator)

Seems to work great with llama 2 on prod! I added a fallback, and I'm going to merge this PR. Thanks again for the great work.

@nsarrazin nsarrazin merged commit f209301 into huggingface:main Aug 2, 2023
2 checks passed
@AndreasMadsen (Contributor, Author)

@nsarrazin Thanks for merging this. Please note that the fallback was already implemented in the config parsing:

https://github.com/huggingface/chat-ui/pull/375/files#diff-ac854d5672af10370323e57d41c389221362fd685d219fc5b74c6ff0d23fc6e8R57-R58

Having spent some time on this code, I think the project would generally benefit from moving more logic into the config parsing. Another case is the stop tokens, which have OpenAssistant-specific logic spread throughout the code; this could be consolidated as a default in the config parsing.
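
As a sketch of that idea, assuming a zod-based schema for the MODELS entries (field list abbreviated, names hypothetical); the end-token fallback mirrors what this PR already does at parse time.

import { z } from "zod";

// Abbreviated, hypothetical schema: the per-role end tokens fall back to
// messageEndToken during parsing, so the rest of the code needs no
// model-specific conditionals or scattered defaults.
const modelSchema = z
  .object({
    name: z.string(),
    userMessageToken: z.string().default(""),
    assistantMessageToken: z.string().default(""),
    messageEndToken: z.string().default(""),
    userMessageEndToken: z.string().default(""),
    assistantMessageEndToken: z.string().default(""),
  })
  .transform((m) => ({
    ...m,
    userMessageEndToken: m.userMessageEndToken || m.messageEndToken,
    assistantMessageEndToken: m.assistantMessageEndToken || m.messageEndToken,
  }));

// Usage sketch: const models = z.array(modelSchema).parse(JSON.parse(MODELS));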

@nsarrazin (Collaborator)

Oh my bad, good catch! I agree overall; I think the logic in some places is a bit all over the place... I also don't really like the MODELS environment variable, because I think it contains too much information and I would like to break it down. If you have any suggestions on that, feel free to let me know 😁

@AndreasMadsen (Contributor, Author) commented Aug 3, 2023

@nsarrazin I think the MODELS structure itself is fine, but .env should only be used for actual secrets that one would set with environment variables. Storing a JSON document in an environment variable is not so elegant, and except for the endpoints, all the information MODELS contains is already exposed to the public.

My immediate suggestion would be to store all non-secrets that are reasonable to check in with git in a separate JSON file. You could have an optional environment variable that points to that file. Because some properties contain newlines, my preference would be a YAML file instead; that would also allow documenting each property with comments and using some of the DRY features YAML offers (like anchors). A developer can then copy that default config file and override the properties they wish.
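
A minimal sketch of the loading side under that suggestion, assuming js-yaml and a hypothetical MODELS_CONFIG_PATH variable (neither is an existing chat-ui feature):

import { readFileSync } from "node:fs";
import { load } from "js-yaml";

// Hypothetical: an optional env var points at a YAML file checked into git;
// only real secrets (API keys, endpoint credentials) stay in .env.
const configPath = process.env.MODELS_CONFIG_PATH ?? "models.yaml";
const models = load(readFileSync(configPath, "utf8"));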

My second piece of advice would be to avoid clever defaults. If you are going to make breaking changes anyway, then, for example, don't make <|startoftext|> a default stop token, and remove the messageEndToken fallback. Such things are very easy to specify explicitly.

Finally, a disclaimer: while I've done Node.js for a fair number of years, I'm really old school. I haven't been following the "new" reactive isomorphic trends, so if there is some Vite or Svelte system for managing this, I for sure wouldn't know about it. 😅

@loganlebanoff

I think "userMessageEndToken": " [/INST] " is incorrect? It has a trailing space, so won't that mess up the llama model when it generates text after that?

@loganlebanoff

Unless chat-ui removes the trailing space behind the scenes

@AndreasMadsen (Contributor, Author)

@loganlebanoff It looks like you are partially correct. There should be a space after [/INST] for past assistant messages (https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/llama/generation.py#L247C65-L247C74). However, as you mention, there should not be one for the prompted assistant message (https://github.com/facebookresearch/llama/blob/ea9f33d6d3ea8ed7d560d270986407fd6c2e52b7/llama/generation.py#L262C1-L263C1).

This requires some conditional logic. #400 adds support for that.
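
For reference, a sketch of the conditional in question (illustrative only, not the code from #400, with the system prompt omitted): past assistant turns keep the space after [/INST], while the final prompted turn ends with [/INST] and no trailing space, matching the reference llama generation code.

interface Turn {
  user: string;
  assistant?: string; // undefined for the turn the model is being prompted on
}

function llama2Dialog(turns: Turn[]): string {
  return turns
    .map((t) =>
      t.assistant !== undefined
        ? `<s>[INST] ${t.user} [/INST] ${t.assistant} </s>`
        : `<s>[INST] ${t.user} [/INST]`
    )
    .join("");
}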
