Support v1/chat/completions #50

Merged: 6 commits from cody/openai-chat into main on Jan 19, 2024

Conversation

comaniac (Collaborator)

Close #26

  • Support the OpenAI-compatible chat API, with and without streaming (see the usage sketch below).
  • Support built-in and dynamically registered chat templates.
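
A minimal usage sketch of the endpoint, assuming a local sglang server and the openai Python client; the base URL, model name, and api_key are illustrative:

# Sketch: calling the OpenAI-compatible chat endpoint, non-streaming and streaming.
# Assumes a server at localhost:30000; port, model name, and api_key are illustrative.
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
messages = [{"role": "user", "content": "List 3 countries and their capitals."}]

# Non-streaming: the full completion arrives in a single response.
resp = client.chat.completions.create(model="default", messages=messages)
print(resp.choices[0].message.content)

# Streaming: deltas arrive incrementally as chunks.
stream = client.chat.completions.create(model="default", messages=messages, stream=True)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)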

@merrymercy (Contributor)

For the chat template, can we use the Hugging Face tokenizer by default? https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates

@comaniac (Collaborator, Author)

> For the chat template, can we use the Hugging Face tokenizer by default? https://huggingface.co/docs/transformers/main/en/chat_templating#how-do-i-use-chat-templates

Good idea. Let me add this.

@comaniac (Collaborator, Author) commented on Jan 19, 2024

Now, if --chat-template is not specified, we use the tokenizer's built-in chat template, so most users should not need to worry about the chat template for HF models. Meanwhile, I found that some tokenizers, such as TinyLlama's, do not have a chat template, so we get the following when calling apply_chat_template(messages, tokenize=False, add_generation_prompt=True):

No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

'<s>[INST] <<SYS>>\nYou are a helpful AI assistant\n<</SYS>>\n\nList 3 countries and their capitals. [/INST]'
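
For reference, the prompt above can be reproduced with a short snippet; the model id is illustrative, and any tokenizer lacking its own chat template falls back the same way:

# Sketch: reproducing the default-template fallback described above.
# The model id is illustrative only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v0.1")
messages = [
    {"role": "system", "content": "You are a helpful AI assistant"},
    {"role": "user", "content": "List 3 countries and their capitals."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # prints the Llama-style prompt shown above, with the warning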

And this template is actually incorrect for this model, so you will get the following response in the unit test:

 <<SYS>>
You are a helpful AI assistant
<</SYS>>

List 4 cities with more than 100,000 people. [/INST] <<SYS>>
You are a helpful AI assistant
<</SYS>>

List 5

In contrast, the following response is expected with the ChatML template:

Here are three countries and their capitals:

1. Canada - Ottawa
2. Australia - Canberra
3. South Korea - Seoul
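
For contrast, a ChatML-style rendering of the same messages would look roughly like this (illustrative):

<|im_start|>system
You are a helpful AI assistant<|im_end|>
<|im_start|>user
List 3 countries and their capitals.<|im_end|>
<|im_start|>assistant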

Another issue is that apply_chat_template doesn't provide stop strings, so the response may be undesirable. For example, the Llama-2 template should have the stop strings "[INST]", "[/INST]", "<<SYS>>", and "<</SYS>>". In general, I would still suggest specifying the chat template explicitly, but maybe we could document this issue in a troubleshooting section.
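
As a workaround, a client can pass the stop strings explicitly in the request. A minimal sketch, assuming an OpenAI-compatible client pointed at a local server; endpoint details are illustrative:

# Sketch: supplying the Llama-2 stop strings explicitly, since the
# tokenizer template carries no stop-string metadata.
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="default",
    messages=[{"role": "user", "content": "List 3 countries and their capitals."}],
    stop=["[INST]", "[/INST]", "<<SYS>>", "<</SYS>>"],
)
print(resp.choices[0].message.content)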

Review comment on a new file (@@ -0,0 +1,381 @@):
# Adapted from
# https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py

Contributor: We can consider importing instead of copying later.

python/sglang/srt/server_args.py: a review comment was marked outdated and resolved.
merrymercy merged commit 23471f9 into main on Jan 19, 2024.
merrymercy deleted the cody/openai-chat branch on Jan 19, 2024 at 07:43.

Linked issue: OpenAI Chat Completion Endpoint (#26)