Support decode token logprobs #130

Merged
merged 6 commits into from
Feb 6, 2024

comaniac (Collaborator) commented on Feb 2, 2024

Initial support for logprobs of decoded tokens:

  • When return_logprobs is set, the response includes an additional field, token_logprobs, which lists the logprob of every token, including prompt tokens. Each element of the list is a tuple of (token text, token ID, logprob).
  • When streaming, token_logprobs behaves the same way as text; in other words, it is monotonic.
  • This PR supports only the top-1 logprob.
  • For OpenAI APIs, text_offset and top_logprobs are not supported yet.
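
To make the shape of token_logprobs concrete, here is a minimal sketch of how a list of (token text, token ID, logprob) tuples could be built from per-step logits. It is not the code in this PR: the toy vocabulary `VOCAB`, the helper names `log_softmax` and `build_token_logprobs`, and the sample logits are all hypothetical, and only the tuple layout follows the description above.

```python
import math
from typing import List, Tuple

# Hypothetical toy vocabulary; a real server would use the model's tokenizer.
VOCAB = {0: "Hello", 1: ",", 2: " world", 3: "!"}

TokenLogprob = Tuple[str, int, float]  # (token text, token ID, logprob)


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one step's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def build_token_logprobs(
    token_ids: List[int], logits_per_step: List[List[float]]
) -> List[TokenLogprob]:
    """Pair each chosen token with its own logprob (top-1 only)."""
    result = []
    for token_id, logits in zip(token_ids, logits_per_step):
        logprob = log_softmax(logits)[token_id]
        result.append((VOCAB[token_id], token_id, logprob))
    return result


# Made-up logits for three decoding steps.
token_ids = [0, 1, 2]
logits_per_step = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
token_logprobs = build_token_logprobs(token_ids, logits_per_step)
```

Each entry carries only the logprob of the token that was actually chosen, which matches the top-1 limitation noted above.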

Known issues:

  1. For a request that combines regex + streaming + logprobs, the logprobs are not monotonic because of how regex decoding works. Since this kind of request doesn't make sense, we won't fix this issue for now.
  2. To align with the OpenAI logprob style, there is additional overhead, such as calls to tokenizer.convert_ids_to_tokens.
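
The monotonic property mentioned in the first known issue can be checked with a small sketch like the one below. It assumes that "monotonic" means each streamed chunk carries the full token_logprobs list so far and only appends to the previous one; the helper `is_monotonic` and the sample chunks are hypothetical and are not part of this PR.

```python
from typing import List, Sequence, Tuple

TokenLogprob = Tuple[str, int, float]  # (token text, token ID, logprob)


def is_monotonic(chunks: Sequence[List[TokenLogprob]]) -> bool:
    """Return True if every chunk extends the previous one by appending only."""
    prev: List[TokenLogprob] = []
    for chunk in chunks:
        # The previous list must be an exact prefix of the current one.
        if chunk[: len(prev)] != prev:
            return False
        prev = chunk
    return True


# Made-up streamed chunks: the second one only appends a token.
good_stream = [
    [("a", 1, -0.1)],
    [("a", 1, -0.1), ("b", 2, -0.2)],
]

# Made-up chunks where an earlier token is rewritten, as could happen
# if decoding revises already-emitted tokens.
bad_stream = [
    [("a", 1, -0.1), ("b", 2, -0.2)],
    [("a", 1, -0.1), ("c", 3, -0.3)],
]
```

A client that relies on appending only the new suffix of each chunk would work with `good_stream` but would produce wrong results on `bad_stream`.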

cc @merrymercy @Ying1123 @Ja1Zhou

python/sglang/srt/managers/router/model_rpc.py (two review threads, outdated and resolved)

comaniac (Collaborator, Author) commented on Feb 6, 2024

@merrymercy comments addressed. I've verified performance by benchmarking on Hellaswag; the latencies before and after this PR are almost the same.

merrymercy merged commit a7334ae into main on Feb 6, 2024
merrymercy deleted the cody/logprob branch on Feb 6, 2024 at 20:24