
Logprobs Refactor #331

Merged: 13 commits merged into main from the logprobs branch, Mar 28, 2024
Conversation

hnyls2002 (Collaborator) commented Mar 25, 2024

What does this PR do?

  • Refactor all logits and logprobs handling logic, including:
    • Clarify the naming rules: *_token_logprobs stands for each token's logprob, while *_prompt_logprob stands for the sum of the logprobs over a segment of the prompt.
    • Split *_token_logprobs into prefill_token_logprobs and decode_token_logprobs.
  • Refactor the SRT API and OpenAI API logprobs response format to List[(logprob, token_id, token_text)]:
    • The OpenAI API responds with the prefill and decode logprobs in one list and always returns all of the prefill logprobs.
    • The SRT API accepts a logprob_start_len parameter and responds in meta_data["prefill_token_logprobs"] and meta_data["decode_token_logprobs"] respectively.
  • Support detokenized results in the sgl.select response, which is the only use case for multiple requests in a single HTTP request.
  • Support top_logprobs.
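As a hedged illustration of the response format described above: the meta_data keys follow the PR description, but the values and the format_logprobs helper are invented for this sketch, not taken from the actual SRT client.

```python
# Illustrative only: the meta_data keys follow the PR description, but the
# values and the format_logprobs helper are made up for this sketch.

def format_logprobs(entries):
    """Render a List[(logprob, token_id, token_text)] as readable lines."""
    return [f"{text!r} (id={tid}): {lp:.3f}" for lp, tid, text in entries]

meta_data = {
    "prefill_token_logprobs": [(-0.12, 101, "Hello"), (-1.50, 2290, " world")],
    "decode_token_logprobs": [(-0.03, 0, "!")],
}

prefill_lines = format_logprobs(meta_data["prefill_token_logprobs"])
decode_lines = format_logprobs(meta_data["decode_token_logprobs"])
```

Because the prefill and decode entries share one triple format, the same helper works on both lists.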

hnyls2002 marked this pull request as draft March 25, 2024 09:24
hnyls2002 marked this pull request as ready for review March 26, 2024 17:24
hnyls2002 (Collaborator, Author) commented:

closes #296
closes #232

comaniac (Collaborator) left a comment:
Overall LGTM. Just a few comments

Review threads (all resolved):
  • python/sglang/srt/managers/router/model_rpc.py (outdated)
  • python/sglang/srt/server.py (three threads)
hnyls2002 merged commit 3842eba into main Mar 28, 2024
hnyls2002 deleted the logprobs branch March 28, 2024 06:34
Contributor comment on the snippet below:

    if extend_seq_lens_cpu[i] == 0:
        continue
    k = input_metadata.top_logprobs_nums[i]
    t = all_logprobs[pt : pt + extend_seq_lens_cpu[i]].topk(k)

pt is not accumulated.
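A minimal sketch of the fix the reviewer is pointing at, with the sglang tensors replaced by plain nested lists; the function name and the None placeholder are assumptions for illustration, not the actual sglang code. The key point is that pt must advance by each request's extend length, and skipped requests still need a placeholder so per-request indexing stays aligned.

```python
# Sketch of the reviewer's point, with tensors replaced by nested lists.
# topk_per_request and the None padding are illustrative assumptions,
# not the actual sglang code.

def topk_per_request(all_logprobs, extend_seq_lens, top_logprobs_nums):
    """Slice a flattened per-token logprob table into per-request top-k lists."""
    results = []
    pt = 0
    for i, seq_len in enumerate(extend_seq_lens):
        if seq_len == 0:
            results.append(None)  # keep per-request indices aligned
            continue
        k = top_logprobs_nums[i]
        segment = all_logprobs[pt : pt + seq_len]
        # stand-in for Tensor.topk(k): the k largest values per token position
        results.append([sorted(row, reverse=True)[:k] for row in segment])
        pt += seq_len  # the accumulation the reviewer says is missing
    return results
```

Without the `pt += seq_len` line, every request would read its top-k candidates from the first request's rows of the flattened table.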

Contributor comment on the snippet below:

    extend_seq_lens_cpu = input_metadata.extend_seq_lens
    for i in range(len(input_metadata.extend_seq_lens)):
        if extend_seq_lens_cpu[i] == 0:
            continue

The continue will result in an out-of-bound error later.

Contributor comment on the snippet below:

    (
        normalized_prompt_logprobs,
        prefill_token_logprobs,
        decode_token_logprobs,
    ) = self.backend.select(self, expr.choices, expr.temperature)

This did not modify the OpenAI backend.
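For context on what select returns: a plausible reading of normalized_prompt_logprobs, given the naming rules in the PR description, is each choice's sum of token logprobs divided by its token count. The sketch below is an assumption for illustration, not the sglang implementation.

```python
# Assumed semantics for illustration: "normalized" = length-normalized sum
# of per-token logprobs. Not taken from the sglang source.

def normalized_prompt_logprob(token_logprobs):
    """Sum of a choice's token logprobs, divided by its token count."""
    if not token_logprobs:
        raise ValueError("a choice must have at least one token")
    return sum(token_logprobs) / len(token_logprobs)

def pick_choice(choices_token_logprobs):
    """Index of the choice with the highest normalized logprob."""
    scores = [normalized_prompt_logprob(lps) for lps in choices_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Length normalization matters here because select compares choices of different token lengths; a raw sum would systematically penalize longer choices.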
