Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup codebase: removed unnecessary code/logic #298

Merged
merged 1 commit into from
Mar 23, 2024

Conversation

Qubitium
Copy link
Contributor

@Qubitium Qubitium commented Mar 14, 2024

Fixed two issues in router loop:

  1. double sleep (this may cause performance loss during high concurrency)
  2. protect against sleep on potential GLOBAL_BACKEND_CONFIG.extend_dependency_time == 0

Cleanup:

  1. remove unused self.eos_token_id in modle_rpc
  2. removed unused ReqState.lock

EDIT:

Reverted due to review: (removed useless math for calculating # of completion_tokens)

@Qubitium Qubitium changed the title [WIP] Fix double sleep and sleeping on extend_dependency_time == 0 [WIP] Performance: eliminate double sleep in router loop Mar 14, 2024
@Qubitium Qubitium changed the title [WIP] Performance: eliminate double sleep in router loop [WIP] Eliminate double sleep in router loop Mar 14, 2024
@Qubitium
Copy link
Contributor Author

@merrymercy @hnyls2002 PR tested with no performance regression. Ready for review/merge.

@Qubitium Qubitium changed the title [WIP] Eliminate double sleep in router loop Eliminate double sleep in router loop Mar 14, 2024
@Qubitium
Copy link
Contributor Author

Latest commit removed redundant math:

eea0efa

prompt_tokens = len(req.input_ids). So adding len(req.input_ids) - prompt_tokens is always 0.

image

@Qubitium Qubitium changed the title Eliminate double sleep in router loop Remove double sleep and redundant math Mar 18, 2024
@Qubitium Qubitium changed the title Remove double sleep and redundant math Cleanup codebase: removed unnecessary code/logic Mar 19, 2024
@Qubitium
Copy link
Contributor Author

Latest commit : 9472363

Removed unused ReqState.lock

python/sglang/srt/managers/router/model_rpc.py Outdated Show resolved Hide resolved

await asyncio.sleep(0.0006)
if not slept:
await asyncio.sleep(0.0006)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Did you find this give better performance?
  2. @hnyls2002 please also take a look

Copy link
Contributor Author

@Qubitium Qubitium Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In my limited testing with low concurrency (1-2 concurrent requests with range 1-10 batches each) I did not find performance diff.

But this is unnecessary code/logic. There is no need to sleep for another 0.6ms (causing potential thread context switch) when it has slept for 30ms in earlier condition extend_dependency_time: float = 0.03.

@merrymercy merrymercy merged commit ce216c8 into sgl-project:main Mar 23, 2024
@Qubitium Qubitium deleted the mod-forward-sleep branch March 28, 2024 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants