
adding the triton docker build minimal example #242

Merged
merged 4 commits into sgl-project:main on Mar 12, 2024

Conversation

amirarsalan90
Contributor

Adding a minimal example that builds a Docker container to serve SGLang with Triton Inference Server using the Python backend.

@amirarsalan90 mentioned this pull request on Feb 28, 2024
@isaac-vidas
Contributor

isaac-vidas commented Feb 28, 2024

Thanks for this example!

Would it be possible to run the server from inside the model.py file with:

runtime = sgl.Runtime(model_path="mistralai/Mistral-7B-Instruct-v0.2")
sgl.set_default_backend(runtime)
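
For reference, a rough sketch of how that could be wired into the Triton python backend's model.py. This is only an illustration: the repository layout, tensor names, and sgl frontend calls below are assumptions, not this PR's actual files.

# Assumed model-repository layout (Triton convention; names illustrative):
#   models/
#     sglang/
#       config.pbtxt
#       1/
#         model.py

import numpy as np
import triton_python_backend_utils as pb_utils

import sglang as sgl


@sgl.function
def answer(s, question):
    s += question
    s += sgl.gen("response", max_tokens=256)


class TritonPythonModel:
    def initialize(self, args):
        # Start the SGLang runtime once per model instance.
        self.runtime = sgl.Runtime(model_path="mistralai/Mistral-7B-Instruct-v0.2")
        sgl.set_default_backend(self.runtime)

    def execute(self, requests):
        responses = []
        for request in requests:
            # "text_input" / "text_output" must match whatever config.pbtxt declares.
            text = pb_utils.get_input_tensor_by_name(request, "text_input")
            question = text.as_numpy()[0].decode("utf-8")
            state = answer.run(question=question)
            output = pb_utils.Tensor(
                "text_output",
                np.array([state["response"].encode("utf-8")], dtype=np.object_),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[output]))
        return responses

    def finalize(self):
        self.runtime.shutdown()

Keeping the runtime in initialize() ties its lifetime to the Triton model instance, so finalize() can shut it down cleanly.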

@amirarsalan90
Contributor Author

amirarsalan90 commented Feb 28, 2024

The tritonserver --model-repository=/path/to/model/repository command halts when I try to run the server inside model.py with sgl.Runtime loading mistralai/Mistral-7B-Instruct-v0.2. I haven't tried models other than mistralai/Mistral-7B-Instruct-v0.2, though.

@merrymercy
Contributor

@amirarsalan90 Thanks for contributing to this! Could you document the files better?

  1. Why is the folder named "1"?
  2. What is the purpose of examples/usage/triton/inference.ipynb? Can it be deleted?
  3. Can you share an example command to query the triton server?

@merrymercy merrymercy self-assigned this Mar 11, 2024
@isaac-vidas
Contributor

@merrymercy this implementation follows Triton's model repository convention. See here for more details. The schema of the API inputs/outputs is specified in config.pbtxt, and the model itself is placed under the 1 folder (the model version directory) next to it.
While this is a working solution, I suspect it mostly runs SGLang as-is behind Triton, where the backend process still has to run independently.

A different approach could be to run SGLang as a Triton backend, similar to what vLLM does here. I suspect that would be a bit more involved, given how SGLang creates the backend processes as part of the server, but I haven't looked into it too closely.

@amirarsalan90
Contributor Author

@merrymercy as @isaac-vidas explained, that is the model repository directory convention for Triton Inference Server. I removed the inference.ipynb notebook and added a curl request to the README file for querying the Triton server.
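
As an example, a query along these lines works against Triton's standard HTTP inference endpoint. This is a sketch in Python rather than curl, and it assumes the model is registered as sglang with BYTES tensors text_input / text_output on the default port 8000; the actual names and port come from the README and config.pbtxt.

import requests

# Assumptions: Triton's HTTP endpoint on localhost:8000, a model named
# "sglang", and BYTES tensors "text_input" / "text_output" declared in
# config.pbtxt (names illustrative).
payload = {
    "inputs": [
        {
            "name": "text_input",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is the capital of France?"],
        }
    ]
}

resp = requests.post("http://localhost:8000/v2/models/sglang/infer", json=payload)
resp.raise_for_status()
print(resp.json()["outputs"][0]["data"][0])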

As far as I understand, the vLLM backend for Triton Inference Server also disables some Triton features (like batching) and has some limitations:
https://github.com/triton-inference-server/vllm_backend/blob/c1c88fa7dfbebcd3198ada913e127304d5ff0b46/src/model.py#L93

https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md

But I agree this is a very minimal and basic way to set up Triton for SGLang. I needed it for a project of mine and thought it might be helpful for others too.

@merrymercy merrymercy merged commit eb4308c into sgl-project:main Mar 12, 2024
@merrymercy
Contributor

@amirarsalan90 It is merged. Thanks!
