
Development Roadmap #157

Open
Ying1123 opened this issue Feb 7, 2024 · 14 comments

Comments

Ying1123 (Contributor) commented Feb 7, 2024

Function Calling

High-level Pythonic Interface

Inference Optimizations

Structured Decoding

Compiler

  • Support tracing and compiling sgl.fork
  • Support sending a full serialized SGL program to the server

LoRA Support

  • Port multi-LoRA batching and unified memory from S-LoRA

Model Coverage

Device Coverage

  • AMD support: investigate AMD support in Triton and FlashInfer.
  • CPU support. This is better done by adding a llama.cpp backend.
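To make the `sgl.fork` compiler item above concrete, here is a toy sketch of fork semantics: a prompt state branches into parallel continuations that share a common prefix. The `State` class and its methods are illustrative assumptions, not SGLang's actual API; a tracing compiler would record these operations instead of executing them eagerly.

```python
import copy

# Toy stand-in for a prompt state. Hypothetical, not SGLang's real API.
class State:
    def __init__(self, text=""):
        self.text = text

    def __iadd__(self, chunk):
        # Appending text extends the prompt in place.
        self.text += chunk
        return self

    def fork(self, n):
        # Each branch starts from an independent copy of the shared prefix.
        return [copy.deepcopy(self) for _ in range(n)]

s = State()
s += "Here are two tips. "
branches = s.fork(2)
for i, b in enumerate(branches):
    b += f"Tip {i + 1}: ..."

print([b.text for b in branches])
# ['Here are two tips. Tip 1: ...', 'Here are two tips. Tip 2: ...']
```

Tracing this program would yield two branch computations with a common prefix, which the server could then batch and execute with prefix sharing.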
Ying1123 pinned this issue Feb 7, 2024
AriMKatz commented Feb 7, 2024

Are there still plans for a high level pythonic interface? #39 (comment)

Ying1123 (Contributor, author) commented Feb 7, 2024

> Are there still plans for a high level pythonic interface? #39 (comment)

Hi @AriMKatz, thanks for the reference. This is very important; I have just added it.

aflah02 mentioned this issue Feb 8, 2024
nivibilla (Contributor) commented:

For vision model support, is it possible to align with the OpenAI GPT-4V API?
https://platform.openai.com/docs/guides/vision

aliencaocao commented:
Are there plans for loading models in 8bit or 4bit?

Ying1123 (Contributor, author) commented Feb 10, 2024

> For vision model support, is it possible to align with the OpenAI GPT-4V API? https://platform.openai.com/docs/guides/vision

@nivibilla Yes, it is already aligned with the OpenAI GPT-4V API; see here.
You can also find a runnable example of serving it with Sky Serve here.
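For readers unfamiliar with the GPT-4V message schema being aligned with, here is a minimal sketch of the request payload shape, following the OpenAI vision guide linked above. The model name and image URL are placeholders, not values from this thread.

```python
# Sketch of an OpenAI GPT-4V-style chat request payload. A user message
# carries a list of content parts, mixing text and image_url entries.
payload = {
    "model": "gpt-4-vision-preview",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

def content_types(p):
    """Return the content-part types of the first message."""
    return [part["type"] for part in p["messages"][0]["content"]]

print(content_types(payload))  # ['text', 'image_url']
```

A server aligned with this API accepts the same `messages` structure, so clients written against the OpenAI SDK can point at it unchanged.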

Ying1123 (Contributor, author) commented:

> Are there plans for loading models in 8-bit or 4-bit?

@aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype translation yet. You are welcome to submit a PR for that.

aliencaocao commented Feb 10, 2024

> Are there plans for loading models in 8-bit or 4-bit?
>
> @aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype translation yet. You are welcome to submit a PR for that.

I'm looking to load LLaVA 1.6 in 8-bit, but it doesn't seem that the LLaVA series has AWQ or GPTQ quants; did I miss anything here?

EDIT: I see 1.5 has quants but 1.6 does not yet. Perhaps it's just too new and no one has done a calibration yet.

qeternity (Contributor) commented:

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Ying1123 (Contributor, author) commented Apr 2, 2024

> Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.
>
> Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other work. You are welcome to contribute, either by continuing on the branch or starting fresh! I'll be happy to review and collaborate.

Bit0r commented Apr 2, 2024

Tool support is very important; it is necessary for many use cases.

omri-sap commented Apr 4, 2024

Is TinyLlama supported? TinyLlama/TinyLlama-1.1B-Chat-v1.0
Generation seems a bit slow...

wille-x commented May 6, 2024

I see llama.cpp integration is on the roadmap. When will this feature be delivered? It would be very nice to have, since it would support running local LLMs, such as Llama models, on Mac computers and experimenting with them through the powerful and expressive SGLang.

Gintasz commented May 8, 2024

I'd like to request support for Phi-3-mini.

binarycrayon commented:
> Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.
> Would love to see this, happy to pick up from existing work or start fresh.
>
> Hi @qeternity, I was working on it but have been blocked by other work. You are welcome to contribute, either by continuing on the branch or starting fresh! I'll be happy to review and collaborate.

Hi, which branch is it? It looks like it may be better to start fresh.
