
Development Roadmap #157

Open
Ying1123 opened this issue Feb 7, 2024 · 14 comments

Comments

Ying1123 (Contributor) commented Feb 7, 2024

Function Calling

High-level Pythonic Interface

Inference Optimizations

Structured Decoding

Compiler

  • Support tracing and compiling sgl.fork
  • Support sending a full serialized SGL program to the server

LoRA Support

  • Port multi-LoRA batching and unified memory from S-LoRA

Model Coverage

Device Coverage

  • AMD support: investigate AMD support in Triton and FlashInfer.
  • CPU support. This is better done by adding a llama.cpp backend.
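To make the `sgl.fork` compiler item above concrete, here is a toy sketch of fork semantics: a prompt state branches into parallel continuations that share a common prefix. The `State` class and its methods are illustrative assumptions, not SGLang's actual API; a tracing compiler would record these operations instead of executing them eagerly.

```python
import copy

# Toy stand-in for a prompt state. Hypothetical, not SGLang's real API.
class State:
    def __init__(self, text=""):
        self.text = text

    def __iadd__(self, chunk):
        # Appending text extends the prompt in place.
        self.text += chunk
        return self

    def fork(self, n):
        # Each branch starts from an independent copy of the shared prefix.
        return [copy.deepcopy(self) for _ in range(n)]

s = State()
s += "Here are two tips. "
branches = s.fork(2)
for i, b in enumerate(branches):
    b += f"Tip {i + 1}: ..."

print([b.text for b in branches])
# ['Here are two tips. Tip 1: ...', 'Here are two tips. Tip 2: ...']
```

Tracing this program would yield two branch computations with a common prefix, which the server could then batch and execute with prefix sharing.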
Ying1123 pinned this issue Feb 7, 2024
AriMKatz commented Feb 7, 2024

Are there still plans for a high level pythonic interface? #39 (comment)

Ying1123 (Contributor, author) commented Feb 7, 2024

> Are there still plans for a high level pythonic interface? #39 (comment)

Hi @AriMKatz, thanks for the reference. This is very important; I have just added it.

aflah02 mentioned this issue Feb 8, 2024
nivibilla (Contributor) commented:

For vision model support, is it possible to align with the OpenAI GPT-4V API?
https://platform.openai.com/docs/guides/vision

aliencaocao commented:
Are there plans for loading models in 8bit or 4bit?

Ying1123 (Contributor, author) commented Feb 10, 2024

> For vision model support, is it possible to align with the OpenAI GPT-4V API? https://platform.openai.com/docs/guides/vision

@nivibilla Yes, it is already aligned with the OpenAI GPT-4V API; see here.
You can also find a runnable example of serving it with Sky Serve here.
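For readers unfamiliar with the GPT-4V message schema being aligned with, here is a minimal sketch of the request payload shape, following the OpenAI vision guide linked above. The model name and image URL are placeholders, not values from this thread.

```python
# Sketch of an OpenAI GPT-4V-style chat request payload. A user message
# carries a list of content parts, mixing text and image_url entries.
payload = {
    "model": "gpt-4-vision-preview",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/cat.png"},
                },
            ],
        }
    ],
    "max_tokens": 300,
}

def content_types(p):
    """Return the content-part types of the first message."""
    return [part["type"] for part in p["messages"][0]["content"]]

print(content_types(payload))  # ['text', 'image_url']
```

A server aligned with this API accepts the same `messages` structure, so clients written against the OpenAI SDK can point at it unchanged.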

Ying1123 (Contributor, author) commented:

> Are there plans for loading models in 8-bit or 4-bit?

@aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype translation yet. You are welcome to submit a PR for that.

aliencaocao commented Feb 10, 2024

> Are there plans for loading models in 8-bit or 4-bit?
>
> @aliencaocao Thanks for the question! AWQ and GPTQ are already supported, but we do not support automatic dtype translation yet. You are welcome to submit a PR for that.

I'm looking to load LLaVA 1.6 in 8-bit, but it doesn't seem that the LLaVA series has AWQ or GPTQ quants; did I miss anything here?

EDIT: I see 1.5 has quants but 1.6 does not yet. Perhaps it's just too new and no one has done a calibration yet.

qeternity (Contributor) commented:

Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.

Would love to see this, happy to pick up from existing work or start fresh.

Ying1123 (Contributor, author) commented Apr 2, 2024

> Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.
>
> Would love to see this, happy to pick up from existing work or start fresh.

Hi @qeternity, I was working on it but have been blocked by other work. You are welcome to contribute, either by continuing on the branch or starting fresh! I'll be happy to review and collaborate.

Bit0r commented Apr 2, 2024

Tool support is very important; it is necessary for many use cases.

omri-sap commented Apr 4, 2024

Is TinyLlama supported? TinyLlama/TinyLlama-1.1B-Chat-v1.0
Generation seems a bit slow...

wille-x commented May 6, 2024

I see llama.cpp integration is on the roadmap. When will this feature be delivered? It would be very nice to have, since it would support running local LLMs, such as Llama models, on Mac computers and experimenting with them through the powerful and expressive SGLang.

Gintasz commented May 8, 2024

I'd like to request support for Phi-3-mini.

binarycrayon commented:
> Hi all - is anyone working on the S-LoRA integration currently? I see the branch, but it looks a few months old.
> Would love to see this, happy to pick up from existing work or start fresh.
>
> Hi @qeternity, I was working on it but have been blocked by other work. You are welcome to contribute, either by continuing on the branch or starting fresh! I'll be happy to review and collaborate.

Hi, which branch is it? It looks like it may be better to start fresh.
