Insights: sgl-project/sglang
Overview
35 Pull requests merged by 15 people
-
Update workflow files
#1214 merged
Aug 26, 2024 -
[Feature] Support fp8 e5m2 kv cache with flashinfer
#1204 merged
Aug 26, 2024 -
Update CI runner docs
#1213 merged
Aug 26, 2024 -
Update CI workflows
#1210 merged
Aug 25, 2024 -
[CI] Fix the issue of unit test hanging
#1211 merged
Aug 25, 2024 -
[Minor] Temporarily skip flaky test
#1209 merged
Aug 25, 2024 -
[Minor] Improve the function organization in TokenizerManager & improve loggers
#1208 merged
Aug 25, 2024 -
Support Alibaba-NLP/gte-Qwen2-7B-instruct embedding Model
#1186 merged
Aug 25, 2024 -
[Fix] Fixing the multi-images error for llava-onevision
#1205 merged
Aug 25, 2024 -
Relax the assert in moe throughput test to fix the flaky CI
#1207 merged
Aug 25, 2024 -
[Fix] the issue of random order when input is a list
#1199 merged
Aug 25, 2024 -
[CI] Fix the problem of hf runner too slow
#1202 merged
Aug 25, 2024 -
Update README.md
#1198 merged
Aug 24, 2024 -
Cleanup readme, llava examples, usage examples and nccl init
#1194 merged
Aug 24, 2024 -
feat: use gelu_tanh_and_mul
#1193 merged
Aug 24, 2024 -
Fix benchmark script
#1185 merged
Aug 22, 2024 -
Fix broken penalty
#1184 merged
Aug 22, 2024 -
[Minor] Improve logging and rename the health check endpoint name
#1180 merged
Aug 22, 2024 -
Improve code style of sampler
#1168 merged
Aug 21, 2024 -
[Docs] Fix rendering of details in README
#1179 merged
Aug 21, 2024 -
Support min-p sampling
#1167 merged
Aug 21, 2024 -
[Feature] Add a function to convert sampling_params to kwargs
#1170 merged
Aug 21, 2024 -
fix: custom op fallback forward native when lower sm80
#1177 merged
Aug 21, 2024 -
Improve multi-node stability
#1171 merged
Aug 21, 2024 -
[Feat] Support update weights without restart server
#1157 merged
Aug 20, 2024 -
fix: resolve README render
#1166 merged
Aug 20, 2024 -
support /v1/health using a generation 1 token
#1154 merged
Aug 20, 2024 -
misc: add hypervisor vendor
#1165 merged
Aug 20, 2024 -
[Feature] add disable-custom-all-reduce
#1148 merged
Aug 20, 2024 -
Improve docs and warnings
#1164 merged
Aug 20, 2024 -
feat: allow streaming for multi-prompt and/or parallel sampling
#1134 merged
Aug 20, 2024 -
Optimize MLA/GQA/MQA Triton decoding
#1138 merged
Aug 19, 2024 -
[Feat] Add support for optional start len of logprobs
#1035 merged
Aug 19, 2024 -
[Docs] Add instruction for running on clouds and kubernetes with SkyPilot
#1144 merged
Aug 19, 2024
7 Pull requests opened by 6 people
-
Save memory from interleaved attention
#1151 opened
Aug 19, 2024 -
chore: bump v0.2.14
#1155 opened
Aug 19, 2024 -
Separated control and compute loop, shorten the critical path, and enable more complicated policies
#1182 opened
Aug 22, 2024 -
Dry sample
#1187 opened
Aug 23, 2024 -
Move sampler into CUDA graph
#1201 opened
Aug 25, 2024 -
minor: improve CI and dependencies
#1212 opened
Aug 26, 2024 -
improve the threshold and ports in tests
#1215 opened
Aug 26, 2024
12 Issues closed by 7 people
-
[Bug] Potential Logic Error in Memory Capacity Check for Distributed Setup
#1015 closed
Aug 24, 2024 -
[Feature] support min_p sampling
#1071 closed
Aug 23, 2024 -
[Help wanted] Does RadixAttention have anything to do with attention?
#1181 closed
Aug 22, 2024 -
[Bug] Runtime Stuck
#1173 closed
Aug 21, 2024 -
[Feature] SGLang using JSON as template config file needs improve
#1172 closed
Aug 21, 2024 -
[Feature] add disable_custom_all_reduce
#1118 closed
Aug 21, 2024 -
[Feature] The real health check API
#853 closed
Aug 20, 2024 -
[Feature] Support W8A16 Int8 inside FusedMoE
#1161 closed
Aug 20, 2024 -
[Feature] In Sglang ,Is chunked-prefill use fused(prefill+decode) batch?
#1162 closed
Aug 20, 2024 -
[Bug] Gemma-2-9b-it produces garbage output
#1160 closed
Aug 20, 2024 -
In which file is constraint decoding implemented?
#1149 closed
Aug 19, 2024 -
[Bug] --disable-flashinfer is broken
#1146 closed
Aug 19, 2024
15 Issues opened by 11 people
-
[Feature] add option to use liger triton kernel
#1216 opened
Aug 26, 2024 -
Accuracy degrading in concurrent scenario
#1203 opened
Aug 25, 2024 -
[Feature] Use Embedding/Generation Model to get its Generation/Embedding
#1200 opened
Aug 25, 2024 -
[Bug] enable-torch-compile error
#1196 opened
Aug 24, 2024 -
[Bug] Bad outputs with fp8 quantization at high RPS
#1195 opened
Aug 24, 2024 -
[Bug] Server crashes after loading (Mixtral 8x7b)
#1191 opened
Aug 23, 2024 -
[Feature] Jamba 1.5 Support PLS
#1190 opened
Aug 23, 2024 -
[Bug] schedule_batch.py: IndexError: list index out of range
#1189 opened
Aug 23, 2024 -
[Bug] vllm updated its get_model function
#1183 opened
Aug 22, 2024 -
[Bug] Dynamic FP8 quantization fails due to incorrect tensor shape
#1178 opened
Aug 21, 2024 -
[Feature] Repeated generation expression
#1175 opened
Aug 21, 2024 -
[Bug] head_dim 96 not supported
#1159 opened
Aug 20, 2024 -
[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2
#1156 opened
Aug 19, 2024 -
[Tracker] OpenRouter LLM rankings tracking
#1152 opened
Aug 19, 2024
12 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[FEAT] JSON constrained support
#1125 commented on
Aug 23, 2024 • 2 new comments -
Supports the InternVL multimodal large model
#328 commented on
Aug 19, 2024 • 0 new comments -
[Bug] Llama3 70B A100 PCIE TP4 slow speed
#1137 commented on
Aug 19, 2024 • 0 new comments -
[Bug] pt_main_thread uses 100% cpu all the time
#955 commented on
Aug 19, 2024 • 0 new comments -
[Bug] OOM for concurrent long requests
#1030 commented on
Aug 19, 2024 • 0 new comments -
Add Default Timeout to urllib.request.urlopen Calls to Prevent Potential Hanging
#339 commented on
Aug 21, 2024 • 0 new comments -
[Feature] Allow arbitrary logit processors
#1036 commented on
Aug 21, 2024 • 0 new comments -
[Feature] plan to support medusa?
#859 commented on
Aug 23, 2024 • 0 new comments -
[Bug] when llama-3.1-70b-instruct batch inference, CUDA memory usage is unusually large
#1132 commented on
Aug 25, 2024 • 0 new comments -
Development Roadmap (2024 Q3)
#634 commented on
Aug 25, 2024 • 0 new comments -
[RFC] Add an LLM engine
#1127 commented on
Aug 21, 2024 • 0 new comments -
Flex scheduler
#1142 commented on
Aug 20, 2024 • 0 new comments