Insights: HabanaAI/vllm-fork
Overview
- 8 Merged pull requests
- 12 Open pull requests
- 0 Closed issues
- 3 New issues
8 Pull requests merged by 3 people
- Enable Multi-LoRA support for HPU (#143, merged Aug 8, 2024)
- Revert "Allocate blocks from id=1 for HPU" (#163, merged Aug 6, 2024)
- Allocate blocks from id=1 for HPU (#160, merged Aug 6, 2024)
- Allocate blocks from id=1 (#155, merged Aug 6, 2024)
- Set block_size=128 (#154, merged Aug 5, 2024)
- Overhaul HPU memory management in HPUGraph capture (#147, merged Aug 5, 2024)
- Re-enable FusedRoPE (#145, merged Aug 5, 2024)
- Add support for LLama70B FP8 1xG2 (#150, merged Aug 5, 2024)
12 Pull requests opened by 9 people
- offline script to test granite model (#148, opened Aug 2, 2024)
- Fix delayed sampling TP>1 (#149, opened Aug 5, 2024)
- Tflops measurement - habana_main (#151, opened Aug 5, 2024)
- [WIP] tflops measurement - habana_next (#152, opened Aug 5, 2024)
- Fix guided sampling with outlines (#153, opened Aug 5, 2024)
- Draft: Add option to limit number of buckets (#156, opened Aug 5, 2024)
- enable fusedsdpa for prompt attention with env VLLM_PREFILL_USE_FUSESDAPA=1 (#157, opened Aug 5, 2024)
- [WIP] Porting delayed sampling feature (#159, opened Aug 6, 2024)
- Fix blocks allocation range (#161, opened Aug 6, 2024)
- initial works on enabling automatic prefix caching (#162, opened Aug 6, 2024)
- Reimplement silu_and_mul for mixtral (#164, opened Aug 7, 2024)
- Reimplement silu_and_mul for mixtral (#167, opened Aug 8, 2024)
3 Issues opened by 3 people
- [Performance]: context aware HpuRotaryEmbedding implementation (#166, opened Aug 8, 2024)
- [Doc]: Broken link in Gaudi-Installation Readme (#165, opened Aug 7, 2024)
- [Bug]: Unexpected decode graph compilation after preemption (#158, opened Aug 6, 2024)
3 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Support FP8 INC in vLLM (#144, commented on Aug 8, 2024 • 1 new comment)
- [Bug]: llama 405B fp8 fails (#140, commented on Aug 6, 2024 • 0 new comments)
- Support Mixtral quantization using HQT (#123, commented on Aug 4, 2024 • 0 new comments)