Insights: HabanaAI/vllm-fork
Overview
- 8 Merged pull requests
- 12 Open pull requests
- 0 Closed issues
- 3 New issues
8 Pull requests merged by 3 people
- Enable Multi-LoRA support for HPU (#143, merged Aug 8, 2024)
- Revert "Allocate blocks from id=1 for HPU" (#163, merged Aug 6, 2024)
- Allocate blocks from id=1 for HPU (#160, merged Aug 6, 2024)
- Allocate blocks from id=1 (#155, merged Aug 6, 2024)
- Set block_size=128 (#154, merged Aug 5, 2024)
- Overhaul HPU memory management in HPUGraph capture (#147, merged Aug 5, 2024)
- Re-enable FusedRoPE (#145, merged Aug 5, 2024)
- Add support for LLama70B FP8 1xG2 (#150, merged Aug 5, 2024)
12 Pull requests opened by 9 people
- offline script to test granite model (#148, opened Aug 2, 2024)
- Fix delayed sampling TP>1 (#149, opened Aug 5, 2024)
- Tflops measurement - habana_main (#151, opened Aug 5, 2024)
- [WIP] tflops measurement - habana_next (#152, opened Aug 5, 2024)
- Fix guided sampling with outlines (#153, opened Aug 5, 2024)
- Draft: Add option to limit number of buckets (#156, opened Aug 5, 2024)
- enable fusedsdpa for prompt attention with env VLLM_PREFILL_USE_FUSESDAPA=1 (#157, opened Aug 5, 2024)
- [WIP] Porting delayed sampling feature (#159, opened Aug 6, 2024)
- Fix blocks allocation range (#161, opened Aug 6, 2024)
- initial works on enabling automatic prefix caching (#162, opened Aug 6, 2024)
- Reimplement silu_and_mul for mixtral (#164, opened Aug 7, 2024)
- Reimplement silu_and_mul for mixtral (#167, opened Aug 8, 2024)
3 Issues opened by 3 people
- [Performance]: context aware HpuRotaryEmbedding implementation (#166, opened Aug 8, 2024)
- [Doc]: Broken link in Gaudi-Installation Readme (#165, opened Aug 7, 2024)
- [Bug]: Unexpected decode graph compilation after preemption (#158, opened Aug 6, 2024)
3 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Support FP8 INC in vLLM (#144, commented on Aug 8, 2024 • 1 new comment)
- [Bug]: llama 405B fp8 fails (#140, commented on Aug 6, 2024 • 0 new comments)
- Support Mixtral quantization using HQT (#123, commented on Aug 4, 2024 • 0 new comments)