Add gptq quantization model support #141

Arcmoon-Hu · 2024-02-04T09:22:50Z

I have only tested this with the Qwen-1.8B-Chat-Int4 model on my old GPU.

merrymercy · 2024-02-06T19:35:13Z

@Arcmoon-Hu thanks!

Arcmoon-Hu and others added 15 commits January 21, 2024 20:22

support qwen model

b96e292

add some code rule

d27cf43

fix bugs

78f256d

add phi support

daf9eeb

Merge branch from 'sgl-project-main'

f11d231

solve conflict

6c9b8d7

Merge pull request #5 from sgl-project/main

4c31655

merge from main repo

Merge pull request #6 from sgl-project/main

ef356f5

Auto merge from main repo

add gptq support and fix some bugs

624640c

add gptq quantization model support

3cc12d2

fix bugs

69b0170

Merge pull request #7 from Arcmoon-Hu/dev

f2f358c

support gptq-quantize support

code style

2f0b078

Merge pull request #8 from Arcmoon-Hu/dev

dc2bd7f

code style

delete debug info

25f0573

Ying1123 approved these changes Feb 4, 2024

python/sglang/srt/managers/router/model_runner.py (outdated, resolved)

python/sglang/srt/layers/radix_attention.py (outdated, resolved)

merrymercy force-pushed the main branch 2 times, most recently from 1f41598 to 8ff870b, on February 5, 2024 11:29

Arcmoon-Hu and others added 2 commits February 6, 2024 15:34

Apply suggestions from code review

dda933e

Co-authored-by: Ying Sheng <[email protected]>

Update python/sglang/srt/managers/router/model_runner.py

c9f192a

Co-authored-by: Ying Sheng <[email protected]>

merrymercy approved these changes Feb 6, 2024

merrymercy merged commit 3ae78a0 into sgl-project:main Feb 6, 2024

merrymercy mentioned this pull request Feb 6, 2024

Support gptq quantization #124

Closed
