Supported Models

Neural Speed supports the following models:

Text Generation

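| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version | Max tokens length |
|---|---|---|---|---|---|---|---|---|---|---|
| Meta-Llama-3-8B-Instruct | | | | | | | | | Latest | 8192 |
| TinyLlama-1.1B, LLaMA2-7B, LLaMA2-13B, LLaMA2-70B | | | | | | | | | Latest | 4096 |
| LLaMA-7B, LLaMA-13B | | | | | | | | | Latest | 2048 |
| CodeLlama-7b | | | | | | | | | Latest | 16384 |
| Solar-10.7B | | | | | | | | | Latest | 4096 |
| Neural-Chat-7B-v3-1, Neural-Chat-7B-v3-2 | | | | | | | | | Latest | 32768 |
| Mistral-7B, Mistral-7B-Instruct-v0.2, Mixtral-8x7B | | | | | | | | | 4.36.0 or newer | 32768 |
| Qwen-7B, Qwen-14B, Qwen1.5-7B, Qwen1.5-0.5B | | | | | | | | | Latest | 8192 / 32768 |
| GPT-J-6B | | | | | | | | | Latest | 2048 |
| GPT-NeoX-20B | | | | | | | | | Latest | 2048 |
| Dolly-v2-3B | | | | | | | | | 4.28.1 or newer | 2048 |
| MPT-7B, MPT-30B | | | | | | | | | Latest | 2048 |
| Falcon-7B, Falcon-40B | | | | | | | | | Latest | 2048 |
| BLOOM-7B | | | | | | | | | Latest | 2048 |
| OPT-125m, OPT-1.3B, OPT-13B | | | | | | | | | Latest | 2048 |
| ChatGLM-6B, ChatGLM2-6B, ChatGLM3-6B | | | | | | | | | 4.33.1 | 2048 / 32768 |
| Baichuan-13B-Chat, Baichuan2-13B-Chat, Baichuan2-7B-Chat | | | | | | | | | 4.33.1 | 4096 |
| phi-2, phi-1_5, phi-1 | | | | | | | | | Latest | 2048 |
| phi-3-128k, phi-3-48k | | | | | | | | | Latest | 128k |
| StableLM-2-1_6B, StableLM-3B, StableLM-2-12B | | | | | | | | | Latest | 4096 |
| gemma-2b-it, gemma-7b | | | | | | | | | Latest | 8192 |
| Whisper-tiny, Whisper-base, Whisper-small, Whisper-medium, Whisper-large | | | | | | | | | Latest | 448 |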
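
The "Max tokens length" column can be used to check a request before it is sent to a model. The following is a minimal sketch, not part of the Neural Speed API: the `MAX_TOKENS` dictionary and the `fits_context` helper are hypothetical names, the limits are copied from a few rows of the list above, and the sketch assumes the limit covers prompt tokens and newly generated tokens together.

```python
# Context-window limits copied from the "Max tokens length" column above.
# MAX_TOKENS and fits_context are illustrative names, not a Neural Speed API.
MAX_TOKENS = {
    "Meta-Llama-3-8B-Instruct": 8192,
    "LLaMA2-13B": 4096,
    "CodeLlama-7b": 16384,
    "Mistral-7B": 32768,
    "GPT-J-6B": 2048,
    "Whisper-tiny": 448,
}


def fits_context(model_name: str, prompt_tokens: int, max_new_tokens: int) -> bool:
    """Return True if prompt plus generated tokens stay within the model's limit.

    Assumption: the limit applies to the sum of prompt and generated tokens.
    """
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    limit = MAX_TOKENS[model_name]  # raises KeyError for models not listed here
    return prompt_tokens + max_new_tokens <= limit


if __name__ == "__main__":
    # 1800 + 300 = 2100 tokens: over the 2048 limit, well under 16384.
    print(fits_context("GPT-J-6B", 1800, 300))
    print(fits_context("CodeLlama-7b", 1800, 300))
```

A lookup keyed by model name keeps the check independent of any tokenizer or runtime; the token counts would come from whichever tokenizer is used with the model.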

Code Generation

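| Model Name | INT8 RTN | INT8 GPTQ | INT8 AWQ | INT8 AutoRound | INT4 RTN | INT4 GPTQ | INT4 AWQ | INT4 AutoRound | Transformer Version |
|---|---|---|---|---|---|---|---|---|---|
| Code-LLaMA-7B, Code-LLaMA-13B | | | | | | | | | Latest |
| Magicoder-6.7B | | | | | | | | | Latest |
| StarCoder-1B, StarCoder-3B, StarCoder-15.5B | | | | | | | | | Latest |
| Stable-Code-3B | | | | | | | | | Latest |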

Validated GGUF Models

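| Model Name | F32 | F16 | Q4_0 | Q8_0 | BTLA |
|---|---|---|---|---|---|
| TheBloke/Llama-2-7B-Chat-GGUF | | | | | |
| TheBloke/Mistral-7B-v0.1-GGUF, TheBloke/Mistral-7B-v0.2-GGUF, TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF | | | | | |
| TheBloke/SOLAR-10.7B-Instruct-v1.0-GGUF | | | | | |
| TheBloke/CodeLlama-7B-GGUF, TheBloke/CodeLlama-13B-GGUF | | | | | |
| Qwen1.5-7B-Chat-GGUF | | | | | |
| Code-LLaMA-7B, Code-LLaMA-13B | | | | | |
| meta-llama/Llama-2-7b-chat-hf | | | | | |
| upstage/SOLAR-10.7B-Instruct-v1.0 | | | | | |
| Qwen-7B-Chat, Qwen1.5-7B-Chat | | | | | |
| tiiuae/falcon-7b | | | | | |
| tiiuae/falcon-40b | | | | | |
| mpt-7b | | | | | |
| mpt-30b | | | | | |
| bloomz-7b1 | | | | | |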