qitas/docs

Paper Link👁️

| Model | Open Source | Chinese Reasoning | Chinese Language | Overall |
|---|---|---|---|---|
| GPT-4-1106-preview | - | 7.73 | 8.29 | 8.01 |
| DeepSeek-V2-Chat(RL) | ✓ | 7.45 | 8.36 | 7.91 |
| erniebot-4.0-202404 (ERNIE Bot) | - | 7.61 | 8.17 | 7.89 |
| DeepSeek-V2-Chat(SFT) | ✓ | 7.30 | 8.17 | 7.74 |
| GPT-4-0613 | - | 7.47 | 7.59 | 7.53 |
| erniebot-4.0-202312 (ERNIE Bot) | - | 6.84 | 7.88 | 7.36 |
| moonshot-v1-32k-202404 (Moonshot AI) | - | 6.42 | 8.02 | 7.22 |
| Qwen1.5-72B-Chat (Tongyi Qianwen) | ✓ | 6.45 | 7.93 | 7.19 |
| DeepSeek-67B-Chat | ✓ | 5.75 | 7.11 | 6.43 |
| Yi-34B-Chat (01.AI) | ✓ | 4.86 | 7.38 | 6.12 |
| GPT-3.5-turbo-0613 | - | 5.35 | 6.71 | 6.08 |
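The overall score in the table above appears to be the plain average of the two Chinese sub-scores. A minimal Python check under that assumption, with a few rows copied from the table:

```python
# Assumption (not stated in the source): Overall = mean(reasoning, language).
rows = {
    "GPT-4-1106-preview": (7.73, 8.29, 8.01),
    "DeepSeek-V2-Chat(RL)": (7.45, 8.36, 7.91),
    "erniebot-4.0-202404": (7.61, 8.17, 7.89),
}
for name, (reasoning, language, overall) in rows.items():
    # Allow 0.01 slack for the table's two-decimal rounding.
    assert abs((reasoning + language) / 2 - overall) < 0.01, name
```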
| Small Model | Open Source | Chinese Reasoning | Chinese Language | English | Coding |
|---|---|---|---|---|---|
| Yi-1.5-9B | ✓ | | | | |
| Yi-1.5-6B | ✓ | | | | |
| Model | English | Chinese | Code | Math | Params (activated/total, B) | Context |
|---|---|---|---|---|---|---|
| DeepSeek-V2-Chat(RL) | 157.5 | 159.6 | 185.6 | 146.1 | | |
| DeepSeek-V2-Chat(SFT) | 159.7 | 163.3 | 175.9 | 143.5 | | |
| LLaMA3-70B-Instruct | 160.4 | 138.6 | 176.5 | 141.7 | | |
| Mixtral-8x22B | 156.2 | 121.0 | 164.4 | 137.7 | 44/176 | |
| QWen1.5-72B-Chat | 142.1 | 165.1 | 140.9 | 122.5 | | |
| DeepSeek-V2(MoE-236B) | 157.4 | 165.7 | 115.4 | 122.8 | | 128k |
| DeepSeek-V1-Chat(SFT) | 142.8 | 133.0 | 153.5 | 116.7 | | |
| LLaMA3-70B | 159.9 | 136.8 | 116.8 | 125.2 | | |
| Mixtral-8x7B | | | | | 13/56 | |
| DeepSeek-V1(Dense-67B) | 139.9 | 136.9 | 102.5 | 82.1 | | |
| DeepSeek-V2-Lite-Chat | | | | | 2.4/15.7 | 32K |
| Arctic-128×3.66B(MoE-480B) | | | | | 17/480 | |
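The per-domain totals above appear to be straight sums of the individual benchmark scores reported in the sections below (English = MMLU + BBH; Code = HumanEval + MBPP + LiveCodeBench; Math = GSM8K + MATH). A small sanity check under that assumption, using the DeepSeek-V2-Chat(RL) row:

```python
# Assumption (not stated in the source): each domain total is the sum of
# that domain's individual benchmark scores for the same model.
english = 77.8 + 79.7        # MMLU + BBH
code = 81.1 + 72.0 + 32.5    # HumanEval + MBPP + LiveCodeBench
math_total = 92.2 + 53.9     # GSM8K + MATH
assert abs(english - 157.5) < 0.05
assert abs(code - 185.6) < 0.05
assert abs(math_total - 146.1) < 0.05
```

The check holds exactly for this row; a few other rows (e.g. QWen1.5-72B-Chat English) deviate slightly, so the assumption may not cover every entry.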

English

| Model | MMLU | BBH | Total |
|---|---|---|---|
| Claude-3-Opus | 86.8% (5-shot) | 86.8% (3-shot) | |
| LLaMA3-70B-Instruct | 80.3 | 80.1 | 160.4 |
| LLaMA3-70B | 78.9 | 81.0 | 159.9 |
| DeepSeek-V2-Chat(SFT) | 78.4 | 81.3 | 159.7 |
| DeepSeek-V2-Chat(RL) | 77.8 | 79.7 | 157.5 |
| DeepSeek-V2(MoE-236B) | 78.5 | 78.9 | 157.4 |
| Mixtral-8x22B | 77.6 | 78.9 | 156.5 |
| Mixtral-8x7B | 70.4 | | |
| DeepSeek-V1-Chat(SFT) | 71.1 | 71.7 | 142.8 |
| QWen1.5-72B-Chat | 77.3 | 65.9 | 142.1 |
| Yi-1.5-34B-Chat | 76.8 | | |
| Yi-1.5-9B-Chat | 69.5 | 72.4 | |
| Yi-1.5-6B-Chat | 63.5 | 59.0 | |
| QWen1.5-32B-Chat | 74.3 | | |
| Mixtral-8x7B-Instruct-v0.1 | 71.4 | | |
| Mixtral-8x22B-Instruct-v0.1 | 77.7 | | |
| DeepSeek-V1(Dense-67B) | 71.3 | 68.7 | 139.0 |
| GPT-4 | 86.4 | 86.7 | |
| DeepSeek-V2-Lite-Chat | 55.7 | 48.1 | |
| DeepSeekMoE-16B-Chat | 47.2 | 42.2 | |
| DeepSeek-7B-Chat | 49.7 | 43.1 | |
| Arctic-128×3.66B(MoE-480B) | 67.3? | | |

Chinese

| Model | C-Eval | CMMLU | CLUEWSC |
|---|---|---|---|
| DeepSeek-V2 (MoE-236B) | 81.7 | 84.0 | |
| QWen1.5-72B-Chat | 82.2 | 82.9 | |
| DeepSeek-V2-Chat(SFT) | 80.9 | 82.4 | |
| DeepSeek-V2-Chat(RL) | 78.0 | 81.6 | |
| LLaMA3-70B-Instruct | 67.9 | 70.7 | |
| DeepSeek-V1(Dense-67B) | 66.1 | 70.8 | |
| LLaMA3-70B | 67.5 | 69.3 | |
| DeepSeek-V1-Chat(SFT) | 65.2 | 67.8 | |
| Mixtral-8x22B | 60.0 | 61.0 | |
| GPT-4 | 69.9 | 71.0 | |
| QWen-14B-Chat | 71.7 | 70.0 | |
| Yi-34B-Chat | 77.71 | 73.52 | |
| QWen1.5-7B-Chat | 73.4 | | |
| Yi-1.5-9B | 74.8 | | |
| Yi-1.5-6B | 70.8 | | |
| DeepSeek-V2-Lite-Chat | 60.1 | 62.5 | 80.0 |
| DeepSeekMoE-16B-Chat | 40.0 | 49.3 | 68.2 |
| DeepSeek-7B-Chat | 44.7 | 51.2 | 66.2 |

Code

| Model | HumanEval | MBPP | LiveCodeBench (0901-0401) | MT-Bench |
|---|---|---|---|---|
| Claude-3-Opus | 84.9% (0-shot) | | | |
| DeepSeek-V2-Chat(RL) | 81.1 | 72.0 | 32.5 | |
| LLaMA3-70B-Instruct | 76.2 | 69.8 | 30.5 | |
| DeepSeek-V2-Chat(SFT) | 76.8 | 70.4 | 28.7 | |
| Yi-1.5-34B-Chat | 75.2 | 74.6 | | 8.5 |
| Mixtral-8x22B | 75.0 | 64.4 | 25.0 | |
| DeepSeek-V1-Chat(SFT) | 73.8 | 61.4 | 18.3 | |
| QWen1.5-72B-Chat | 64.6 | 72.5 | 18.8 | 8.61 |
| LLaMA3-70B | 48.2 | 68.6 | | |
| DeepSeek-V2(MoE-236B) | 48.8 | 66.6 | | |
| Yi-1.5-9B-Chat | 66.5 | 78.8 | | 8.2 |
| Yi-1.5-6B-Chat | 64.0 | 70.9 | | 7.5 |
| LLaMA3-8B-Instruct | 61.6 | 61.4 | | 8.0 |
| DeepSeek-V1(Dense-67B) | 45.1 | 57.4 | | |
| QWen1.5-32B-Chat | 51.2 | 66.9 | | 8.3 |
| QWen1.5-14B-Chat | | | | 7.91 |
| Mixtral-8x7B-Instruct-v0.1 | 45.1 | 59.5 | | 8.3 |
| Mixtral-8x22B-Instruct-v0.1 | 76.2 | 73.8 | | 8.6 |
| QWen1.5-7B-Chat | 36.0 | 46.1 | | 7.60 |
| Yi-1.5-9B | 41.4 | 61.1 | | |
| Yi-1.5-6B | 36.5 | 56.8 | | |
| DeepSeek-V2-Lite-Chat | 57.3 | 45.8 | | |
| DeepSeekMoE-16B-Chat | 45.7 | 46.2 | | |
| DeepSeek-7B-Chat | 45.1 | 39.0 | | |
| Model | HumanEval Pass@1 | Pass@10 | 0-shot | 5-shot |
|---|---|---|---|---|
| Claude-3-Opus | 84.9% | | | |
| StarCoder2-15B | | | | |
| StarCoder2-7B | | | | |
| StarCoder2-3B | | | | |
| LLaMA3-70B | 81.7 | | | |
| LLaMA3-8B | 62.2 | | | |
| Yi-Chat-34B | 7.9 | | | |
| QWen-14B-Chat | 11.1 | | | |
| DeepSeek-Coder-33B-Instruct | 31.7 | | | |
| GPT-4-Turbo | 48.4 | | | |

Math

| Model | GSM8K | MATH | CMath |
|---|---|---|---|
| Claude-3-Opus | 95.0% (0-shot) | 60.1% (0-shot) | |
| DeepSeek-V2 Chat (RL) | 92.2 | 53.9 | |
| DeepSeek-V2 Chat (SFT) | 90.8 | 52.7 | |
| LLaMA3-70B Instruct | 93.2 | 48.5 | |
| Mixtral-8x22B | 87.9 | 49.8 | |
| LLaMA3-70B | 83.0 | 42.2 | |
| DeepSeek-V2 (MoE-236B) | 79.2 | 43.6 | |
| QWen1.5-72B-Chat | 86.0 | 44.4 | |
| DeepSeek-V1 Chat (SFT) | 84.1 | 32.6 | |
| DeepSeek-V1 (Dense-67B) | 63.4 | 18.7 | |
| Yi-1.5-34B-Chat | 90.2 | 50.1 | |
| QWen1.5-32B-Chat | 83.9 | 43.3 | |
| Mixtral-8x7B-Instruct-v0.1 | 65.7 | 28.4 | |
| Mixtral-8x22B-Instruct-v0.1 | 84.0 | 41.1 | |
| QWen1.5-7B-Chat | 70.1 | 20.3 | |
| LLaMA3-70B | 54.7 | 21.16 | |
| Yi-1.5-9B | 73.7 | 32.6 | |
| Yi-1.5-6B | 62.2 | 28.42 | |
| DeepSeek-7B-Chat | 62.6 | 14.7 | 66.4 |
| DeepSeekMoE-16B-Chat | 62.2 | 15.2 | 67.9 |
| DeepSeek-V2-Lite-Chat | 72.0 | 27.9 | 71.7 |
| Arctic-128×3.66B(MoE-480B) | 74.2 | | |
