Folders and files

.vscode
__pycache__
data
.gitignore
LICENSE
README.md
README_EN.md
calib_tools.py
categories.py
crop.py
evaluate.py
evaluate_chatglm3.py
evaluate_chatglm3_sft.py
evaluate_flan.py
evaluate_gpt.py
requirements.txt
statistics_results.py
test_calibration.py
test_glm.py
test_glm_sft.py
test_gpt.py


Measuring Massive Multitask Language Understanding

This is the repository for Measuring Massive Multitask Language Understanding by Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt (ICLR 2021).

This repository contains OpenAI API evaluation code, and the test can be downloaded here.
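As a rough illustration of what evaluating a model on this test involves, the sketch below builds a few-shot multiple-choice prompt and scores predicted answer letters. It assumes each question is stored as a headerless CSV row of question, four options (A to D), and the gold answer letter. The helpers `format_example`, `build_prompt`, and `accuracy` and the exact prompt wording are hypothetical, written for this example; they are not the API of this repository's `evaluate.py`.

```python
import csv
import io

# Assumed row layout (hypothetical for this sketch): headerless CSV with
# question, option A, option B, option C, option D, answer letter.
CHOICES = ["A", "B", "C", "D"]


def format_example(row, include_answer=True):
    """Render one question with lettered options, optionally with its answer."""
    question, *options, answer = row
    lines = [question]
    for letter, option in zip(CHOICES, options):
        lines.append(f"{letter}. {option}")
    if include_answer:
        return "\n".join(lines) + f"\nAnswer: {answer}\n\n"
    # The test item ends at "Answer:" so the model supplies the letter.
    return "\n".join(lines) + "\nAnswer:"


def build_prompt(subject, dev_rows, test_row):
    """Few-shot prompt: header, answered dev examples, then the unanswered test item."""
    header = (
        "The following are multiple choice questions (with answers) about "
        f"{subject.replace('_', ' ')}.\n\n"
    )
    shots = "".join(format_example(row) for row in dev_rows)
    return header + shots + format_example(test_row, include_answer=False)


def accuracy(predictions, gold):
    """Fraction of predicted answer letters that match the gold letters."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("need equally sized, non-empty lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical toy rows standing in for a subject's dev and test files.
    dev_rows = list(csv.reader(io.StringIO("What is 1 + 1?,1,2,3,4,B\n")))
    test_rows = list(csv.reader(io.StringIO("What is 2 + 1?,1,2,3,4,C\n")))
    print(build_prompt("elementary_mathematics", dev_rows, test_rows[0]))
    print(accuracy(["C"], [row[-1] for row in test_rows]))
```

Ending the prompt at `Answer:` lets a completion-style model emit a single letter, which can then be compared against the gold letter; with four options, uniform guessing gives the 25% random baseline shown in the leaderboard.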

Test Leaderboard

If you want to have your model added to the leaderboard, please reach out to us or submit a pull request.

Results of the test:

Model	Authors	Humanities	Social Sciences	STEM	Other	Average
Chinchilla (70B, few-shot)	Hoffmann et al., 2022	63.6	79.3	54.9	73.9	67.5
Gopher (280B, few-shot)	Rae et al., 2021	56.2	71.9	47.4	66.1	60.0
GPT-3 (175B, fine-tuned)	Brown et al., 2020	52.5	63.9	41.4	57.9	53.9
flan-T5-xl	Chung et al., 2022	46.3	57.7	39.0	55.1	49.3
UnifiedQA	Khashabi et al., 2020	45.6	56.6	40.2	54.6	48.9
GPT-3 (175B, few-shot)	Brown et al., 2020	40.8	50.4	36.7	48.8	43.9
GPT-3 (6.7B, fine-tuned)	Brown et al., 2020	42.1	49.2	35.1	46.9	43.2
flan-T5-large	Chung et al., 2022	39.1	49.1	33.2	47.4	41.9
flan-T5-base	Chung et al., 2022	34.0	38.1	27.6	37.0	34.2
GPT-2	Radford et al., 2019	32.8	33.3	30.2	33.1	32.4
flan-T5-small	Chung et al., 2022	29.9	30.9	27.5	29.7	29.5
Random Baseline	N/A	25.0	25.0	25.0	25.0	25.0

Citation

If you find this useful in your research, please consider citing the test and also the ETHICS dataset it draws from:

@article{hendryckstest2021,
  title={Measuring Massive Multitask Language Understanding},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}

@article{hendrycks2021ethics,
  title={Aligning AI With Shared Human Values},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}
