OpenCompass: LLM Evaluation Based on HuggingFace🤗

A one-click script for evaluating the performance of large language models.

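🧶 Experiments currently use llama-7b-hf; to evaluate another model, only the model configuration file needs to change.
📈 The benchmark set is aligned with the LLaMA 2 evaluation, plus several Chinese benchmarks (mainly CMMLU and C-Eval).
📑 For the evaluation metrics, see Section A.2.2 of https://arxiv.org/pdf/2307.09288.pdf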
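
As a rough illustration of what "only the model configuration file needs to change" can look like, here is a minimal sketch of a model entry. The helper `make_hf_model_cfg` is hypothetical, and the field names (`type`, `abbr`, `path`, `run_cfg`, and so on) and their values are assumptions modeled on upstream OpenCompass HuggingFace model configs; check them against hf_llama_7b.py in this repository before relying on them. The checkpoint path is a placeholder.

```python
# Hypothetical helper: the field names mirror upstream OpenCompass HuggingFace
# model configs and are assumptions; verify against hf_llama_7b.py in this repo.
def make_hf_model_cfg(abbr: str, path: str, num_gpus: int = 1) -> dict:
    """Build one entry of the `models` list for a HuggingFace causal LM."""
    return dict(
        type="HuggingFaceCausalLM",  # assumed registry name of the model wrapper
        abbr=abbr,                   # short name shown in result tables
        path=path,                   # HuggingFace hub id or local checkpoint dir
        tokenizer_path=path,         # tokenizer assumed to live with the weights
        max_seq_len=2048,            # illustrative values, not tuned settings
        max_out_len=100,
        batch_size=8,
        run_cfg=dict(num_gpus=num_gpus, num_procs=1),
    )


# Switching models then means changing only these two arguments.
# "path/to/llama-7b-hf" is a placeholder, not a real location.
models = [make_hf_model_cfg("llama-7b-hf", "path/to/llama-7b-hf")]
```

Keeping everything model-specific in one entry means the dataset and runner settings can stay untouched when a different checkpoint is evaluated.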

For more details, see the Lark document: https://mgf127vt7ge.sg.larksuite.com/docx/J4W4djHR6oYPulx2mAQlhNZtgSd

🛠️ Installation

Set up the virtual environment

conda create --name opencompass python=3.10 pytorch torchvision pytorch-cuda -c nvidia -c pytorch -y
conda activate opencompass

Download opencompass

# Create the evaluation directory
mkdir evaluation
cd evaluation
# Clone the repository
git clone https://github.com/HKGAI/EmergentAbilityEval.git opencompass
cd opencompass

Install dependencies

pip install -e .

Download the datasets into data/

wget https://github.com/open-compass/opencompass/releases/download/0.1.1/OpenCompassData.zip
unzip OpenCompassData.zip

Download the humaneval dataset

git clone https://github.com/openai/human-eval.git
cd human-eval
pip install -r requirements.txt
pip install -e .
cd ..

⚠️ Note: to use humaneval, you must manually uncomment line 58 of human-eval/human_eval/execution.py; otherwise the evaluation will not run correctly.
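
Because the line number can shift between human-eval revisions, it may be worth confirming that the line is really uncommented before launching a long run. The following is a minimal sketch; the helper names `line_is_commented` and `check_execution_file` are hypothetical, and the default path and line number are taken from the note above.

```python
from pathlib import Path


def line_is_commented(source: str, lineno: int) -> bool:
    """Return True if the 1-indexed line `lineno` of `source` starts with '#'."""
    lines = source.splitlines()
    if not 1 <= lineno <= len(lines):
        raise IndexError(f"source has {len(lines)} lines, asked for line {lineno}")
    return lines[lineno - 1].lstrip().startswith("#")


def check_execution_file(
    path: str = "human-eval/human_eval/execution.py", lineno: int = 58
) -> bool:
    """Return True if the given line of the file is still commented out."""
    return line_is_commented(Path(path).read_text(encoding="utf-8"), lineno)
```

Calling `check_execution_file()` from the opencompass directory returns `True` while the line is still commented out, which means the manual edit has not been made yet.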

🏗️ Evaluation

Once OpenCompass is installed and the datasets are prepared as described above, evaluate the llama-7b-hf model on the datasets with either of the following commands:

# Command-line method
python run.py eval_llama_7b_test.py -p slurm_config.py
# Script method
./eval_llama.sh
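
Before launching, a quick pre-flight check can catch a missing dataset or config file early. This is a minimal sketch under the assumption that the commands are run from the opencompass checkout and that the paths named in the steps above (data/, human-eval/, eval_llama_7b_test.py, slurm_config.py) sit at its top level; `REQUIRED_PATHS` and `missing_prerequisites` are hypothetical names.

```python
from pathlib import Path
from typing import List

# Paths named in the installation and evaluation steps, assumed to be
# relative to the top level of the opencompass checkout.
REQUIRED_PATHS = (
    "data",
    "human-eval",
    "eval_llama_7b_test.py",
    "slurm_config.py",
)


def missing_prerequisites(root: str = ".") -> List[str]:
    """Return the required paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in REQUIRED_PATHS if not (base / p).exists()]
```

An empty list from `missing_prerequisites()` means every expected path is present; anything else names what still has to be downloaded or created.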

📖 Results

All run outputs are written to the /home/hkustadmin/evaluation/opencompass/outputs/default/ directory, which is structured as follows:

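outputs/default/
├── 20231113_164612
├── 20231113_183030     # one folder per experiment
│   ├── configs         # dumped configuration files kept for the record; rerunning different experiments in the same folder may leave several configs here
│   ├── logs            # log files from the inference and evaluation stages
│   │   ├── eval
│   │   └── infer
│   ├── predictions     # inference results for each task
│   ├── results         # evaluation results for each task
│   └── summary         # aggregated evaluation results for a single experiment
├── ...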
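
Since experiment folders are named with a YYYYMMDD_HHMMSS timestamp, sorting the names lexicographically also sorts them chronologically. The sketch below uses that to find the most recent run and list its summary files. The helper names `latest_run` and `summary_files` are hypothetical, and the default root assumes the script is run from the opencompass directory.

```python
import re
from pathlib import Path
from typing import List, Optional

# Experiment folders look like 20231113_183030 (date, underscore, time).
RUN_DIR = re.compile(r"^\d{8}_\d{6}$")


def latest_run(output_root: str = "outputs/default") -> Optional[Path]:
    """Return the newest timestamped experiment folder, or None if there is none."""
    runs = sorted(
        p for p in Path(output_root).iterdir()
        if p.is_dir() and RUN_DIR.match(p.name)
    )
    return runs[-1] if runs else None


def summary_files(run_dir: Path) -> List[Path]:
    """Return the files under <run_dir>/summary, sorted by name."""
    summary = Path(run_dir) / "summary"
    return sorted(summary.iterdir()) if summary.is_dir() else []
```

For example, `summary_files(latest_run())` lists whatever the most recent experiment wrote to its summary folder, without having to copy the timestamp by hand.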

Results preview:
