This is a modified version of TIGER-AI-Lab/MMLU-Pro that lets you run the MMLU-Pro benchmark via the OpenAI Chat Completions API. It has been tested with Ollama and llama.cpp, but it should also work with LM Studio, KoboldCpp, text-generation-webui (Oobabooga) with the OpenAI extension, and other compatible servers.
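Any server that exposes the Chat Completions endpoint can be targeted. As a rough sketch of what such a request body looks like (the model name, question text, and temperature here are illustrative, not the script's actual prompt or settings):

```python
import json

# Minimal Chat Completions request body, POSTed to <url>/chat/completions.
# The model name and question are made up for illustration.
payload = {
    "model": "phi3",
    "messages": [
        {
            "role": "user",
            "content": "Answer with the letter of the correct option.\n"
                       "Q: 2 + 2 = ?\nA) 3\nB) 4\nC) 5",
        },
    ],
    "temperature": 0.0,
}

# The script serializes this to JSON and reads the assistant's reply
# from choices[0].message.content in the response.
body = json.dumps(payload)
print(body[:40])
```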
For example, to run the benchmark against Phi-3 on Ollama:
pip install -r requirements.txt
python run_openai.py --url http://localhost:11434/v1 --model phi3
By default, it tests all subjects, but you can use the --category option to test only a specific subject.
Subjects include: 'business', 'law', 'psychology', 'biology', 'chemistry', 'history', 'other', 'health', 'economics', 'math', 'physics', 'computer science', 'philosophy', 'engineering'
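For instance, to benchmark only the math subject (assuming the same local Ollama server as above):

```shell
python run_openai.py --url http://localhost:11434/v1 --model phi3 --category math
```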
The default timeout is 600 seconds (10 minutes). If the model being tested is slow to respond and you encounter an "error Request timed out" message, use the --timeout number_of_seconds option to increase it.
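For example, to allow up to 20 minutes per request (the model name here is illustrative):

```shell
python run_openai.py --url http://localhost:11434/v1 --model phi3 --timeout 1200
```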
You can optionally run multiple tests in parallel with the --parallel option. For example, to run 2 tests in parallel:
python run_openai.py --url http://localhost:11434/v1 --model llama3 --parallel 2