OpenCompass Website | OpenCompass Toolkit


What is OpenCompass? OpenCompass is a platform focused on understanding AGI, including Large Language Models and Multi-modality Models.

We aim to:

  • develop high-quality libraries that reduce the difficulty of evaluation
  • provide convincing leaderboards that improve understanding of large models
  • create powerful toolchains targeting a variety of abilities and tasks
  • build solid benchmarks to support large model research
  • research inference of large models (analysis, reasoning, prompt engineering)

Toolkit

OpenCompass

VLMEvalKit

Benchmarks and Methods

Project | Topic | Paper
--- | --- | ---
DevBench | Automated Software Development | DevBench: Towards LLMs based Automated Software Development
CriticBench | Critic Reasoning | CriticBench: Evaluating Large Language Models as Critic
ANAH | Hallucination Annotation | ANAH: Analytical Annotation of Hallucinations in Large Language Models
MathBench | Mathematical Reasoning | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark
T-Eval | Tool Utilization | T-Eval: Evaluating the Tool Utilization Capability Step by Step
MMBench | Multi Modality | MMBench: Is Your Multi-modal Model an All-around Player?
BotChat | Subjective Evaluation | BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
LawBench | Domain Evaluation | LawBench: Benchmarking Legal Knowledge of Large Language Models

Pinned

  1. opencompass (Public)

    OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

    Python · 3.1k stars · 331 forks

  2. VLMEvalKit (Public)

    Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

    Python · 620 stars · 70 forks

  3. LawBench (Public)

    Benchmarking Legal Knowledge of Large Language Models

    Python · 203 stars · 28 forks

  4. T-Eval (Public)

    [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step

    Python · 175 stars · 11 forks

  5. Ada-LEval (Public)

    The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"

    Python · 44 stars · 2 forks

  6. GAOKAO-Eval (Public)

    Jupyter Notebook · 72 stars · 6 forks

Repositories

Showing 10 of 21 repositories
  • VLMEvalKit (Public)

    Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

    Python · 620 stars · Apache-2.0 · 70 forks · 13 issues · 4 pull requests · Updated Jun 29, 2024
  • opencompass (Public)

    OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

    Python · 3,120 stars · Apache-2.0 · 331 forks · 123 issues (1 issue needs help) · 24 pull requests · Updated Jun 28, 2024
  • CIBench (Public)

    Official Repo of "CIBench: Evaluation of LLMs as Code Interpreter"

    Python · 0 stars · Apache-2.0 · 0 forks · 0 issues · 0 pull requests · Updated Jun 26, 2024
  • GAOKAO-Eval (Public)
    Jupyter Notebook · 72 stars · 6 forks · 0 issues · 0 pull requests · Updated Jun 19, 2024
  • MMBench (Public)

    Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"

    115 stars · Apache-2.0 · 5 forks · 1 issue · 0 pull requests · Updated Jun 17, 2024
  • ANAH (Public)

    [ACL 2024] ANAH: Analytical Annotation of Hallucinations in Large Language Models

    Python · 9 stars · Apache-2.0 · 0 forks · 0 issues · 0 pull requests · Updated May 31, 2024
  • .github (Public)
    0 stars · 1 fork · 0 issues · 0 pull requests · Updated May 31, 2024
  • MathBench (Public)

    [ACL 2024 Findings] MathBench: A Comprehensive Multi-Level Difficulty Mathematics Evaluation Dataset

    63 stars · Apache-2.0 · 1 fork · 3 issues · 0 pull requests · Updated May 30, 2024
  • DevBench (Public)

    A Comprehensive Benchmark for Software Development.

    Python · 78 stars · Apache-2.0 · 4 forks · 1 issue · 0 pull requests · Updated May 30, 2024
  • CodeBench
    1 star · 0 forks · 0 issues · 0 pull requests · Updated May 21, 2024