OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 3,402 357 Updated Jul 26, 2024

dongrixinyu / JioNLP

A Chinese NLP preprocessing and parsing package: accurate, efficient, and easy to use. www.jionlp.com

Python 3,165 380 Updated Jul 24, 2024

fossabot / clash

A rule-based proxy in Go.

Go 843 6,947 Updated Jul 26, 2024

mudler / LocalAI

🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…

C++ 22,162 1,693 Updated Jul 27, 2024

promptfoo / promptfoo

Test your prompts, agents, and RAGs. Redteaming, pentesting, vulnerability scanning for LLMs. Improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and m…

TypeScript 3,716 265 Updated Jul 28, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,748 3,410 Updated Jul 28, 2024

OpenLMLab / LEval

[ACL'24 Oral] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Python 316 13 Updated Jul 9, 2024

WooooDyy / LLM-Agent-Paper-List

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

5,865 352 Updated Jul 14, 2024

google-research-datasets / conceptual-captions

Conceptual Captions is a dataset containing (image-URL, caption) pairs designed for the training and evaluation of machine learned image captioning systems.

Shell 506 25 Updated Aug 21, 2021

persimmon-ai-labs / adept-inference

Inference code for Persimmon-8B

Python 414 23 Updated Sep 9, 2023

LiLittleCat / awesome-free-chatgpt

🆓 List of free ChatGPT mirror sites, continuously updated.

Python 17,305 1,210 Updated Jul 21, 2024

nlpxucan / WizardLM

LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath

Python 9,111 711 Updated Jul 16, 2024

jinlanfu / GPTScore

Source Code of Paper "GPTScore: Evaluate as You Desire"

Python 216 14 Updated Feb 21, 2023

explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines

Python 5,928 561 Updated Jul 27, 2024

LC1332 / Chat-Haruhi-Suzumiya

Chat-Haruhi-Suzumiya (Chat凉宫春日), an open-source role-playing chatbot by Cheng Li, Ziang Leng, and others.

Jupyter Notebook 1,716 154 Updated Apr 4, 2024

modAL-python / modAL

A modular active learning framework for Python

Python 2,180 317 Updated Feb 26, 2024

vercel / vercel

Develop. Preview. Ship.

TypeScript 12,464 2,144 Updated Jul 26, 2024

thunlp / ChatEval

Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"

Python 213 13 Updated Apr 15, 2024

zhenbench / z-bench

Z-Bench 1.0 by ZhenFund (真格基金): a Chinese-language LLM test set for "muggles" (non-experts). Z-Bench is an LLM prompt dataset for non-technical users, developed by an enthusiastic AI-focused team at ZhenFund.

471 41 Updated Jun 28, 2023

yaodongC / awesome-instruction-dataset

A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

1,046 58 Updated Jan 4, 2024

FranxYao / chain-of-thought-hub

Benchmarking large language models' complex reasoning ability with chain-of-thought prompting

Jupyter Notebook 2,453 122 Updated Apr 22, 2024

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 14,443 2,557 Updated Jul 21, 2024

IBM / Dromedary

Dromedary: towards helpful, ethical and reliable LLMs.

Python 1,103 84 Updated Oct 26, 2023

sisterdong

Stars

duoergun0729 / nlp

jgm / pandoc

bloomberg / memray

Kensuke-Hinata / statistic

oap-project / raydp

google / space

phfaist / pylatexenc

open-compass / opencompass