Ensembled Perplexity API

Perplexity is a measure of the likelihood of a given string, usually as judged by the probabilities output by a language model. For more details, see this HuggingFace explanation of perplexity.
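
Concretely, for a tokenized string $x = (x_1, \dots, x_N)$, perplexity is the exponentiated average negative log-likelihood of its tokens under a model $p_\theta$:

$$\mathrm{PPL}(x) = \exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta(x_i \mid x_{<i})\right)$$

Lower perplexity means the model considers the string more likely.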

Many scenarios rely on a good estimate of the "readability" or "likelihood" of a given string, and perplexity is often the default metric for this. However, the perplexity calculated from the logits of a single model is often quite noisy, sensitive to the specifics of the training data ingested by that model.

Ensembled perplexity is more robust. By querying multiple language models for their respective logits, ensembled perplexity averages away the noise in each individual model, giving a more human-interpretable metric of text likelihood.
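
As a concrete illustration, one simple way to ensemble is to average each model's per-token mean negative log-likelihood and exponentiate, which amounts to the geometric mean of the per-model perplexities. The sketch below shows that idea; the aggregation this repo actually uses may differ:

import math

def ensembled_perplexity(per_model_logprobs: list[list[float]]) -> float:
    """Ensembled perplexity from several models' token log-probabilities.

    per_model_logprobs holds one list of token log-probs per model, each
    computed under that model's own tokenizer.
    """
    # Each model's per-token mean negative log-likelihood for the string.
    mean_nlls = [-sum(lps) / len(lps) for lps in per_model_logprobs]
    # Average across models and exponentiate: the geometric mean of the
    # individual models' perplexities.
    return math.exp(sum(mean_nlls) / len(mean_nlls))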

Overview

This repo implements a Python and HTTP API to calculate perplexity for an ensemble of language models.

[Figure: overview of the ensembled perplexity architecture]

Each language model is hosted with vLLM on a dedicated GPU in a separate process for maximum throughput. A central controller script distributes input prompts to each GPU and collates the results back through the API.
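
A minimal sketch of that fan-out/collate pattern follows. All names here are illustrative rather than the repo's actual code, and the scoring step is stubbed out where a vLLM engine would run:

import math
import os
import random
from multiprocessing import Process, Queue

# Hypothetical GPU-to-model assignment; one worker process per GPU.
MODELS = {0: "microsoft/phi-2", 1: "mistralai/Mistral-7B-v0.1"}

def worker(gpu_id: int, model_name: str, prompts: list[str], out: Queue) -> None:
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # pin this process to one GPU
    # Placeholder for loading model_name with vLLM and scoring each prompt;
    # here we fake a per-token mean negative log-likelihood per prompt.
    nlls = [random.uniform(1.0, 4.0) for _ in prompts]
    out.put((model_name, nlls))

def ensembled_perplexities(prompts: list[str]) -> list[float]:
    out: Queue = Queue()
    procs = [Process(target=worker, args=(gpu, name, prompts, out))
             for gpu, name in MODELS.items()]
    for p in procs:
        p.start()
    per_model = [out.get()[1] for _ in procs]  # collate one result per worker
    for p in procs:
        p.join()
    # Average the mean NLLs across models, then exponentiate per prompt.
    n = len(per_model)
    return [math.exp(sum(m[i] for m in per_model) / n) for i in range(len(prompts))]

if __name__ == "__main__":
    print(ensembled_perplexities(["The quick brown fox", "asdf qwerty zxcv"]))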

Installation

To get started, first create a conda environment:

conda create -n ensembled-perplexity python=3.10

Then, clone the repo and install the required packages:

git clone https://github.com/haizelabs/ensembled-perplexity.git
cd ensembled-perplexity
pip install -r requirements.txt

Start the API server with:

uvicorn perplexity_server:app

Navigate to localhost:8000 to access the API and documentation.
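
For example, a client request might look like the following. The endpoint path and payload shape here are assumptions, not the server's confirmed schema; consult the documentation served at localhost:8000 for the real interface:

import requests

# Hypothetical endpoint and payload; check localhost:8000 for the actual schema.
resp = requests.post(
    "http://localhost:8000/perplexity",
    json={"texts": ["The quick brown fox jumps over the lazy dog."]},
)
resp.raise_for_status()
print(resp.json())  # e.g. one ensembled perplexity score per input text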

Hosted API Access

If you'd prefer not to provision and manage GPU infrastructure yourself, Haize Labs maintains a dedicated cluster for ensembled perplexity.

Send us an email at [email protected] and we'd be happy to grant you access.

Performance

Below are performance stats from running lmsys/vicuna-7b-v1.5, meta-llama/Llama-2-7b-hf, microsoft/phi-2, and mistralai/Mistral-7B-v0.1 on 4x L4 24GB GPUs.

Number of 50-token prompts    Runtime (seconds)
1                             0.38
10                            0.48
100                           2.16
1000                          19.95

Quantized models may offer better performance on low-memory GPUs.

Contact

If you have any questions, comments, or would like to get in touch, reach out to [email protected].
