forked from openai/evals
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[evals] Refactor evals package to expose
completion_fn
. (openai#515)
PAIR=jasonwei - Move Evals functionality to use CompletionFns from ModelSpecs. --------- Co-authored-by: Jason Wei <[email protected]> Co-authored-by: Andrew Kondrich <[email protected]> Co-authored-by: Andrew Kondrich <[email protected]> Co-authored-by: Alvin Wang <[email protected]> Co-authored-by: joe-at-openai <[email protected]>
- Loading branch information
1 parent
f7ebbe8
commit 64fb72a
Showing
29 changed files
with
730 additions
and
560 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
recursive-include evals *.py | ||
recursive-include evals *.yaml | ||
recursive-include evals *.sql | ||
recursive-include evals *.jsonl | ||
recursive-include evals/registry/data *.jsonl |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
### The Completion Function Protocol | ||
|
||
Here are the interfaces needed to implement the completion function protocol. Any implementation of this interface can be used inside `oaieval`. | ||
|
||
Reference implementations: | ||
- [OpenAICompletionFn](../evals/completion_fns/openai.py) | ||
- [LangChainLLMCompletionFn](../evals/completion_fns/langchain_llm.py) | ||
|
||
#### CompletionFn | ||
Completion functions should implement the `CompletionFn` interface: | ||
```python | ||
class CompletionFn(Protocol): | ||
def __call__( | ||
self, | ||
prompt: Union[str, list[dict[str, str]]], | ||
**kwargs, | ||
) -> CompletionResult: | ||
``` | ||
|
||
We take a `prompt` representing a single sample from an eval. These prompts can be represented as either a text string or a list of messages in [OpenAI Chat format](https://platform.openai.com/docs/guides/chat/introduction). To work with the existing evals, Completion Function implementations would need to handle both types of inputs, but we provide helper functionality to convert Chat formatted messages into a text string if that is the preferred input for your program: | ||
```python | ||
from evals.prompt.base import CompletionPrompt | ||
|
||
# chat_prompt: list[dict[str, str]] -> text_prompt: str | ||
text_prompt = CompletionPrompt(chat_prompt).to_formatted_prompt() | ||
``` | ||
|
||
#### CompletionResult | ||
The completion function should return an object implementing the `CompletionResult` interface: | ||
```python | ||
class CompletionResult(ABC): | ||
@abstractmethod | ||
def get_completions(self) -> list[str]: | ||
pass | ||
``` | ||
The `get_completions` method returns a list of string completions. Each element should be considered a unique completion (in most cases this will be a list of length 1). | ||
|
||
#### Using your CompletionFn | ||
This is all that's needed to implement a Completion function that works with our existing Evals, allowing you to more easily evaluate your end-to-end logic on tasks. | ||
|
||
See [completion-fns.md](completion-fns.md) to see how to register and use your completion function with `oaieval`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Completion Functions | ||
|
||
## What are completion functions | ||
In [run-evals.md](run-evals.md), we learned how to make calls to `oaieval` to run an eval against a completion function. Completion Functions are generalizations of model completions, where a "completion" is some text output that would be our answer to the prompt. For example, if "Who played the girl elf in the hobbit?" is our prompt, the correct completion is "Evangeline Lilly". While we can just test a model directly to see if it generates "Evangeline Lilly", we can imagine doing numerous other operations under the hood to improve our ability to answer this question, like giving the model access to a browser to look up the answer before responding. Making it easy to implement this kind of under-the-hood operators before responding is the motivation behind building Completion Functions. | ||
|
||
## How to implement completion functions | ||
A completion function needs to implement some interfaces that make it usable within Evals. At its core, it is just standardizing inputs to be a text string or [Chat conversation](https://platform.openai.com/docs/guides/chat), and the output to be a list of text strings. Implementing this interface will allow you to run your Completion Function against any eval in Evals. | ||
|
||
The exact interfaces needed are described in detail in [completion-fn-protocol.md](completion-fn-protocol.md) | ||
|
||
We include some example implementations inside `evals/completion_fns`. For example, the [`LangChainLLMCompletionFn`](../evals/completion_fns/langchain_llm.py) implements a way to generate completions from [LangChain LLMs](https://python.langchain.com/en/latest/modules/models/llms/getting_started.html). We can then use these completion functions with `oaieval`: | ||
``` | ||
oaieval langchain/llm/flan-t5-xl test-match | ||
``` | ||
|
||
## Registering Completion Functions | ||
Once you have written a completion function, we need to make the class visible to the `oaieval` CLI. Similar to how we register our evals, we also register Completion Functions inside `evals/registry/completion_fns` as `yaml` files. Here is the registration for our langchain LLM completion function: | ||
```yaml | ||
langchain/llm/flan-t5-xl: | ||
class: evals.completion_fns.langchain_llm:LangChainLLMCompletionFn | ||
args: | ||
llm: HuggingFaceHub | ||
llm_kwargs: | ||
repo_id: google/flan-t5-xl | ||
``` | ||
Here is how it breaks down | ||
`langchain/llm/flan-t5-xl`: This is the top level key that will be used to access this completion function with `oaieval`. | ||
`class`: This is the path to your implementation of the completion function protocol. This class needs to importable within your python environment. | ||
`args`: These are arguments that are passed to your completion function when it is instantiated. | ||
|
||
|
||
### Developing Completion Functions outside of Evals | ||
It is possible to register CompletionFunctions without directly modifying the registry or code inside `Evals` by using the `--registry_path` argument. As an example, let's say I want to use `MyCompletionFn` located inside `~/my_project/`: | ||
``` | ||
my_project | ||
├── my_completion_fn.py | ||
└── completion_fns | ||
└── my_completion_fn.yaml | ||
``` | ||
|
||
If `my_project` is importable within the python environment (accessible via PYTHONPATH), we can structure `my_completion_fn.yaml` as: | ||
``` | ||
my_completion_fn: | ||
class: my_project.my_completion_fn:MyCompletionFn | ||
``` | ||
Then, we can make calls to `oaieval` using: | ||
``` | ||
oaieval my_completion_fn test-match --registry_path ~/my_project | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,8 @@ | ||
from .api import check_sampled_text, completion_query, sample_freeform | ||
from .base import ModelSpec, ModelSpecs | ||
from .api import CompletionFn, CompletionResult, DummyCompletionFn, record_and_check_match | ||
from .completion_fns.openai import ( | ||
OpenAIChatCompletionFn, | ||
OpenAICompletionFn, | ||
OpenAICompletionResult, | ||
) | ||
from .data import get_csv, get_json, get_jsonl, get_jsonls, get_lines, iter_jsonls | ||
from .eval import Eval |
Oops, something went wrong.