-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[evals] Refactor evals package to expose completion_fn
.
#515
Merged
Merged
Changes from 21 commits
Commits
Show all changes
23 commits
Select commit
Hold shift + click to select a range
d87a056
[evals] Refactor evals package to expose `completion_fn`.
hwchung27 d9c1395
Add `record_raw_samples`
hwchung27 a1c6207
Andrew/evals refactor (#579)
andrew-openai deb29d3
update manifest and pyproject to support fetching data on pip install…
andrew-openai 9b1c350
we need to still use the interop for string/list[dicts] for modelgrad…
andrew-openai c470d52
refactor simple evals to not use result.prompt (#593)
andrew-openai b691cfa
Clean up duplicate recordings
hwchung27 7266049
Replace ModelSpecs with CompletionFn (#594)
jwang47 b2a45cf
Add --registry_path CLI arg (#601)
jwang47 924d2d4
Andrew/langchain llms (#602)
andrew-openai 4401cce
rm sample freeform, some docs (#603)
andrew-openai 013d636
Update completion-fn-protocol.md
andrew-openai 08062bc
some documentation cleanup
joe-at-openai 3367006
some documentation cleanup
joe-at-openai 5e71a76
some documentation cleanup
joe-at-openai e621b6f
inner monologue example (#610)
andrew-openai 49d17ed
Update README.md
andrew-openai 1bfba77
Update run-evals.md
andrew-openai b018aff
cleanup
andrew-openai 5222f2c
Merge branch 'main' into evals_refactor_merge_main
andrew-openai 9db703d
get oaieval to run
andrew-openai 02bc2cb
address comments
andrew-openai 50114a5
bump version
andrew-openai File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,5 @@ | ||
recursive-include evals *.py | ||
recursive-include evals *.yaml | ||
recursive-include evals *.sql | ||
recursive-include evals/registry/data *.jsonl | ||
recursive-include evals *.jsonl | ||
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
### The Completion Function Protocol | ||
|
||
Here are the interfaces needed to implement the completion function protocol. Any implementation of this interface can be used inside `oaieval`. | ||
|
||
Reference implementations: | ||
- [OpenAICompletionFn](../evals/completion_fns/openai.py) | ||
- [LangChainLLMCompletionFn](../evals/completion_fns/langchain_llm.py) | ||
|
||
#### CompletionFn | ||
Completion functions should implement the `CompletionFn` interface: | ||
```python | ||
class CompletionFn(Protocol): | ||
def __call__( | ||
self, | ||
prompt: Union[str, list[dict[str, str]]], | ||
**kwargs, | ||
) -> CompletionResult: | ||
``` | ||
|
||
We take a `prompt` representing a single sample from an eval. These prompts can be represented as either a text string or a list of messages in [OpenAI Chat format](https://platform.openai.com/docs/guides/chat/introduction). To work with the existing evals, Completion Function implementations would need to handle both types of inputs, but we provide helper functionality to convert Chat formatted messages into a text string if that is the preferred input for your program: | ||
```python | ||
from evals.prompt.base import CompletionPrompt | ||
|
||
# chat_prompt: list[dict[str, str]] -> text_prompt: str | ||
text_prompt = CompletionPrompt(chat_prompt).to_formatted_prompt() | ||
``` | ||
|
||
#### CompletionResult | ||
The completion function should return an object implementing the `CompletionResult` interface: | ||
```python | ||
class CompletionResult(ABC): | ||
@abstractmethod | ||
def get_completions(self) -> list[str]: | ||
pass | ||
``` | ||
The `get_completions` method returns a list of string completions. Each element should be considered a unique completion (in most cases this will be a list of length 1). | ||
|
||
#### Using your CompletionFn | ||
This is all that's needed to implement a Completion function that works with our existing Evals, allowing you to more easily evaluate your end-to-end logic on tasks. | ||
|
||
See [completion-fns.md](completion-fns.md) to see how to register and use your completion function with `oaieval`. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# Completion Functions | ||
|
||
## What are completion functions | ||
In [run-evals.md](run-evals.md), we learned how to make calls to `oaieval` to run an eval against a completion function. Completion Functions are generalizations of model completions, where a "completion" is some text output that would be our answer to the prompt. For example, if "Who played the girl elf in the hobbit?" is our prompt, the correct completion is "Evangeline Lilly". While we can just test a model directly to see if it generates "Evangeline Lilly", we can imagine doing numerous other operations under the hood to improve our ability to answer this question, like giving the model access to a browser to look up the answer before responding. Making it easy to implement this kind of under-the-hood operators before responding is the motivation behind building Completion Functions. | ||
|
||
## How to implement completion functions | ||
A completion function needs to implement some interfaces that make it usable within Evals. At its core, it is just standardizing inputs to be a text string or [Chat conversation](https://platform.openai.com/docs/guides/chat), and the output to be a list of text strings. Implementing this interface will allow you to run your Completion Function against any eval in Evals. | ||
|
||
The exact interfaces needed are described in detail in [completion-fn-protocol.md](completion-fn-protocol.md) | ||
|
||
We include some example implementations inside `evals/completion_fns`. For example, the [`LangChainLLMCompletionFn`](../evals/completion_fns/langchain_llm.py) implements a way to generate completions from [LangChain LLMs](https://python.langchain.com/en/latest/modules/models/llms/getting_started.html). We can then use these completion functions with `oaieval`: | ||
``` | ||
oaieval langchain/llm/flan-t5-xl test-match | ||
``` | ||
|
||
## Registering Completion Functions | ||
Once you have written a completion function, we need to make the class visible to the `oaieval` CLI. Similar to how we register our evals, we also register Completion Functions inside `evals/registry/completion_fns` as `yaml` files. Here is the registration for our langchain LLM completion function: | ||
```yaml | ||
langchain/llm/flan-t5-xl: | ||
class: evals.completion_fns.langchain_llm:LangChainLLMCompletionFn | ||
args: | ||
llm: HuggingFaceHub | ||
llm_kwargs: | ||
repo_id: google/flan-t5-xl | ||
``` | ||
Here is how it breaks down | ||
`langchain/llm/flan-t5-xl`: This is the top level key that will be used to access this completion function with `oaieval`. | ||
`class`: This is the path to your implementation of the completion function protocol. This class needs to importable within your python environment. | ||
`args`: These are arguments that are passed to your completion function when it is instantiated. | ||
|
||
|
||
### Developing Completion Functions outside of Evals | ||
It is possible to register CompletionFunctions without directly modifying the registry or code inside `Evals` by using the `--registry_path` argument. As an example, let's say I want to use `MyCompletionFn` located inside `~/my_project/`: | ||
``` | ||
my_project | ||
├── my_completion_fn.py | ||
└── completion_fns | ||
└── my_completion_fn.yaml | ||
``` | ||
|
||
If `my_project` is importable within the python environment (accessible via PYTHONPATH), we can structure `my_completion_fn.yaml` as: | ||
``` | ||
my_completion_fn: | ||
class: my_project.my_completion_fn:MyCompletionFn | ||
``` | ||
Then, we can make calls to `oaieval` using: | ||
``` | ||
oaieval my_completion_fn test-match --registry_path ~/my_project | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,8 @@ | ||
from .api import check_sampled_text, completion_query, sample_freeform | ||
from .base import ModelSpec, ModelSpecs | ||
from .api import CompletionFn, CompletionResult, DummyCompletionFn, record_and_check_match | ||
from .completion_fns.openai import ( | ||
OpenAIChatCompletionFn, | ||
OpenAICompletionFn, | ||
OpenAICompletionResult, | ||
) | ||
from .data import get_csv, get_json, get_jsonl, get_jsonls, get_lines, iter_jsonls | ||
from .eval import Eval |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably don't need this, can just keep the previous line:
recursive-include evals/registry/data *.jsonl