forked from openai/evals
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'openai:main' into main
- Loading branch information
Showing
211 changed files
with
3,059 additions
and
939 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
import sys | ||
import yaml | ||
|
||
def get_first_key(file_path): | ||
with open(file_path, 'r') as yaml_file: | ||
content = yaml.safe_load(yaml_file) | ||
first_key = next(iter(content)) | ||
return first_key | ||
|
||
if __name__ == "__main__": | ||
yaml_file_path = sys.argv[1] | ||
print(get_first_key(yaml_file_path)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
name: Run new evals | ||
|
||
on: | ||
pull_request: | ||
branches: | ||
- main | ||
|
||
jobs: | ||
check_files: | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout repository | ||
uses: actions/checkout@v2 | ||
with: | ||
fetch-depth: 0 | ||
lfs: true | ||
|
||
- name: Install Git LFS | ||
run: | | ||
sudo apt-get install git-lfs | ||
git lfs install | ||
- name: Set up Python | ||
uses: actions/setup-python@v2 | ||
with: | ||
python-version: 3.9 | ||
|
||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
pip install pyyaml | ||
pip install -e . | ||
- name: Get list of new YAML files in evals/registry/evals | ||
id: get_files | ||
run: | | ||
# Use environment files to store the output | ||
git diff --name-only --diff-filter=A ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | grep '^evals/registry/evals/.*\.yaml$' | xargs > new_files | ||
echo "new_files=$(cat new_files)" >> $GITHUB_ENV | ||
- name: Run oaieval command for each new YAML file | ||
run: | | ||
files="${{ env.new_files }}" | ||
if [ -n "$files" ]; then | ||
for file in $files; do | ||
echo "Processing $file" | ||
first_key=$(python .github/workflows/parse_yaml.py $file) | ||
echo "Eval Name: $first_key" | ||
oaieval dummy $first_key --max_samples 10 | ||
done | ||
else | ||
echo "No new YAML files found in evals/registry/evals" | ||
fi |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
__pycache__/ | ||
evals.egg-info/ | ||
.vscode/ |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
recursive-include evals *.py | ||
recursive-include evals *.yaml | ||
recursive-include evals *.sql | ||
recursive-include evals/registry/data *.jsonl |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
### The Completion Function Protocol | ||
|
||
Here are the interfaces needed to implement the completion function protocol. Any implementation of this interface can be used inside `oaieval`. | ||
|
||
Reference implementations: | ||
- [OpenAICompletionFn](../evals/completion_fns/openai.py) | ||
- [LangChainLLMCompletionFn](../evals/completion_fns/langchain_llm.py) | ||
|
||
#### CompletionFn | ||
Completion functions should implement the `CompletionFn` interface: | ||
```python | ||
class CompletionFn(Protocol): | ||
def __call__( | ||
self, | ||
prompt: Union[str, list[dict[str, str]]], | ||
**kwargs, | ||
) -> CompletionResult: | ||
``` | ||
|
||
We take a `prompt` representing a single sample from an eval. These prompts can be represented as either a text string or a list of messages in [OpenAI Chat format](https://platform.openai.com/docs/guides/chat/introduction). To work with the existing evals, Completion Function implementations would need to handle both types of inputs, but we provide helper functionality to convert Chat formatted messages into a text string if that is the preferred input for your program: | ||
```python | ||
from evals.prompt.base import CompletionPrompt | ||
|
||
# chat_prompt: list[dict[str, str]] -> text_prompt: str | ||
text_prompt = CompletionPrompt(chat_prompt).to_formatted_prompt() | ||
``` | ||
|
||
#### CompletionResult | ||
The completion function should return an object implementing the `CompletionResult` interface: | ||
```python | ||
class CompletionResult(ABC): | ||
@abstractmethod | ||
def get_completions(self) -> list[str]: | ||
pass | ||
``` | ||
The `get_completions` method returns a list of string completions. Each element should be considered a unique completion (in most cases this will be a list of length 1). | ||
|
||
#### Using your CompletionFn | ||
This is all that's needed to implement a Completion function that works with our existing Evals, allowing you to more easily evaluate your end-to-end logic on tasks. | ||
|
||
See [completion-fns.md](completion-fns.md) to see how to register and use your completion function with `oaieval`. |
Oops, something went wrong.