Drop two datasets from steganography (openai#1481)
Removing two datasets:
- PiC/phrase_similarity
- vicgalle/alpaca-gpt4

Impact on Steganography:
- Only a marginal change in the data distribution.
- We adjust the sampling counts so that the total number of samples is the same as before.
- We did not re-run the results; absolute scores are expected to change, but the qualitative interpretation of the eval will not.
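The commit does not show how the new sampling counts were chosen. As an illustration only, here is one way to drop datasets while keeping the total sample count fixed: scale the remaining counts proportionally and assign leftover samples by the largest-remainder method. The function name `rebalance_counts` and the per-dataset counts in the usage example are hypothetical, not taken from the repository.

```python
def rebalance_counts(counts: dict[str, int], drop: set[str]) -> dict[str, int]:
    """Drop datasets and rescale the rest so the total sample count is unchanged.

    Hypothetical helper: uses the largest-remainder method so the integer
    counts sum exactly to the original total.
    """
    total = sum(counts.values())
    kept = {name: n for name, n in counts.items() if name not in drop}
    kept_total = sum(kept.values())
    if kept_total == 0:
        raise ValueError("no samples left after dropping datasets")

    # Fractional share each remaining dataset would need to preserve the total.
    ideal = {name: n * total / kept_total for name, n in kept.items()}
    # Start from the floor of each share (all values are non-negative).
    result = {name: int(share) for name, share in ideal.items()}

    # Give the samples lost to flooring to the largest fractional remainders.
    leftover = total - sum(result.values())
    by_remainder = sorted(ideal, key=lambda name: ideal[name] - result[name], reverse=True)
    for name in by_remainder[:leftover]:
        result[name] += 1
    return result


if __name__ == "__main__":
    # Hypothetical counts; the real per-dataset counts are not in this commit.
    counts = {
        "Abirate/english_quotes": 100,
        "PiC/phrase_similarity": 50,
        "wikipedia": 100,
        "lighteval/mmlu": 100,
        "vicgalle/alpaca-gpt4": 50,
    }
    new = rebalance_counts(counts, {"PiC/phrase_similarity", "vicgalle/alpaca-gpt4"})
    print(new, sum(new.values()))  # the total is still 400
```

Proportional scaling keeps the relative mix of the remaining datasets, which is consistent with the "only a marginal change in data distribution" note; other schemes (e.g. topping up a single dataset) would also preserve the total.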

---

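Piggybacking on this PR to add a small fix for OpenAIAssistantsSolver, which was causing tests to fail: the thread parameter defaulted to client.beta.threads.create(), which Python evaluates once, when __init__ is defined. It now defaults to None, and a new thread is created in __init__ only when none is passed.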
thesofakillers committed Mar 12, 2024
1 parent 82ec660 commit 7e958fe
Showing 3 changed files with 4 additions and 12 deletions.
8 changes: 0 additions & 8 deletions evals/registry/data/steganography/LICENSE
@@ -2,10 +2,6 @@ Abirate/english_quotes:
 License: Creative Commons Attribution 4.0 International License https://creativecommons.org/licenses/by/4.0/legalcode.txt
 Source: https://huggingface.co/datasets/Abirate/english_quotes
 
-PiC/phrase_similarity:
-License: Creative Commons NonCommercial (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/legalcode
-Source: https://huggingface.co/datasets/PiC/phrase_similarity
-
 wikipedia:
 License: Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA): https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License and the GNU Free Documentation License (GFDL): https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License
 Source: https://huggingface.co/datasets/wikipedia
@@ -25,7 +21,3 @@ Source: https://huggingface.co/datasets/alespalla/chatbot_instruction_prompts
 lighteval/mmlu:
 License: MIT License https://opensource.org/license/mit/
 Source: https://huggingface.co/datasets/lighteval/mmlu
-
-vicgalle/alpaca-gpt4:
-License: Creative Commons NonCommercial (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/legalcode
-Source: https://huggingface.co/datasets/vicgalle/alpaca-gpt4
4 changes: 2 additions & 2 deletions evals/registry/data/steganography/samples.jsonl
Git LFS file not shown
4 changes: 2 additions & 2 deletions evals/solvers/openai_assistants_solver.py
@@ -51,11 +51,11 @@ def __init__(
         tools: list[Dict[str, Any]] = [],
         file_paths: list[str] = [],
         assistant: Optional[Assistant] = None,
-        thread: Optional[Thread] = client.beta.threads.create(),
+        thread: Optional[Thread] = None,
         registry: Any = None,
     ):
         self.model = model
-        self.thread = thread
+        self.thread = thread if thread else client.beta.threads.create()
         self.tools = tools
         self.all_uploaded_files = []
         if not assistant:
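The OpenAIAssistantsSolver change works around a standard Python rule: default argument values are evaluated once, when the def statement runs, not on every call. With client.beta.threads.create() as the default, a thread was created while the module was being imported, and every solver built without an explicit thread shared that same thread. The sketch below reproduces the before/after behavior without the OpenAI API; create_thread, BuggySolver, and FixedSolver are hypothetical stand-ins, not names from the repository.

```python
import itertools
from typing import Optional

# Hypothetical stand-in for client.beta.threads.create(): each call
# returns a new, unique thread ID.
_thread_ids = itertools.count(1)


def create_thread() -> str:
    return f"thread_{next(_thread_ids)}"


class BuggySolver:
    # Mirrors the old code: the default is computed once, when this `def`
    # executes at class-definition time, so all instances share it.
    def __init__(self, thread: Optional[str] = create_thread()):
        self.thread = thread


class FixedSolver:
    # Mirrors the new code: None is a sentinel, and a fresh thread is
    # created at call time whenever the caller does not supply one.
    def __init__(self, thread: Optional[str] = None):
        self.thread = thread if thread else create_thread()


if __name__ == "__main__":
    a, b = BuggySolver(), BuggySolver()
    print(a.thread == b.thread)  # True: one shared thread
    c, d = FixedSolver(), FixedSolver()
    print(c.thread == d.thread)  # False: one thread per instance
```

The diff uses a truthiness check (thread if thread else ...); thread if thread is not None else ... is the stricter form of the same None-sentinel idiom.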
