How to eval output with ideal_answer directly without having to define the completion_fn? #1342

Open
liuyaox opened this issue Aug 29, 2023 · 1 comment

Comments

liuyaox commented Aug 29, 2023

Describe the feature or improvement you're requesting

I already have the output (generated from an LLM) and the ideal answers in my jsonl file. For example:

{"input": "what is 2 plus 1?", "output": "3", "ideal": "3"}
{"input": "what is 2 plus 2?", "output": "3", "ideal": "4"}

I don't need to define a completion_fn, because it is only used to generate the output, which I already have.
So, how can I evaluate the output against the ideal answer directly?
Thanks a lot.

Additional context

No response

Contributor

douglasmonsky commented Sep 12, 2023

Hey @liuyaox,

I'm not entirely sure if I've grasped your question accurately, but I'll endeavor to provide the best assistance possible. I am assuming this is intended for your personal use case on a fork of the repo, and not with an aim to contribute to the main repository. The guidelines will be virtually the same in either case, but I will not delve deeply into contribution conventions in this response for the sake of brevity. You can find more detailed information in the documentation.

To run this and obtain an evaluation score based on the model's responses, follow these steps:

  1. Navigate to evals/registry/data and create a new folder; considering your examples, you might name it basic_math.
  2. Inside this folder, place your jsonl file. It can be named anything, but let's assume you name it samples.jsonl (a sketch of the expected line format follows after the note below).

NOTE: Please record the folder name and file name as they will be required shortly.
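
For reference, here is a rough sketch of what each line of samples.jsonl could look like when running against a chat model. The exact field shapes are best checked against the existing samples under evals/registry/data; the system prompt below is only an illustrative assumption:

{"input": [{"role": "system", "content": "Answer with only the number."}, {"role": "user", "content": "What is 2 plus 1?"}], "ideal": "3"}
{"input": [{"role": "system", "content": "Answer with only the number."}, {"role": "user", "content": "What is 2 plus 2?"}], "ideal": "4"}

With this workflow the precomputed "output" field from your original file would not be used: oaieval queries the completion model itself, and the Match eval compares that fresh answer against "ideal".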

  3. Proceed to evals/registry/evals and create a new yaml file, naming it according to your preference. In this instance, let's use basic_math.yaml.
  4. Populate your yaml file with the necessary configurations. Here is a simplified match template for your reference. Additional details can be found here.
basic_math:
  id: basic_math.dev.v0
  description: Test the model's ability to perform basic math operations.
  metrics: [accuracy]

basic_math.dev.v0:
  class: evals.elsuite.basic.match:Match
  args:
    samples_jsonl: basic_math/samples.jsonl  # Note the format here: <foldername>/<filename>.jsonl

  5. You can now execute an evaluation using the oaieval command from the CLI. Find more details here.
    Use the following template:
oaieval <model you want to test> <eval name>

In your scenario, it would be:

oaieval gpt-3.5-turbo basic_math

Provided your environment is configured correctly and all files are in place, executing the above command should initiate the evaluation process.
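
If the environment is not yet set up, here is a minimal sketch, assuming a local clone of your evals fork and an OpenAI API key (adjust paths and key handling to your setup):

# from the root of your evals clone
pip install -e .
export OPENAI_API_KEY=<your key>   # oaieval calls the OpenAI API for the completion model
oaieval gpt-3.5-turbo basic_math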

Lastly, please direct future inquiries of this nature to the Discussion tab, which is the more appropriate place for guidance on understanding or running the repo, whereas this tab is meant for reporting implementation issues. You're more likely to receive a response to your question there. I hope this helps, and I'm here for any further questions you may have!
