
Evaluating LLMs on QA Tasks #65

Closed

slavakurilyak opened this issue Mar 14, 2023 · 7 comments


slavakurilyak commented Mar 14, 2023

Here's an idea on how to evaluate an LLM on various question-answering tasks, such as open-domain question answering, conversational question answering, answer selection, community question answering, and knowledge base question answering:

initialize model
initialize datasets
initialize evaluation_metrics

load_task_data:
    for each task in tasks:
        load data for task
        preprocess data if necessary (e.g., combine review summary and text)
        store data in datasets

embed_task_data:
    for each task in tasks:
        for each example in datasets[task]:
            obtain prompt from example
            obtain prompt_embedding using an embedding function
            store prompt_embedding in example

evaluate_model_on_task:
    for each task in tasks:
        for each example in datasets[task]:
            obtain prompt_embedding from example
            answer_embedding = model.generate(prompt_embedding)

            metric_result = evaluation_metrics(example, answer_embedding)
            store metric_result in results for task

aggregate_and_report_metrics:
    for each task in tasks:
        for each metric in evaluation_metrics:
            calculate average, median, or other aggregate metric values
            report metric value for task

main:
    load_task_data
    embed_task_data
    evaluate_model_on_task
    aggregate_and_report_metrics

I'd like to add a few caveats about the pseudocode I provided:

  • The provided pseudocode is only a starting point for exploring the evaluation of QA tasks using embeddings
  • This pseudocode is not complete
  • I invite the community to provide input
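
To make the loop above a bit more concrete, here is a minimal, untested Python sketch. The embed, generate, and score callables and the dataset layout (a dict mapping each task name to a list of examples with "prompt" and "reference" fields) are placeholders made up for illustration, and unlike the pseudocode the model generates from the prompt text rather than from its embedding, which is closer to how most LLM APIs behave:

from statistics import mean, median
from typing import Callable

def evaluate(
    tasks: dict[str, list[dict]],          # task name -> list of examples
    embed: Callable[[str], list[float]],   # prompt text -> embedding vector
    generate: Callable[[str], str],        # prompt text -> model answer
    score: Callable[[str, str], float],    # (answer, reference) -> metric value
) -> dict[str, dict[str, float]]:
    results: dict[str, list[float]] = {task: [] for task in tasks}

    for task, examples in tasks.items():
        for example in examples:
            prompt = example["prompt"]
            # Keep the embedding alongside the example, e.g. for later
            # retrieval or nearest-neighbour analysis.
            example["prompt_embedding"] = embed(prompt)

            answer = generate(prompt)
            results[task].append(score(answer, example["reference"]))

    # Aggregate and report per-task metric values.
    return {
        task: {"mean": mean(scores), "median": median(scores)}
        for task, scores in results.items()
        if scores
    }

Each of the QA task types listed above would simply be another key in tasks, with its own examples and, if needed, its own score function.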
@placcaumuhire

Hey there! Thanks for sharing your idea on how to evaluate an LLM on various question-answering tasks. I really appreciate your contribution and I think your pseudocode provides a great starting point for exploring and understanding the evaluation process. And you're right, there's always room for improvement, so I encourage you and others to share your thoughts and experiences to help enhance the understanding and implementation of this process. Keep up the good work!

@Abhishekagrawal1404

quite impressive


ricky-sb commented Mar 15, 2023

@slavakurilyak ok I know this is a weird question, but...did you generate this with ChatGPT? 👀

It has a very similar tone. The pseudocode, the disclaimers, the step-by-step thing. It's very similar to when I ask ChatGPT for coding help.


Abhishekagrawal1404 commented Mar 15, 2023 via email

@YoshiDeSchrijver

I'd like to contribute.

@placcaumuhire

🤓 Me ✌️ from 🇷🇼

@andrew-openai
Contributor

The tasks described:

question-answering tasks, such as open-domain question answering, conversational question answering, answer selection, community question answering, and knowledge base question answering

should already be supported by Evals, since you can make the input a Chat conversation object up until the next turn (which is when the model would respond).
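
For example, assuming the usual "input"/"ideal" JSONL sample layout, a conversational QA sample could be built roughly like this (the conversation content is invented for illustration):

# Hypothetical conversational-QA sample; the "input" list holds the chat
# messages up to the turn where the model should respond, and "ideal" holds
# the expected answer.
import json

sample = {
    "input": [
        {"role": "system", "content": "Answer the question concisely."},
        {"role": "user", "content": "In what year did Apollo 11 land on the Moon?"},
        {"role": "assistant", "content": "1969."},
        {"role": "user", "content": "Who was the mission commander?"},
    ],
    "ideal": "Neil Armstrong",
}
print(json.dumps(sample))  # one line per sample in the .jsonl file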
