We need to extend the evaluation features, in particular for RAG pipelines, so that users can answer questions like:

- Is this pipeline good enough?
- What should I focus on for optimization?
- Is pipeline A better than pipeline B? (performance, costs, latency)
This includes the following components typically appearing in RAG pipelines:

Retrievers, Rankers, DocumentJoiners
a) labels available => statistical metrics
b) no labels available => model-based heuristics / pseudo-label generator

Generators
a) labels available => model-based metrics (SAS, answer correctness, ...)
b) no labels available => model-based metrics (groundedness score)
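For the label-based case (a) on the retrieval side, the statistical metrics would look roughly like the sketch below: given gold document ids per query, compute recall@k and MRR over the ranked ids a retriever returns. The function names and shapes here are illustrative, not an existing API.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_reciprocal_rank(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant document; 0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


# Example: the retriever returns document ids ranked by score.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=2))       # 0.5 (only d1 is in the top 2)
print(mean_reciprocal_rank(retrieved, relevant))   # 0.5 (first hit at rank 2)
```

Averaging these per-query scores over an evaluation set gives the pipeline-level numbers needed to compare pipeline A against pipeline B.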
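For the no-label case (b) on the generator side, the groundedness score would be model-based, e.g. an NLI model or LLM judging whether each statement in the answer is supported by the retrieved context. As a self-contained stand-in for that idea (a toy heuristic, not the proposed implementation), this scores the fraction of answer tokens that appear in the retrieved documents:

```python
import re


def _tokens(text: str) -> list[str]:
    """Lowercased word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def lexical_groundedness(answer: str, contexts: list[str]) -> float:
    """Toy placeholder for a model-based groundedness score:
    fraction of answer tokens that occur in the retrieved contexts."""
    context_vocab = set(_tokens(" ".join(contexts)))
    answer_tokens = _tokens(answer)
    if not answer_tokens:
        return 0.0
    supported = sum(1 for t in answer_tokens if t in context_vocab)
    return supported / len(answer_tokens)


print(lexical_groundedness(
    "Paris is the capital",
    ["The capital of France is Paris."],
))  # 1.0 — every answer token is covered by the context
```

A real model-based score replaces the token-overlap check with per-statement entailment against the contexts; the interface (answer + retrieved contexts in, score in [0, 1] out, no labels required) stays the same.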
Tasks