Releases: zetaalphavector/RAGElo
v0.1.6
What's Changed
- Fix issue with RDNAM parsing of answer by @matprst in #32
- docs: update README.md by @eltociear in #33
- Elo Ranker returns dictionary with agents scores by @ArthurCamara in #34
New Contributors
- @matprst made their first contribution in #32
- @eltociear made their first contribution in #33
Full Changelog: 0.1.5...0.1.6
v0.1.5
Adds support for Python >= 3.8
What's Changed
- Support Python 3.8 by @ArthurCamara in #29
Full Changelog: 0.1.3...0.1.5
v0.1.4
Hotfix for Python 3.10
0.1.2
Main changes:
- OpenAI calls are much faster now and can be done in parallel.
- The pairwise answer evaluations are easier to use and more configurable.
- A new PairwiseExpertAnswerEvaluator evaluator was added.
- Added a notebook with examples of using RAGElo as a library.
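The parallel OpenAI calls are built on `asyncio`. As a minimal sketch of the idea (with a hypothetical stub standing in for the real OpenAI async client call), firing all evaluations concurrently with `asyncio.gather` makes the total latency roughly that of a single call:

```python
import asyncio

# Hypothetical stand-in for one LLM evaluation call; in RAGElo the real
# calls go through OpenAI's async client.
async def evaluate_one(query: str, document: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"evaluated: {query!r} vs {document!r}"

async def evaluate_all(pairs: list[tuple[str, str]]) -> list[str]:
    # gather() runs all coroutines concurrently, so N calls take roughly
    # the time of one call instead of N sequential round trips.
    return await asyncio.gather(*(evaluate_one(q, d) for q, d in pairs))

results = asyncio.run(evaluate_all([("q1", "d1"), ("q2", "d2")]))
print(results)
```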
What's Changed
- Added parallel calls to OpenAI with asyncio by @ArthurCamara in #21
- Change from aiohttp sessions to using OpenAI's Async clients. by @ArthurCamara in #22
- Improve batching by @ArthurCamara in #25
- Refactor pairwise answer eval by @frejonb in #26
- Notebook example by @ArthurCamara in #27
Full Changelog: 0.1.1...0.1.2
v0.1
RAGElo goes 0.1!
In this release, RAGElo as a library was completely revamped, with a much easier-to-use unified interface and simpler commands (`evaluate` and `batch_evaluate`). Now, using an evaluator is as simple as calling `evaluator.evaluate("query", "document")`.
Custom Evaluators and metadata support
Not a fan of the existing evaluators? Both Retrieval and Answer evaluators now support fully custom prompts through `RetrievalEvaluator.CustomPromptEvaluator` and `AnswerEvaluator.CustomPromptEvaluator`, respectively.
As part of the custom evaluators, RAGElo now also supports injecting custom metadata into your prompts! Want to include the current timestamp in your evaluator? Add a `{today_date}` placeholder to the prompt and pass it as metadata to the `evaluate` method:
```python
from ragelo import get_retrieval_evaluator

prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
You should pay extra attention to how **recent** a document is. A document older than 5 years is considered outdated.
The answer should be evaluated according to its recency, truthfulness, and relevance to the user query.
User query: {q}
Retrieved document: {d}
The document has a date of {document_date}.
Today is {today_date}.
WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT WITH THE FOLLOWING KEYS:
- "relevance": 0 if the document is irrelevant, 1 if it is relevant.
- "recency": 0 if the document is outdated, 1 if it is recent.
- "truthfulness": 0 if the document is false, 1 if it is true.
- "reasoning": A short explanation of why you think the document is relevant or irrelevant.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt",  # name of the retrieval evaluator
    llm_provider="openai",  # which LLM provider to use
    prompt=prompt,  # your custom prompt
    query_placeholder="q",  # the placeholder for the query in the prompt
    document_placeholder="d",  # the placeholder for the document in the prompt
    answer_format="multi_field_json",  # the answer format: a JSON object with multiple fields
    scoring_keys=["relevance", "recency", "truthfulness", "reasoning"],  # keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",  # the user query
    document="Rio de Janeiro is the capital of Brazil.",  # the retrieved document
    query_metadata={"today_date": "08-04-2024"},  # some metadata for the query
    doc_metadata={"document_date": "04-03-1950"},  # some metadata for the document
)
```
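RAGElo parses the LLM's answer for you; purely as an illustration of what the `multi_field_json` format implies, this plain-Python sketch (with a hypothetical raw answer string) extracts the configured scoring keys from a single-line JSON reply:

```python
import json

# Hypothetical raw LLM answer, written on a single line as the prompt instructs.
raw_answer = (
    '{"relevance": 0, "recency": 0, "truthfulness": 0, '
    '"reasoning": "The document contradicts the fact that Brasilia is the capital."}'
)
scoring_keys = ["relevance", "recency", "truthfulness", "reasoning"]

# Parse the JSON object and keep only the configured scoring keys.
parsed = json.loads(raw_answer)
answer = {key: parsed[key] for key in scoring_keys}
print(answer["relevance"])  # 0
```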
CLI Interface changes
On the CLI front, each evaluator now has its own subcommand. Instead of calling `ragelo` with a long list of parameters, you can call `ragelo retrieval-evaluator <evaluator>` or `ragelo answer-evaluator <evaluator>` with your preferred evaluator. (We are big fans of `ragelo retrieval-evaluator domain-expert` 😉)
Other changes:
- Moved from `dataclasses` to Pydantic's `BaseModel`. The code should support Pydantic >=0.9, but let us know if it doesn't work for you.
- Calling `batch_evaluator` now returns both the existing and the new annotations, instead of only writing the new annotations to a file.
- The interface of `batch_evaluator` is much simplified. Instead of a dictionary of dictionaries, it now takes a list of `Query` objects, where each query has its own list of documents and answers.
- `PairwiseAnswerEvaluator` is much simplified. `k` is now the number of games to generate per query, instead of the grand total.
- Many evaluator-specific methods were simplified and moved up in the class hierarchy. More code sharing and easier maintenance!
Full Changelog: 0.0.5...0.1.0
v0.0.5
What's Changed
Major overhaul of the code, by @ArthurCamara in #7:
- More modular
- Tests
- Simpler and more coherent class interface
- Simpler iterators
- Updated OpenAI version
Full Changelog: 0.0.3...0.0.5
0.0.3
Added a new document evaluator (domain_expert) and a number of bug fixes.
What's Changed
- Adding Domain Expert Evaluator by @ArthurCamara in #5
Full Changelog: 0.0.2...0.0.3
0.0.2
First public release of RAGElo, an LLM-powered annotator for RAG agents using an Elo-style tournament.
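RAGElo's own rating scheme lives in the library; as a minimal sketch of the Elo-style update behind such a tournament (using the classic constants K=32 and the 400 scale, which are standard Elo and not necessarily RAGElo's settings), one game between two agents adjusts their ratings like this:

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Classic Elo update for one game; score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    # A gains what B loses, so the total rating mass is conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two equally rated agents (1000 each); A wins, so A gains k/2 = 16 points.
new_a, new_b = elo_update(1000.0, 1000.0, 1.0)
print(new_a, new_b)  # 1016.0 984.0
```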