
Releases: zetaalphavector/RAGElo

v0.1.6

02 Jul 14:58
91d7bd3

What's Changed

New Contributors

Full Changelog: 0.1.5...0.1.6

v0.1.5

31 May 14:02
d868c81

Adds support for Python >= 3.8

What's Changed

Full Changelog: 0.1.3...0.1.5

v0.1.4

31 May 11:44
6e7cc2a

Hotfix for Python 3.10

0.1.2

31 May 10:07
72a1189

Main changes:

  • OpenAI calls are much faster now and can be done in parallel.
  • The pairwise answer evaluations are easier to use and more configurable.
  • A new evaluator, PairwiseExpertAnswerEvaluator, was added.
  • Added a notebook with examples of using RAGElo as a library.

What's Changed

Full Changelog: 0.1.1...0.1.2

v0.1

16 Apr 10:38
ebbd6be

RAGElo goes 0.1!

In this release, RAGElo as a library was completely revamped, with a much easier-to-use unified interface and simpler commands (evaluate and batch_evaluate). Using an Evaluator is now as simple as calling evaluator.evaluate("query", "document").
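As a minimal sketch of the unified interface (the "reasoner" evaluator name and the OpenAI provider below are placeholders; any available evaluator and LLM provider work the same way):

from ragelo import get_retrieval_evaluator

# Instantiate an evaluator through the unified factory function
evaluator = get_retrieval_evaluator("reasoner", llm_provider="openai")

# evaluate() takes a query and a document and returns the raw LLM output
# alongside the parsed answer
raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",
    document="Brasília is the capital of Brazil.",
)
print(answer)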

Custom Evaluators and metadata support

Not a fan of the existing evaluators? Now both Retrieval and Answer evaluators support fully custom prompts, using RetrievalEvaluator.CustomPromptEvaluator and AnswerEvaluator.CustomPromptEvaluator, respectively.

As part of the custom evaluators, RAGElo now also supports injecting custom metadata into your prompts! Want to include the current timestamp in your evaluator? Add a {today_date} placeholder to the prompt and pass it as metadata to the evaluate method:

from ragelo import get_retrieval_evaluator

prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
You should pay extra attention to how **recent** a document is. A document older than 5 years is considered outdated.

The document should be evaluated according to its recency, truthfulness, and relevance to the user query.

User query: {q}

Retrieved document: {d}

The document has a date of {document_date}.
Today is {today_date}.

WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT WITH THE FOLLOWING KEYS:
- "relevance": 0 if the document is irrelevant, 1 if it is relevant.
- "recency": 0 if the document is outdated, 1 if it is recent.
- "truthfulness": 0 if the document is false, 1 if it is true.
- "reasoning": A short explanation of why you think the document is relevant or irrelevant.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt", # name of the retrieval evaluator
    llm_provider="openai", # Which LLM provider to use
    prompt=prompt, # your custom prompt
    query_placeholder="q", # the placeholder for the query in the prompt
    document_placeholder="d", # the placeholder for the document in the prompt
    answer_format="multi_field_json", # The format of the answer. In this case, a JSON object with multiple fields
    scoring_keys=["relevance", "recency", "truthfulness", "reasoning"], # Which keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?", # The user query
    document="Rio de Janeiro is the capital of Brazil.", # The retrieved document
    query_metadata={"today_date": "08-04-2024"}, # Some metadata for the query
    doc_metadata={"document_date": "04-03-1950"}, # Some metadata for the document
)

CLI Interface changes

On the CLI front, each evaluator now has its own subprogram. Instead of calling ragelo with a long list of parameters, you can call ragelo retrieval-evaluator <evaluator> or ragelo answer-evaluator <evaluator> with your preferred evaluator. (We are big fans of ragelo retrieval-evaluator domain-expert 😉.)

Other changes:

  • Moved from using dataclasses to Pydantic's BaseModel. The code should support Pydantic >=0.9, but let us know if it doesn't work for you.
  • Calling batch_evaluate will now return both the existing and new annotations, instead of only writing the new annotations to a file.
  • The interface of batch_evaluate is much simplified. Instead of a dictionary of dictionaries, it now requires a list of Query objects, and each query has its own list of documents and answers (see the sketch after this list).
  • PairwiseAnswerEvaluator is much simpler now: k is the number of games to generate per query, instead of the grand total.
  • Many specific methods were simplified and moved up in the class hierarchy. More code sharing and easier maintenance!
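A rough sketch of the new batch interface is below. The import path and the Query/Document field names (qid, query, retrieved_docs, did, text) are assumptions for illustration, not the exact library API:

from ragelo import get_retrieval_evaluator
from ragelo.types import Query, Document  # import path is an assumption

# Each Query carries its own list of retrieved documents
queries = [
    Query(
        qid="q1",
        query="What is the capital of Brazil?",
        retrieved_docs=[
            Document(did="d1", text="Brasília is the capital of Brazil."),
            Document(did="d2", text="Rio de Janeiro is a city in Brazil."),
        ],
    ),
]

evaluator = get_retrieval_evaluator("reasoner", llm_provider="openai")
# batch_evaluate returns the queries with both the existing and the new annotations
evaluated_queries = evaluator.batch_evaluate(queries)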

Full Changelog: 0.0.5...0.1.0

v0.0.5

15 Feb 15:49
27cc16c

What's Changed

Major overhaul to the code!

  • More modular
  • Tests
  • Simpler and more coherent class interface
  • Simpler iterators
  • Updated OpenAI version

by @ArthurCamara in #7

Full Changelog: 0.0.3...0.0.5

0.0.3

25 Oct 15:06
c4c785a

Added a new document evaluator (domain_expert), plus a bunch of bugfixes.

What's Changed

Full Changelog: 0.0.2...0.0.3

0.0.2

23 Oct 13:38
8310c85

First public release of RAGElo, an LLM-powered annotator for RAG agents using an Elo-style tournament.