Releases: zetaalphavector/RAGElo
v0.1.6
What's Changed
- Fix issue with RDNAM parsing of answer by @matprst in #32
- docs: update README.md by @eltociear in #33
- Elo Ranker returns dictionary with agents scores by @ArthurCamara in #34
New Contributors
- @matprst made their first contribution in #32
- @eltociear made their first contribution in #33
Full Changelog: 0.1.5...0.1.6
v0.1.5
Adds support for Python >= 3.8
What's Changed
- Support Python 3.8 by @ArthurCamara in #29
Full Changelog: 0.1.3...0.1.5
v0.1.4
Hotfix for Python 3.10
0.1.2
Main changes:
- OpenAI calls are much faster now and can be done in parallel.
- The pairwise answer evaluations are easier to use and more configurable.
- A new PairwiseExpertAnswerEvaluator evaluator was added.
- Added a notebook with examples of using RAGElo as a library.
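The parallel OpenAI calls are built on `asyncio`. As a minimal sketch of the idea (with a hypothetical stub standing in for the real OpenAI async client call), firing all evaluations concurrently with `asyncio.gather` makes the total latency roughly that of a single call:

```python
import asyncio

# Hypothetical stand-in for one LLM evaluation call; in RAGElo the real
# calls go through OpenAI's async client.
async def evaluate_one(query: str, document: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"evaluated: {query!r} vs {document!r}"

async def evaluate_all(pairs: list[tuple[str, str]]) -> list[str]:
    # gather() runs all coroutines concurrently, so N calls take roughly
    # the time of one call instead of N sequential round trips.
    return await asyncio.gather(*(evaluate_one(q, d) for q, d in pairs))

results = asyncio.run(evaluate_all([("q1", "d1"), ("q2", "d2")]))
print(results)
```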
What's Changed
- Added parallel calls to OpenAI with asyncio by @ArthurCamara in #21
- Change from aiohttp sessions to using OpenAI's Async clients. by @ArthurCamara in #22
- Improve batching by @ArthurCamara in #25
- Refactor pairwise answer eval by @frejonb in #26
- Notebook example by @ArthurCamara in #27
Full Changelog: 0.1.1...0.1.2
v0.1
RAGElo goes 0.1!
In this release, RAGElo as a library was completely revamped, with a much easier-to-use unified interface and simpler commands (`evaluate` and `batch_evaluate`). Now, using an evaluator is as simple as calling `evaluator.evaluate("query", "document")`.
Custom Evaluators and metadata support
Not a fan of the existing evaluators? Both Retrieval and Answer evaluators now support fully custom prompts through `RetrievalEvaluator.CustomPromptEvaluator` and `AnswerEvaluator.CustomPromptEvaluator`, respectively.
As part of the custom evaluators, RAGElo now also supports injecting custom metadata into your prompts! Want to include the current timestamp in your evaluator? Add a `{today_date}` placeholder to the prompt and pass it as metadata to the `evaluate` method:
```python
from ragelo import get_retrieval_evaluator

prompt = """You are a helpful assistant for evaluating the relevance of a retrieved document to a user query.
You should pay extra attention to how **recent** a document is. A document older than 5 years is considered outdated.
The answer should be evaluated according to its recency, truthfulness, and relevance to the user query.
User query: {q}
Retrieved document: {d}
The document has a date of {document_date}.
Today is {today_date}.
WRITE YOUR ANSWER ON A SINGLE LINE AS A JSON OBJECT WITH THE FOLLOWING KEYS:
- "relevance": 0 if the document is irrelevant, 1 if it is relevant.
- "recency": 0 if the document is outdated, 1 if it is recent.
- "truthfulness": 0 if the document is false, 1 if it is true.
- "reasoning": A short explanation of why you think the document is relevant or irrelevant.
"""

evaluator = get_retrieval_evaluator(
    "custom_prompt",  # name of the retrieval evaluator
    llm_provider="openai",  # which LLM provider to use
    prompt=prompt,  # your custom prompt
    query_placeholder="q",  # the placeholder for the query in the prompt
    document_placeholder="d",  # the placeholder for the document in the prompt
    answer_format="multi_field_json",  # the answer format: a JSON object with multiple fields
    scoring_keys=["relevance", "recency", "truthfulness", "reasoning"],  # keys to extract from the answer
)

raw_answer, answer = evaluator.evaluate(
    query="What is the capital of Brazil?",  # the user query
    document="Rio de Janeiro is the capital of Brazil.",  # the retrieved document
    query_metadata={"today_date": "08-04-2024"},  # some metadata for the query
    doc_metadata={"document_date": "04-03-1950"},  # some metadata for the document
)
```
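RAGElo parses the LLM's answer for you; purely as an illustration of what the `multi_field_json` format implies, this plain-Python sketch (with a hypothetical raw answer string) extracts the configured scoring keys from a single-line JSON reply:

```python
import json

# Hypothetical raw LLM answer, written on a single line as the prompt instructs.
raw_answer = (
    '{"relevance": 0, "recency": 0, "truthfulness": 0, '
    '"reasoning": "The document contradicts the fact that Brasilia is the capital."}'
)
scoring_keys = ["relevance", "recency", "truthfulness", "reasoning"]

# Parse the JSON object and keep only the configured scoring keys.
parsed = json.loads(raw_answer)
answer = {key: parsed[key] for key in scoring_keys}
print(answer["relevance"])  # 0
```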
CLI Interface changes
On the CLI front, each evaluator now has its own subcommand. Instead of calling `ragelo` with a long list of parameters, you can call `ragelo retrieval-evaluator <evaluator>` or `ragelo answer-evaluator <evaluator>` with your preferred evaluator. (We are big fans of `ragelo retrieval-evaluator domain-expert` 😉)
Other changes:
- Moved from `dataclasses` to Pydantic's `BaseModel`. The code should support Pydantic >=0.9, but let us know if it doesn't work for you.
- Calling `batch_evaluator` now returns both the existing and the new annotations, instead of only writing the new annotations to a file.
- The interface of `batch_evaluator` is much simplified. Instead of a dictionary of dictionaries, it now takes a list of `Query` objects, where each query has its own list of documents and answers.
- `PairwiseAnswerEvaluator` is much simplified. `k` is now the number of games to generate per query, instead of the grand total.
- Many evaluator-specific methods were simplified and moved up in the class hierarchy. More code sharing and easier maintenance!
Full Changelog: 0.0.5...0.1.0
v0.0.5
What's Changed
Major overhaul of the code, by @ArthurCamara in #7:
- More modular
- Tests
- Simpler and more coherent class interface
- Simpler iterators
- Updated OpenAI version
Full Changelog: 0.0.3...0.0.5
0.0.3
Added a new document evaluator (domain_expert) and a number of bug fixes.
What's Changed
- Adding Domain Expert Evaluator by @ArthurCamara in #5
Full Changelog: 0.0.2...0.0.3
0.0.2
First public release of RAGElo, an LLM-powered annotator for RAG agents using an Elo-style tournament.
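RAGElo's own rating scheme lives in the library; as a minimal sketch of the Elo-style update behind such a tournament (using the classic constants K=32 and the 400 scale, which are standard Elo and not necessarily RAGElo's settings), one game between two agents adjusts their ratings like this:

```python
def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Classic Elo update for one game; score_a is 1.0 for a win, 0.5 draw, 0.0 loss."""
    # Expected score of A given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    # A gains what B loses, so the total rating mass is conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two equally rated agents (1000 each); A wins, so A gains k/2 = 16 points.
new_a, new_b = elo_update(1000.0, 1000.0, 1.0)
print(new_a, new_b)  # 1016.0 984.0
```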