[Feature]: expose running evaluators via API to playground #1956

aybruhm · 2024-08-01T20:37:42Z

Description

This PR makes it possible to run evaluators through an API, so they can be invoked from the playground.

Evaluators that have been tested

The following evaluators have been tested both by the backend test suite and from the UI:

exact match
similarity match
regex test
webhook test
AI critique
starts with
contains
contains any
contains all
contains JSON
JSON diff
Levenshtein distance
RAG faithfulness
RAG context relevancy

The following evaluators have only been tested from the UI:

field match
custom code
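For reviewers less familiar with the evaluator set, here is a minimal sketch of how a few of the string-based evaluators listed above could sit behind a single dispatch function. It is not the code in this PR: the registry keys, the `run_evaluator` helper and the keyword arguments are hypothetical, and LLM-based evaluators (AI critique, the RAG ones) are left out.

```python
from typing import Any, Callable, Dict, List

# Hypothetical sketch: evaluator keys and signatures are illustrative only,
# not the backend's actual API.


def exact_match(output: str, correct_answer: str) -> bool:
    """True when the app output equals the expected answer exactly."""
    return output == correct_answer


def starts_with(output: str, prefix: str, case_sensitive: bool = True) -> bool:
    """True when the output begins with the given prefix."""
    if not case_sensitive:
        output, prefix = output.lower(), prefix.lower()
    return output.startswith(prefix)


def contains_any(output: str, substrings: List[str]) -> bool:
    """True when at least one substring occurs in the output."""
    return any(s in output for s in substrings)


def contains_all(output: str, substrings: List[str]) -> bool:
    """True when every substring occurs in the output."""
    return all(s in output for s in substrings)


def levenshtein_distance(output: str, correct_answer: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence, one row at a time."""
    previous = list(range(len(correct_answer) + 1))
    for i, a in enumerate(output, start=1):
        current = [i]
        for j, b in enumerate(correct_answer, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete a character from output
                    current[j - 1] + 1,  # insert a character into output
                    previous[j - 1] + (a != b),  # substitute (free if equal)
                )
            )
        previous = current
    return previous[-1]


# Registry: an API layer would look up the handler by evaluator key.
EVALUATORS: Dict[str, Callable[..., Any]] = {
    "exact_match": exact_match,
    "starts_with": starts_with,
    "contains_any": contains_any,
    "contains_all": contains_all,
    "levenshtein_distance": levenshtein_distance,
}


def run_evaluator(key: str, **inputs: Any) -> Any:
    """Dispatch to the registered evaluator, rejecting unknown keys."""
    handler = EVALUATORS.get(key)
    if handler is None:
        raise ValueError(f"Unknown evaluator: {key!r}")
    return handler(**inputs)
```

An endpoint built this way would only need to validate the request and call, for example, `run_evaluator("levenshtein_distance", output="kitten", correct_answer="sitting")`, which returns 3 in this sketch.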

What to QA

QA should run each of the evaluators listed above from the UI.

Related Issue

Closes AGE-491


vercel · 2024-08-01T20:37:46Z

The latest updates on your projects.

Name	Status	Updated (UTC)
agenta	✅ Ready	Aug 29, 2024 8:25am
agenta-documentation	✅ Ready	Aug 29, 2024 8:25am


aybruhm added 3 commits August 1, 2024 21:26

feat (backend): add utility function to ensure event loop is retrieved or created

5c5ad15

feat (backend): add endpoint to run evaluation on a specific evaluator

60064ef

refactor (backend): make use of ensure_event_loop utility function

0a66e26
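Commit 5c5ad15 adds a utility that ensures an event loop is retrieved or created, and 0a66e26 switches callers to `ensure_event_loop`. The function body is not visible on this page, so the following is only a minimal sketch of that pattern, assuming the helper reuses the running loop when called from async code and otherwise creates and registers a new one.

```python
import asyncio


def ensure_event_loop() -> asyncio.AbstractEventLoop:
    """Return the running event loop, or create and register a new one.

    Hypothetical sketch of the pattern, not the PR's implementation:
    inside a coroutine the running loop is reused; from synchronous code a
    fresh loop is created and set as this thread's current loop. The caller
    owns a loop created here and should close it when done.
    """
    try:
        return asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: create one instead of failing.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```

From synchronous code a caller could then use `loop.run_until_complete(coro)`. In this sketch every call made outside a running loop creates a fresh loop, so closing it is the caller's responsibility.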

aybruhm changed the base branch from rag to main August 1, 2024 20:40

aybruhm changed the title ~~Feature/age 491 poc 1e expose running evaluators via api to playground~~ [Feature][DRAFT]: expose running evaluators via API to playground Aug 1, 2024

aybruhm added 2 commits August 2, 2024 14:48

Merge branch 'rag' into feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground

7d534fc

docs (backend): improve docstring in ensure_event_loop function

9ba496f

aybruhm temporarily deployed to oss August 2, 2024 13:49 — with GitHub Actions Inactive

aybruhm changed the base branch from main to rag August 2, 2024 13:50

vercel bot deployed to Preview August 2, 2024 13:52

aybruhm changed the title ~~[Feature][DRAFT]: expose running evaluators via API to playground~~ [Feature]: expose running evaluators via API to playground Aug 2, 2024

minor refactor (build): replace use of 'docker-compose' to 'docker compose'

bd9b2d3

aybruhm temporarily deployed to oss August 4, 2024 11:29 — with GitHub Actions Inactive

vercel bot deployed to Preview August 4, 2024 11:30

aybruhm requested a review from jp-agenta August 4, 2024 11:36

aybruhm added 3 commits August 8, 2024 21:44

feat (backend): created evaluator mapping and input interfaces

3f6b507

feat (backend): implemented endpoints to map experiment data tree to evaluator interface and modified endpoint to evaluate llm app run

4a1604d

refactor (backend): update evaluator handlers to make use of new handlers

55c727e
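Commits 3f6b507 and 4a1604d introduce evaluator mapping and input interfaces and map the experiment data tree onto the evaluator interface. As a rough illustration only, the sketch below resolves dotted paths in a nested trace dict into named evaluator inputs. `MappedEvaluatorInput`, the dotted-path syntax and the `inputs`/`settings_values` fields are assumptions (the last loosely echoing commit 31d10e1), not the PR's actual schema.

```python
import collections.abc
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping, Optional


@dataclass
class MappedEvaluatorInput:
    """Hypothetical stand-in for an evaluator input interface."""

    inputs: Dict[str, Any] = field(default_factory=dict)
    settings_values: Dict[str, Any] = field(default_factory=dict)


def resolve_path(tree: Mapping[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'rag.retriever.outputs' through nested dicts."""
    node: Any = tree
    for part in path.split("."):
        if not isinstance(node, collections.abc.Mapping) or part not in node:
            raise KeyError(f"{path!r} not found (stopped at {part!r})")
        node = node[part]
    return node


def map_to_evaluator_input(
    tree: Mapping[str, Any],
    mapping: Mapping[str, str],
    settings_values: Optional[Mapping[str, Any]] = None,
) -> MappedEvaluatorInput:
    """Build evaluator inputs by resolving each mapped path in the data tree."""
    inputs = {name: resolve_path(tree, path) for name, path in mapping.items()}
    return MappedEvaluatorInput(inputs=inputs, settings_values=dict(settings_values or {}))
```

Keeping the mapping declarative (input name to path) means the same evaluator can be pointed at different parts of a trace without changing its code.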

vercel bot deployed to Preview August 8, 2024 20:48

jp-agenta reviewed Aug 9, 2024

agenta-backend/agenta_backend/models/api/evaluation_model.py (outdated, resolved)

jp-agenta reviewed Aug 9, 2024

agenta-backend/agenta_backend/models/api/evaluation_model.py (outdated, resolved)

jp-agenta reviewed Aug 9, 2024

agenta-backend/agenta_backend/models/api/evaluation_model.py (outdated, resolved)

jp-agenta reviewed Aug 9, 2024

agenta-backend/agenta_backend/services/evaluators_service.py (outdated, resolved)

Merge branch 'rag' into feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground

165f7a7

aybruhm temporarily deployed to oss August 9, 2024 09:51 — with GitHub Actions Inactive

vercel bot deployed to Preview August 9, 2024 09:53

aybruhm and others added 5 commits August 22, 2024 01:04

refactor (backend): clean up LLM key checks in evaluators

33e6e17

- Removed `requires_llm_api_keys` from evaluators that don't require LLM API keys
- Ensured evaluators requiring LLM keys have `requires_llm_api_keys` set to `True` by default

chore (tests): add '@pytest.mark.asyncio' to test cases in test_user_profile

7c28f6d

Enforce in Union[str, Dict[str, Any]] in BaseResponse in SDK

3cad5db

fix ai critique

91d23d8

minor refactor (backend): include ai_critique evaluator settings_values to EvaluatorInputInterface

31d10e1

aybruhm temporarily deployed to oss August 23, 2024 12:32 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 23, 2024 12:34

chore (style): format evaluators_service with [email protected]

b224f10

aybruhm temporarily deployed to oss August 23, 2024 12:36 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 23, 2024 12:37

vercel bot deployed to Preview – agenta August 23, 2024 12:40

aybruhm and others added 5 commits August 23, 2024 14:32

minor refactor (backend): resolve ValueError when casting string to float for ai critique evaluator

ca81cea

Merge branch 'main' of github.com:Agenta-AI/agenta

6238cd8

Merge branch 'main' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

f3546ef

fix exception message and bump SDK out of pre-release

2402f94

Merge pull request #1987 from Agenta-AI/feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

532a4bb

[Enhancement]: Handle non-string outputs gracefully in auto_contains_json evaluator
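Two fixes in this range make evaluators more tolerant of unexpected outputs: #1987 handles non-string outputs in `auto_contains_json`, and ca81cea resolves a `ValueError` when casting the AI critique result to a float. The sketch below shows one way to get that behavior. `contains_json` and `parse_score` are hypothetical helpers, and serializing non-string outputs or extracting the first number from the reply are assumed policies, not necessarily what the PR does.

```python
import json
import re
from typing import Any


def contains_json(output: Any) -> bool:
    """True when the output holds a parseable JSON object.

    Hypothetical sketch: non-string outputs (e.g. a dict returned by the app)
    are serialized first instead of crashing on str-only operations.
    """
    if not isinstance(output, str):
        try:
            output = json.dumps(output)
        except (TypeError, ValueError):
            return False  # not JSON-serializable at all
    start, end = output.find("{"), output.rfind("}")
    if start == -1 or end < start:
        return False
    try:
        json.loads(output[start : end + 1])
    except json.JSONDecodeError:
        return False
    return True


# Assumed policy: take the first (optionally signed, optionally decimal) number.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_score(raw: str) -> float:
    """Extract a numeric score from an LLM reply such as 'Score: 8/10'."""
    match = _NUMBER.search(raw)
    if match is None:
        raise ValueError(f"No numeric score in evaluator output: {raw!r}")
    return float(match.group())
```

Calling `float(raw)` directly fails on replies like "Score: 8/10"; extracting the number first and raising a descriptive error otherwise keeps the failure explicit.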

jp-agenta temporarily deployed to oss August 23, 2024 14:29 — with GitHub Actions Inactive

vercel bot deployed to Preview – agenta-documentation August 23, 2024 14:30

vercel bot deployed to Preview – agenta August 23, 2024 14:33

aybruhm and others added 3 commits August 26, 2024 10:31

Merge branch 'feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground' into feature/age-532-poc-1e-add-llm-api-key-checks-in-llm-based-evaluators

0ce0022

Update evaluators_service.py

cc33a66

Merge pull request #1989 from Agenta-AI/feature/age-532-poc-1e-add-llm-api-key-checks-in-llm-based-evaluators

9212c7b

[Enhancement] Add LLM API key checks to LLM-based evaluators
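#1989 adds LLM API key checks to LLM-based evaluators, building on the `requires_llm_api_keys` flag cleaned up in 33e6e17. A minimal sketch of such a pre-flight check follows. `EvaluatorSpec`, `MissingLLMAPIKeyError`, `check_llm_api_keys` and the default `OPENAI_API_KEY` key name are illustrative assumptions, not the backend's actual names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class EvaluatorSpec:
    """Hypothetical evaluator descriptor carrying the requires_llm_api_keys flag."""

    key: str
    requires_llm_api_keys: bool = False


class MissingLLMAPIKeyError(Exception):
    """Raised before running an LLM-based evaluator whose keys are absent."""


def check_llm_api_keys(
    spec: EvaluatorSpec,
    provided: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = ("OPENAI_API_KEY",),  # assumed key name
) -> None:
    """Fail fast if an evaluator needs LLM keys found neither in `provided` nor in env."""
    if not spec.requires_llm_api_keys:
        return  # e.g. exact match or regex: nothing to check
    env = os.environ if env is None else env
    missing = [name for name in required if not (provided.get(name) or env.get(name))]
    if missing:
        raise MissingLLMAPIKeyError(
            f"Evaluator {spec.key!r} requires LLM API keys: {', '.join(missing)}"
        )
```

Running the check before the evaluator starts lets the API return a clear error instead of failing partway through on a provider call.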

vercel bot deployed to Preview – agenta-documentation August 29, 2024 08:22

vercel bot deployed to Preview – agenta August 29, 2024 08:25

bekossy changed the base branch from main to AGE-587/-implement-evaluation-main-page August 30, 2024 15:39

bekossy merged commit f563103 into AGE-587/-implement-evaluation-main-page Aug 30, 2024
6 checks passed

bekossy deleted the feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground branch August 30, 2024 15:39
