In this repo, we examine different techniques for using a large language model (LLM) as a judge when evaluating LLM-based solutions.
We will cover various types of evals. The datasets provided were all synthetically generated using LLMs and are located in the /data directory. The evals are Jupyter notebooks and are located in the /eval directory.
- 00_basic_chat_evaluation: This eval consumes a chat conversation from a chatbot/agent and scores it against a rubric. The purpose of this notebook is to demonstrate a basic eval pattern (a minimal sketch of the idea follows below). Subsequent notebooks will cover more advanced evals for comparing chats from different models.
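
To illustrate the rubric-based judging pattern, here is a minimal, hypothetical sketch. It is not the notebook's implementation: `invoke_judge_model` is a placeholder for whatever LLM client the notebooks actually use, and the rubric and transcript shown are illustrative only.

```python
import json

# Hypothetical stand-in for the LLM client used in the notebooks.
def invoke_judge_model(prompt: str) -> str:
    """Send the prompt to a judge LLM and return its raw text response."""
    raise NotImplementedError("Wire this up to your preferred LLM API.")

# Example rubric; the real notebooks define their own criteria.
RUBRIC = """\
Score the assistant's replies from 1 (poor) to 5 (excellent) on:
- helpfulness: does it address the user's request?
- accuracy: are the facts correct?
- tone: is it polite and professional?
Respond with JSON: {"helpfulness": int, "accuracy": int, "tone": int, "rationale": str}
"""

def evaluate_chat(transcript: list[dict]) -> dict:
    """Ask a judge LLM to score a chat transcript against the rubric."""
    conversation = "\n".join(f"{turn['role']}: {turn['content']}" for turn in transcript)
    prompt = (
        "You are an impartial judge evaluating a chatbot conversation.\n\n"
        f"Rubric:\n{RUBRIC}\n"
        f"Conversation:\n{conversation}\n"
    )
    return json.loads(invoke_judge_model(prompt))

# Example usage with a toy transcript:
# scores = evaluate_chat([
#     {"role": "user", "content": "How do I reset my password?"},
#     {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
# ])
```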
More evaluations coming soon.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.