In this repo, we examine different techniques for using a large language model (LLM) as a judge when evaluating LLM-based solutions.
We will cover various types of evals. The datasets provided were all synthetically generated using LLMs and are located in the /data directory. The evals are Jupyter notebooks and are located in the /eval directory.
- 00_basic_chat_evaluation: This eval consumes a chat conversation from a chatbot/agent and scores it against a rubric. The purpose of this notebook is to demonstrate a basic eval pattern (a minimal sketch of the idea follows below). Subsequent notebooks will cover more advanced evals for comparing chats from different models.
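
To illustrate the rubric-based judging pattern, here is a minimal, hypothetical sketch. It is not the notebook's implementation: `invoke_judge_model` is a placeholder for whatever LLM client the notebooks actually use, and the rubric and transcript shown are illustrative only.

```python
import json

# Hypothetical stand-in for the LLM client used in the notebooks.
def invoke_judge_model(prompt: str) -> str:
    """Send the prompt to a judge LLM and return its raw text response."""
    raise NotImplementedError("Wire this up to your preferred LLM API.")

# Example rubric; the real notebooks define their own criteria.
RUBRIC = """\
Score the assistant's replies from 1 (poor) to 5 (excellent) on:
- helpfulness: does it address the user's request?
- accuracy: are the facts correct?
- tone: is it polite and professional?
Respond with JSON: {"helpfulness": int, "accuracy": int, "tone": int, "rationale": str}
"""

def evaluate_chat(transcript: list[dict]) -> dict:
    """Ask a judge LLM to score a chat transcript against the rubric."""
    conversation = "\n".join(f"{turn['role']}: {turn['content']}" for turn in transcript)
    prompt = (
        "You are an impartial judge evaluating a chatbot conversation.\n\n"
        f"Rubric:\n{RUBRIC}\n"
        f"Conversation:\n{conversation}\n"
    )
    return json.loads(invoke_judge_model(prompt))

# Example usage with a toy transcript:
# scores = evaluate_chat([
#     {"role": "user", "content": "How do I reset my password?"},
#     {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
# ])
```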
More evaluations coming soon.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.