🐢 Open-Source Evaluation & Testing for LLMs and ML models
UpTrain is an open-source, unified platform to evaluate and improve Generative AI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives insights on how to resolve them.
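As a hedged illustration of those preconfigured checks, here is a minimal sketch following UpTrain's published quickstart; the exact names (EvalLLM, Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_COMPLETENESS) and the data keys are assumptions that may differ between SDK versions.

```python
# Minimal sketch: grade one RAG-style record with two preconfigured checks.
# Class, enum, and key names follow UpTrain's quickstart but may vary by
# version; treat them as assumptions rather than a fixed API.
from uptrain import EvalLLM, Evals

data = [{
    "question": "What does UpTrain grade?",
    "context": "UpTrain provides grades for 20+ preconfigured checks.",
    "response": "It grades responses against preconfigured checks such as context relevance.",
}]

eval_llm = EvalLLM(openai_api_key="sk-...")  # grading is delegated to an LLM
results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_COMPLETENESS],
)
print(results)  # per-check score plus an explanation for each record
```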
Python SDK for running evaluations on LLM-generated responses
Generate ideal question-answer pairs for testing RAG pipelines
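For context, a generic way to build such a test set (not this repository's actual API; the model name, prompt, and helper are illustrative assumptions) is to have an LLM write one question and ideal answer per source passage:

```python
# Generic sketch: synthesize question-answer pairs from source passages so a
# RAG pipeline can later be scored against known-good answers.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def make_qa_pair(passage: str) -> dict:
    # Hypothetical helper: ask for a question answerable only from the passage.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "Write one question answerable solely from the passage, "
                        "plus the ideal answer. Reply as JSON with keys "
                        "'question' and 'answer'."},
            {"role": "user", "content": passage},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

passages = ["Retrieval-augmented generation grounds answers in retrieved text."]
test_set = [make_qa_pair(p) | {"source": p} for p in passages]
```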
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
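One common way such multi-aspect scoring is implemented is an LLM-as-judge loop; the sketch below is a generic illustration under that assumption, not the toolkit's actual interface (the rubric, judge model, and 1-5 scale are made up for the example):

```python
# Generic LLM-as-judge sketch: score an answer on several aspects (1-5 each).
from openai import OpenAI

client = OpenAI()
ASPECTS = ["factual accuracy", "use of provided context", "tone"]

def judge(question: str, context: str, answer: str) -> dict:
    scores = {}
    for aspect in ASPECTS:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed judge model
            messages=[{
                "role": "user",
                "content": (
                    f"Rate the {aspect} of the answer from 1 (poor) to 5 (excellent). "
                    f"Reply with a single digit.\n\n"
                    f"Question: {question}\nContext: {context}\nAnswer: {answer}"
                ),
            }],
        )
        # Assumes the judge complies and replies with a leading digit.
        scores[aspect] = int(resp.choices[0].message.content.strip()[0])
    return scores
```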
Code for "Prediction-Powered Ranking of Large Language Models", Arxiv 2024.
Generative agents: computational software agents that simulate believable human behavior using OpenAI LLMs. The main focus was developing a game, “Werewolves of Miller’s Hollow”, aiming to replicate human-like behavior.
The prompt engineering, prompt management, and prompt evaluation tool for Python
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
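For reference, the basic match templates in Evals read samples as JSON Lines, pairing a chat-style "input" with an "ideal" completion; the snippet below sketches that documented format, while the file path and example content are assumptions:

```python
# Sketch of the JSONL sample format used by Evals' basic match templates.
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with one word."},
            {"role": "user", "content": "2 + 2 = ?"},
        ],
        "ideal": "4",
    },
]

with open("samples.jsonl", "w") as f:  # illustrative path
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```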