BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.

Jupyter Notebook 127 14 Updated Aug 9, 2024

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 1,394 146 Updated Aug 17, 2024

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 3,556 370 Updated Aug 17, 2024

This repo includes ChatGPT prompt curation to use ChatGPT better.

HTML 108,153 14,820 Updated Aug 16, 2024

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Python 392 33 Updated Jul 31, 2024

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

1,351 86 Updated Jun 3, 2024

An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024]

Python 148 15 Updated Aug 15, 2024

Robust recipes to align language models with human and AI preferences

Python 4,360 374 Updated Aug 15, 2024

A quick guide (especially) for trending instruction finetuning datasets

2,347 154 Updated Nov 28, 2023

A library for mechanistic interpretability of GPT-style language models

Python 1,332 264 Updated Aug 16, 2024

A very simple framework for state-of-the-art Natural Language Processing (NLP)

Python 13,779 2,083 Updated Aug 17, 2024

Papers and resources related to the security and privacy of LLMs 🤖

Python 363 29 Updated Aug 7, 2024

Model interpretability and understanding for PyTorch

Python 4,757 481 Updated Aug 17, 2024

Interpretability for sequence generation models 🐛 🔍

Python 348 37 Updated Aug 15, 2024

The jailbreak-evaluation is an easy-to-use Python package for language model jailbreak evaluation.

Python 19 2 Updated May 15, 2024

A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

661 40 Updated Aug 17, 2024

[NAACL2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

62 5 Updated Aug 7, 2024

A curation of awesome tools, documents and projects about LLM Security.

838 83 Updated Aug 17, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,258 2,032 Updated Aug 9, 2024

Code and data for "The Power of Noise: Redefining Retrieval for RAG Systems"

Jupyter Notebook 36 1 Updated Aug 13, 2024

Jupyter Notebook 9 2 Updated Apr 27, 2024

πŸ” LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your d…

Python 15,161 1,750 Updated Aug 17, 2024

Efficient Retrieval Augmentation and Generation Framework

Python 1,219 104 Updated Aug 8, 2024

A framework for few-shot evaluation of language models.

Python 6,157 1,630 Updated Aug 17, 2024

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.

Python 26,758 1,439 Updated Aug 9, 2024

Typed argument parser for Python

Python 489 39 Updated Aug 12, 2024

ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.

Python 57 6 Updated May 9, 2024

Capture Screen, Audio, Cursor, Mouse Clicks and Keystrokes

C# 9,588 1,793 Updated Apr 9, 2023

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

668 41 Updated May 8, 2024