Universal and Transferable Attacks on Aligned Language Models
-
Updated
Sep 19, 2023 - Python
Universal and Transferable Attacks on Aligned Language Models
LLM Platform Security: Applying a Systematic Evaluation Framework to OpenAI's ChatGPT Plugins
This repo focus on how to deal with prompt injection problem faced by LLMs
This project investigates the security of large language models by performing binary classification of a set of input prompts to discover malicious prompts. Several approaches have been analyzed using classical ML algorithms, a trained LLM model, and a fine-tuned LLM model.
Vulnerable LLM Application
The Security Toolkit for LLM Interactions (TS version)
MER is a software that identifies and highlights manipulative communication in text from human conversations and AI-generated responses. MER benchmarks language models for manipulative expressions, fostering development of transparency and safety in AI. It also supports manipulation victims by detecting manipulative patterns in human communication.
⚡ Vigil ⚡ Detect prompt injections, jailbreaks, and other potentially risky Large Language Model (LLM) inputs
LLM Security Project with Llama Guard
Evaluation of Google's Instruction Tuned Gemma-2B, an open-source Large Language Model (LLM). Aimed at understanding the breadth of the model's knowledge, its reasoning capabilities, and adherence to ethical guardrails, this project presents a systematic assessment across a diverse array of domains.
CLI tool that uses the Lakera API to perform security checks in LLM inputs
MINOTAUR: The STRONGEST Secure Prompt EVER! Prompt Security Challenge, Impossible GPT Security, Prompts Cybersecurity, Prompting Vulnerabilities, FlowGPT, Secure Prompting, Secure LLMs, Prompt Hacker, Cutting-edge Ai Security, Unbreakable GPT Agent, Anti GPT Leak, System Prompt Security.
It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) in 2024.
Example of running last_layer with FastAPI on vercel
A benchmark for evaluating the robustness of LLMs and defenses to indirect prompt injection attacks.
LLM security and privacy
Guard your LangChain applications against prompt injection with Lakera ChainGuard.
Whispers in the Machine: Confidentiality in LLM-integrated Systems
An easy-to-use Python framework to generate adversarial jailbreak prompts.
Add a description, image, and links to the llm-security topic page so that developers can more easily learn about it.
To associate your repository with the llm-security topic, visit your repo's landing page and select "manage topics."