- The Chinese University of Hong Kong
- Hong Kong SAR
- https://gregxmhu.github.io/
Starred repositories
List of papers on hallucination detection in LLMs.
Set of tools to assess and improve LLM security.
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Code for our NeurIPS 2023 accepted paper: RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs including Vicuna and LLaMA. The results show that RADAR can attain good …
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
TAP: An automated jailbreaking method for black-box LLMs
Code for visualizing the loss landscape of neural nets
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull reque…
Curation of prompts that are known to be adversarial to large language models
Prompt attack-defense, prompt injection, reverse engineering notes and examples | prompt adversarial and jailbreak examples and notes
GregxmHu / promptbench
Forked from microsoft/promptbench. A robustness evaluation framework for large language models on adversarial prompts
Can AI-Generated Text be Reliably Detected?
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
New ways of breaking app-integrated LLMs
A curated list of trustworthy Generative AI papers. Daily updating...
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Code and documentation to train Stanford's Alpaca models, and generate the data.
Dataset of GPT-2 outputs for research in detection, biases, and more
Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
The simplest, fastest repository for training/finetuning medium-sized GPTs.