GregxmHu's starred repositories

List of papers on hallucination detection in LLMs.

489 36 Updated Jun 30, 2024

Set of tools to assess and improve LLM security.

Python 2,153 355 Updated Jul 11, 2024

A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

579 35 Updated Jul 5, 2024

Code for our NeurIPS 2023 paper: RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs including Vicuna and LLaMA. The results show that RADAR can attain good …

Jupyter Notebook 27 Updated Mar 19, 2024

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

Python 491 38 Updated Mar 10, 2024

TAP: An automated jailbreaking method for black-box LLMs

Python 95 15 Updated Mar 8, 2024

Code for visualizing the loss landscape of neural nets

Python 2,726 389 Updated Apr 5, 2022

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

12,787 1,345 Updated Feb 13, 2023

Python 49 9 Updated Nov 13, 2023

The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Python 166 28 Updated Jun 5, 2024

[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair

Python 109 25 Updated Apr 20, 2023

A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

1,043 58 Updated Jan 4, 2024

[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

Jupyter Notebook 1,627 142 Updated Jun 10, 2024

Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"

1,501 114 Updated Sep 19, 2023

A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull reque…

Python 863 141 Updated Jan 7, 2024

Curation of prompts that are known to be adversarial to large language models

168 9 Updated Feb 12, 2023

Prompt attack and defense, prompt injection, and reverse-engineering notes and examples

96 16 Updated Oct 31, 2023

A robustness evaluation framework for large language models on adversarial prompts

Python 1 Updated Jul 3, 2023

Can AI-Generated Text be Reliably Detected?

Python 48 2 Updated Nov 16, 2023

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …

Python 276 28 Updated Feb 26, 2024

New ways of breaking app-integrated LLMs

Jupyter Notebook 1,744 111 Updated Jun 17, 2023

A curated list of trustworthy generative AI papers. Updated daily.

65 4 Updated Sep 14, 2023

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Python 10,801 1,158 Updated Jun 30, 2023

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,145 4,015 Updated Mar 12, 2024

Dataset of GPT-2 outputs for research in detection, biases, and more

Python 1,919 548 Updated Dec 13, 2023

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)

Python 533 64 Updated May 9, 2024

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 34,712 5,344 Updated Jul 10, 2024

Python 28 16 Updated Jun 12, 2023