- The Chinese University of Hong Kong
- Hong Kong SAR
- https://gregxmhu.github.io/
Starred repositories
List of papers on hallucination detection in LLMs.
Set of tools to assess and improve LLM security.
A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Code for our NeurIPS 2023 accepted paper: RADAR: Robust AI-Text Detection via Adversarial Learning. We tested RADAR on 8 LLMs including Vicuna and LLaMA. The results show that RADAR can attain good …
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
TAP: An automated jailbreaking method for black-box LLMs
Code for visualizing the loss landscape of neural nets
🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.
The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".
[ICML 2021] Break-It-Fix-It: Unsupervised Learning for Program Repair
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
[CCS'24] A dataset consisting of 15,140 ChatGPT prompts from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull reque…
Curation of prompts that are known to be adversarial to large language models
Prompt attack-defense, prompt injection, reverse engineering notes and examples | prompt adversarial and jailbreak examples and notes
GregxmHu / promptbench
Forked from microsoft/promptbench. A robustness evaluation framework for large language models on adversarial prompts
Can AI-Generated Text be Reliably Detected?
PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML …
New ways of breaking app-integrated LLMs
A curated list of trustworthy Generative AI papers. Daily updating...
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
Code and documentation to train Stanford's Alpaca models, and generate the data.
Dataset of GPT-2 outputs for research in detection, biases, and more
Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
The simplest, fastest repository for training/finetuning medium-sized GPTs.