Department of Computer Science, HKBU
Hong Kong
https://zfancy.github.io/
Stars
We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI’s APIs. (A minimal sketch of the API call follows this list.)
[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"
This repo contains papers, books, tutorials and resources on Riemannian optimization.
Using Explanations as a Tool for Advanced LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥
Source code for the NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to master any computer task through strong reasoning, self-improvement, and skill curation.
Code for the paper 🌳 Tree Search for Language Model Agents
Agent Q: an open-source implementation of advanced reasoning and learning for autonomous AI agents
VisualWebArena is a benchmark for multimodal agents.
[ICML2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
The official implementation of Self-Play Fine-Tuning (SPIN)
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
A programming framework for agentic AI 🤖
[CVPR 2024] Official Repository for "Efficient Test-Time Adaptation of Vision-Language Models"
Repo for the research paper "Aligning LLMs to Be Robust Against Prompt Injection"
Repository for research works and resources related to model reprogramming <https://arxiv.org/abs/2202.10629>
[NeurIPS 2024] "What If the Input is Expanded in OOD Detection?"
PAIR.withgoogle.com and friends’ work on interpretability methods
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
"Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?"