Stars
This is the official repository of the EMNLP 2024 paper: Defending Against Social Engineering Attacks in the Age of LLMs.
Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs
Streamlit — A faster way to build and share data apps.
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time (see the weight-averaging sketch after this list).
TMLS 2024 Workshop: A Practitioner's Guide To Safeguarding Your LLM Applications
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024
Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".
Full code for the sparse probing paper.
Improving Alignment and Robustness with Circuit Breakers
Every practical and proposed defense against prompt injection.
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
Simplify and improve the job hunting experience by integrating LLMs to automate tasks such as resume and cover letter generation, as well as application submission, saving users time and effort.
[TACL] Code for "Red Teaming Language Model Detectors with Language Models"
[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text
Modeling, training, eval, and inference code for OLMo
Code accompanying "How I learned to start worrying about prompt formatting".
Official code for "Large Language Models as Optimizers"
Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency
Code for our paper titled "PEACE: Cross-Platform Hate Speech Detection - A Causality-guided Framework"
Robust machine learning for responsible AI
Use large language models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs and employs Iterative Active Learning for continuous improvement.
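
The model soups entry above describes the technique in a single line: element-wise averaging of the weights of several models fine-tuned from the same base. A minimal PyTorch sketch of that "uniform soup" idea follows, assuming all checkpoints share an identical architecture and parameter shapes; the function name `uniform_soup` and the checkpoint paths are illustrative, not the paper's released code:

```python
import torch

def uniform_soup(state_dicts):
    """Element-wise average of parameter tensors across fine-tuned
    checkpoints of one architecture (the "uniform soup")."""
    soup = {}
    for key in state_dicts[0]:
        # Cast to float so integer buffers (e.g. step counters) average cleanly.
        stacked = torch.stack([sd[key].float() for sd in state_dicts])
        soup[key] = stacked.mean(dim=0)
    return soup

# Hypothetical usage: checkpoints fine-tuned from the same base model.
# checkpoints = [torch.load(p, map_location="cpu") for p in ("ft_a.pt", "ft_b.pt")]
# model.load_state_dict(uniform_soup(checkpoints))
```

Because the averaging happens once, offline, the resulting model is a single checkpoint: inference cost is unchanged, which is the point of the technique.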