AmritaBh

Organizations

@DMML-ASU

This is the official repository of the EMNLP 2024 paper: Defending Against Social Engineering Attacks in the Age of LLMs.

Python 5 1 Updated Oct 8, 2024

Do-Not-Answer: A Dataset for Evaluating Safeguards in LLMs

Jupyter Notebook 183 24 Updated Jun 7, 2024

Streamlit — A faster way to build and share data apps.

Python 35,701 3,094 Updated Nov 15, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

5,163 286 Updated Nov 11, 2024

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Python 21 1 Updated Jul 9, 2024

Model soups: averaging the weights of multiple fine-tuned models improves accuracy without increasing inference time.

Python 428 38 Updated Jul 15, 2024
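The model-soups idea named above — averaging the parameters of several fine-tuned checkpoints — can be sketched in a few lines of numpy. The function name and toy weight dicts here are illustrative, not the repo's actual API, which operates on real checkpoint state dicts.

```python
import numpy as np

def uniform_soup(weight_dicts):
    """Average the parameters of several fine-tuned models (a "uniform soup").

    Each model is represented as a dict mapping parameter names to arrays;
    the soup takes the element-wise mean of each parameter across models.
    """
    keys = weight_dicts[0].keys()
    return {k: np.mean([w[k] for w in weight_dicts], axis=0) for k in keys}

# Toy example: three "fine-tuned models" with one 2x2 weight matrix each.
models = [{"linear.weight": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
soup = uniform_soup(models)
print(soup["linear.weight"])  # every entry is 2.0
```

Because the soup is a single set of averaged weights, inference cost is identical to a single model — the gain over ensembling is that no extra forward passes are needed.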

TMLS 2024 Workshop: A Practitioner's Guide To Safeguarding Your LLM Applications

Jupyter Notebook 4 1 Updated Jul 11, 2024

Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models, ICML 2024

Python 15 3 Updated Jul 7, 2024

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".

Python 120 25 Updated Oct 1, 2024
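The single-direction finding above suggests a simple intervention: project the "refusal direction" out of residual-stream activations. A minimal numpy sketch of that directional ablation, with illustrative names and toy data (not the repo's API):

```python
import numpy as np

def ablate_direction(activations, direction):
    """Remove the component of each activation vector along `direction`.

    If a behavior is mediated by a single direction r, subtracting
    (x . r) r from every activation x suppresses that behavior while
    leaving the orthogonal components untouched.
    """
    r = direction / np.linalg.norm(direction)
    return activations - np.outer(activations @ r, r)

# Toy activations (2 tokens, 2 dims) and a direction along the first axis.
acts = np.array([[1.0, 2.0], [3.0, 4.0]])
refusal_dir = np.array([1.0, 0.0])
out = ablate_direction(acts, refusal_dir)
print(out)  # first component zeroed: [[0. 2.] [0. 4.]]
```

After ablation, every activation is orthogonal to the chosen direction, which is the core of the intervention the paper studies.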

Full code for the sparse probing paper.

Jupyter Notebook 50 10 Updated Dec 17, 2023

Improving Alignment and Robustness with Circuit Breakers

Jupyter Notebook 152 17 Updated Sep 24, 2024

Every practical and proposed defense against prompt injection.

343 25 Updated May 31, 2024

TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model training in NLP https://textattack.readthedocs.io/en/master/

Python 2,969 398 Updated Jul 25, 2024

OpenAGI: When LLM Meets Domain Experts

Python 1,962 166 Updated Sep 2, 2024

[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning

Python 83 4 Updated May 23, 2024

Simplifies and improves the job-hunting experience by using LLMs to automate tasks such as resume and cover-letter generation and application submission, saving users time and effort.

Python 107 54 Updated Oct 6, 2024

[TACL] Code for "Red Teaming Language Model Detectors with Language Models"

Python 16 3 Updated Nov 24, 2023

[ICML 2024] Binoculars: Zero-Shot Detection of LLM-Generated Text

Python 211 30 Updated May 14, 2024
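As I understand the Binoculars method, it scores text by the ratio of an observer model's log-perplexity on the text to the cross-perplexity between a performer model's predictions and the observer; low scores indicate machine-generated text. A toy numpy sketch of that score, with hand-made distributions standing in for real language models (function name and inputs are mine, not the repo's API):

```python
import numpy as np

def binoculars_score(observer_logprobs, performer_probs, token_ids):
    """Toy Binoculars-style score: log-perplexity / cross-log-perplexity.

    observer_logprobs: (seq, vocab) log-probs from the observer model.
    performer_probs:   (seq, vocab) probabilities from the performer model.
    token_ids:         the observed token at each position.
    """
    # Observer's log-perplexity: mean negative log-prob of the actual tokens.
    log_ppl = -np.mean(observer_logprobs[np.arange(len(token_ids)), token_ids])
    # Cross log-perplexity: performer's expected negative log-prob
    # under the observer's distribution, averaged over positions.
    log_xppl = -np.mean(np.sum(performer_probs * observer_logprobs, axis=1))
    return log_ppl / log_xppl

# Toy case: uniform observer over a 2-token vocab, confident performer.
obs = np.log(np.full((2, 2), 0.5))
perf = np.array([[0.9, 0.1], [0.9, 0.1]])
score = binoculars_score(obs, perf, [0, 0])
print(score)  # 1.0 for a uniform observer
```

With a uniform observer every token costs the same, so the two perplexities coincide and the score is exactly 1; real detection thresholds the score computed from two actual LLMs.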

Modeling, training, eval, and inference code for OLMo

Python 4,630 473 Updated Nov 15, 2024

Code accompanying "How I learned to start worrying about prompt formatting".

Python 93 9 Updated Oct 2, 2024

Official code for "Large Language Models as Optimizers"

Python 441 46 Updated Aug 16, 2024

Python 57 7 Updated Sep 1, 2024

the LLM vulnerability scanner

Python 1,434 172 Updated Nov 14, 2024

Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency

Jupyter Notebook 33 7 Updated May 28, 2024

Code for our paper titled "PEACE: Cross-Platform Hate Speech Detection - A Causality-guided Framework"

Python 4 Updated Jun 12, 2023

Robust machine learning for responsible AI

Python 458 57 Updated Jul 12, 2024

Use Large Language Models like OpenAI's GPT-3.5 for data annotation and model enhancement. This framework combines human expertise with LLMs, employs Iterative Active Learning for continuous improv…

Python 29 4 Updated Sep 11, 2023