khipp
- Germany

Stars
- The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models"
- 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
- Official implementation of Goldfish Loss: Mitigating Memorization in Generative LLMs
- Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
- Automatic "Differentiation" via Text – using large language models to backpropagate textual gradients.
- LiveBench: A Challenging, Contamination-Free LLM Benchmark
- "On the Privacy Risks of Algorithmic Recourse". Martin Pawelczyk, Himabindu Lakkaraju* and Seth Neel*. In International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR, 2023.
- [ICML 2024] Selecting High-Quality Data for Training Language Models
- A framework for few-shot evaluation of language models.
- An Open Robustness Benchmark for Jailbreaking Language Models [arXiv 2024]
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
- Source code for the paper "Sampling-based Pseudo-Likelihood for Membership Inference Attacks".
- Enhancing small language models with LLM-generated counterfactuals.
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936
- PAL: Proxy-Guided Black-Box Attack on Large Language Models
- A Comprehensive Assessment of Trustworthiness in GPT Models
- We jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20 via OpenAI's APIs.
- Weak-to-Strong Jailbreaking on Large Language Models
- Python package for measuring memorization in LLMs.
- A comprehensive toolbox for model inversion attacks and defenses that is easy to get started with.
- Implementation of "Membership Inference Attacks against Language Models via Neighbourhood Comparison" by Justus Mattern, Fatemehsadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan…
- Official code for the ACL 2023 paper "Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation"