

PAIR.withgoogle.com and friends' work on interpretability methods

JavaScript 140 29 Updated Sep 11, 2024

The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"

Shell 47 5 Updated Aug 9, 2024

"Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?"

Python 56 4 Updated Sep 17, 2024

Probing language models for linguistic features in their representations

Jupyter Notebook 3 Updated May 12, 2023

LLM vulnerability scanner

Python 1,325 153 Updated Oct 7, 2024

Code release for the paper "On Completeness-aware Concept-Based Explanations in Deep Neural Networks"

Python 52 15 Updated Mar 25, 2022

Model interpretability and understanding for PyTorch

Python 4,854 489 Updated Sep 26, 2024

A curated list of Large Language Model (LLM) Interpretability resources.

1,097 88 Updated Jul 31, 2024
Python 7,658 496 Updated Apr 14, 2024

Official repository for "Concept-based Interpretation Without Linear Assumption", published at ICLR 2023.

Python 2 Updated Feb 5, 2024
Python 27 Updated Jan 23, 2024

EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

Python 30 3 Updated Sep 24, 2024
Jupyter Notebook 36 4 Updated Aug 17, 2024

[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"

Python 42 Updated Sep 3, 2024

LLM-Merging: Building LLMs Efficiently through Merging

Jupyter Notebook 170 36 Updated Sep 24, 2024

Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP"

Jupyter Notebook 27 3 Updated Aug 1, 2024

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

Python 16 1 Updated Feb 23, 2024

Training Sparse Autoencoders on Language Models

Jupyter Notebook 397 108 Updated Oct 7, 2024

Improving Alignment and Robustness with Circuit Breakers

Jupyter Notebook 133 16 Updated Sep 24, 2024
Jupyter Notebook 144 48 Updated Apr 20, 2020

A Collection of Variational Autoencoders (VAE) in PyTorch.

Python 6,522 1,058 Updated Jun 13, 2024

A simple tutorial on Variational Autoencoders with PyTorch

Jupyter Notebook 320 74 Updated Feb 15, 2024

Mutual Information in PyTorch

Python 107 10 Updated Aug 23, 2023

An implementation of a VAE (Variational Autoencoder) for CIFAR-10

Python 63 21 Updated Dec 20, 2021

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python 448 36 Updated Sep 29, 2024

Repository for PrimeVul Vulnerability Detection Dataset

Python 65 5 Updated Sep 7, 2024

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection (RAID 2023) https://surrealyz.github.io/files/pubs/raid23-diversevul.pdf

94 4 Updated Jul 1, 2024

RAI Games

Jupyter Notebook 4 Updated Oct 21, 2023