Department of Computer Science, HKBU, Hong Kong (UTC +08:00)
https://zfancy.github.io/

Stars
PAIR.withgoogle.com and friends' work on interpretability methods
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
"Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?"
Probing language models for linguistic features in their representations
code release for the paper "On Completeness-aware Concept-Based Explanations in Deep Neural Networks"
Model interpretability and understanding for PyTorch
A curated list of Large Language Model (LLM) Interpretability resources.
Official repository for "Concept-based Interpretation Without Linear Assumption", published in ICLR 2023.
EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
tmlr-group/WCA (forked from JinhaoLee/WCA): [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"
LLM-Merging: Building LLMs Efficiently through Merging
Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP"
Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts
Training Sparse Autoencoders on Language Models
Improving Alignment and Robustness with Circuit Breakers
A Collection of Variational Autoencoders (VAE) in PyTorch.
A simple tutorial of Variational Autoencoders with PyTorch
Mutual information in PyTorch
An implementation of the VAE (Variational Autoencoder) for CIFAR-10
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Repository for PrimeVul Vulnerability Detection Dataset
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection (RAID 2023) https://surrealyz.github.io/files/pubs/raid23-diversevul.pdf