Jianing Zhu (ZFancy)

💥 Focusing

CS Ph.D. Student @ HKBU & TMLR Group

45 followers · 99 following

Department of Computer Science, HKBU
Hong Kong
UTC+08:00
https://zfancy.github.io/

Stars

PAIR-code / interpretability

PAIR.withgoogle.com and friends' work on interpretability methods

JavaScript · 140 stars · 29 forks · Updated Sep 11, 2024

llm-misinformation / llm-misinformation

The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"

Shell · 47 stars · 5 forks · Updated Aug 9, 2024

hy-zhao23 / Explainability-for-Large-Language-Models

102 stars · 12 forks · Updated Jan 15, 2024

Luckfort / CD

"Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?"

Python · 56 stars · 4 forks · Updated Sep 17, 2024

abhinav-neil / probing-lms

Probing language models for linguistic features in their representations

Jupyter Notebook · 3 stars · Updated May 12, 2023

leondz / garak

LLM vulnerability scanner

Python · 1,325 stars · 153 forks · Updated Oct 7, 2024

chihkuanyeh / concept_exp

Code release for the paper "On Completeness-aware Concept-Based Explanations in Deep Neural Networks"

Python · 52 stars · 15 forks · Updated Mar 25, 2022

pytorch / captum

Model interpretability and understanding for PyTorch

Python · 4,854 stars · 489 forks · Updated Sep 26, 2024

Dakingrai / awesome-mechanistic-interpretability-lm-papers

62 stars · 5 forks · Updated Jul 19, 2024

JShollaj / awesome-llm-interpretability

A curated list of Large Language Model (LLM) Interpretability resources.

1,097 stars · 88 forks · Updated Jul 31, 2024

deep-floyd / IF

Python · 7,658 stars · 496 forks · Updated Apr 14, 2024

jybai / concept-gradients

Official repository for "Concept-based Interpretation Without Linear Assumption", published at ICLR 2023.

Python · 2 stars · Updated Feb 5, 2024

sail-sg / MMCBench

Python · 27 stars · Updated Jan 23, 2024

JasonForJoy / Model-Editing-Hurt

EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue

Python · 30 stars · 3 forks · Updated Sep 24, 2024

meghdadk / SCRUB

Jupyter Notebook · 36 stars · 4 forks · Updated Aug 17, 2024

tmlr-group / WCA

Forked from JinhaoLee/WCA

[ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"

Python · 42 stars · Updated Sep 3, 2024

llm-merging / LLM-Merging

LLM-Merging: Building LLMs Efficiently through Merging

Jupyter Notebook · 170 stars · 36 forks · Updated Sep 24, 2024

yossigandelsman / second_order_lens

Official PyTorch implementation of "Interpreting the Second-Order Effects of Neurons in CLIP"

Jupyter Notebook · 27 stars · 3 forks · Updated Aug 1, 2024

yinyueqin / relative-preference-optimization

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

Python · 16 stars · 1 fork · Updated Feb 23, 2024

jbloomAus / SAELens

Training Sparse Autoencoders on Language Models

Jupyter Notebook · 397 stars · 108 forks · Updated Oct 7, 2024

GraySwanAI / circuit-breakers

Improving Alignment and Robustness with Circuit Breakers

Jupyter Notebook · 133 stars · 16 forks · Updated Sep 24, 2024

artemyk / ibsgd

Jupyter Notebook · 144 stars · 48 forks · Updated Apr 20, 2020

AntixK / PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Python · 6,522 stars · 1,058 forks · Updated Jun 13, 2024

Jackson-Kang / Pytorch-VAE-tutorial

A simple tutorial on Variational Autoencoders with PyTorch

Jupyter Notebook · 320 stars · 74 forks · Updated Feb 15, 2024

connorlee77 / pytorch-mutual-information

Mutual Information in PyTorch

Python · 107 stars · 10 forks · Updated Aug 23, 2023

SashaMalysheva / Pytorch-VAE

This is an implementation of the VAE (Variational Autoencoder) for CIFAR-10

Python · 63 stars · 21 forks · Updated Dec 20, 2021

likenneth / honest_llama

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Python · 448 stars · 36 forks · Updated Sep 29, 2024

DLVulDet / PrimeVul

Repository for PrimeVul Vulnerability Detection Dataset

Python · 65 stars · 5 forks · Updated Sep 7, 2024

wagner-group / diversevul

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection (RAID 2023) https://surrealyz.github.io/files/pubs/raid23-diversevul.pdf

94 stars · 4 forks · Updated Jul 1, 2024

yashgupta-7 / rai-games

RAI Games

Jupyter Notebook · 4 stars · Updated Oct 21, 2023