Training and exploration of linear probes into Othello-GPT by Li et al. (2022)
Optimizing Mind static website v1
COMP 551: Applied Machine Learning — Project #4
Goal: create and implement metrics to measure Transparency and Trustworthiness
A unified approach to explain the output of any machine learning model.
Exploring feature contributions to outliers, feature importances, and image recognition features
Running interpretability experiments with application to weak-to-strong generalization
Creating a PyTorch LSTM and Transformer to classify movies by genre and visualizing the LSTM's reasoning process
Neural model interpretation on MRI data
Creating the model and approach to manage and adjust the process/equipment
Where I learn and explore mechanistic interpretability of transformers
A collection of infrastructure and tools for research in neural network interpretability.
Interpretations on the HPA dataset.
A summary of the "Interpretable Machine Learning" book.
Investigation of state space model interpretability using SHAP (SHapley Additive exPlanations), co-authors Yin Li and Lancaster Wu
Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings
Techniques for interpreting ConvNets
Code for the paper: PatchX: Explaining Deep Models by Intelligible Pattern Patches for Time-series Classification