Stanford University
Pinned repositories:
- ravel (Public): Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
- verbatim-memorization (Public): Demystifying Verbatim Memorization in Large Language Models. Updated Jul 29, 2024.
- eval-neuron-explanation (Public): A framework for evaluating natural language explanations of neurons.
- char-iit (Public): A causal intervention framework to learn robust and interpretable character representations inside subword-based language models.