notrichardren

Richard Ren notrichardren

8 followers · 20 following

Zürich, CH
https://huggingface.co/notrichardren

notrichardren/README.md

👋 Hi, I’m Richard Ren. I work on LLM evaluations, representation engineering, and interpretability. I’m a co-author of the "Representation Engineering" and "Localizing Lying in Llama" papers.

📫 Email | 🎓 Google Scholar

Pinned

magikarp01/iti_capstone Public

Analyzing truth representations in LLMs across different kinds of truth and intervening on their hidden states to make LLMs more truthful

Jupyter Notebook · 5 stars · 1 fork
jam3scampbell/llama-lying Public

Code for our paper "Localizing Lying in Llama"

Jupyter Notebook · 10 stars · 2 forks
arena-curriculum Public

Forked from callummcdougall/ARENA_2.0

Exercises on mechanistic interpretability, RL, and training models at scale

Jupyter Notebook
representation-engineering Public

Forked from andyzoujm/representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook