Stars
Inspect: A framework for large language model evaluations
Deep learning for dummies. All the practical details and useful utilities that go into working with real models.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
🦜🔗 Build context-aware reasoning applications
Locating and editing factual associations in GPT (NeurIPS 2022)
Documentation and source code powering Twitter's Community Notes
Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.
Reinforcement learning algorithms, produced mostly or entirely from scratch.
A curated collection of links to courses and resources about Artificial Intelligence (AI).
🔥 Highlighting the top ML papers every week.
A library for mechanistic interpretability of GPT-style language models
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
Implementation of VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning - Zintgraf et al. (ICLR 2020)