apartresearch

Artificial intelligence will change the world. Our mission is to ensure this happens safely and to the benefit of everyone.

Apart facilitates new research in AI safety, towards reducing societal-scale risks from the technology.

We combine a community focus with a drive for high-quality security research.


Read more about our work:

  • Our Research — Foundational research for safe and beneficial advanced AI
  • Apart Lab — Our research fellowship program for aspiring researchers in AI safety
  • Apart Sprints — Weekend-long research sprints and hackathons for AI security and governance


Pinned

  1. interpretability-starter Public

    🧠 Starter templates for doing interpretability research

    59 1

  2. Neuron2Graph Public

    Tools for exploring Transformer neuron behaviour, including input pruning and diversification.

    Jupyter Notebook 17 5

  3. deepdecipher Public

    🦠 DeepDecipher: An open source API to MLP neurons

    Rust 9

  4. specificityplus Public

    👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"

    Python 20 3

  5. Integer_Addition Public

    ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks

    Jupyter Notebook 12 1

  6. readingwhatwecan Public

    📚📚📚📚📚📚📚📚📚 Reading everything

    CSS 12 3

Repositories

Showing 10 of 35 repositories
  • ICML2024MI Public

    🌍 Website for NeurIPS2023MI

    CSS 1 2 0 0 Updated Aug 19, 2024
  • Integer_Addition Public

    ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks

    Jupyter Notebook 12 MIT 1 0 0 Updated Aug 16, 2024
  • Interpreting-Reward-Models Public

    ✱ Interpreting implicit reward models learnt in RLHF using sparse autoencoders.

    Jupyter Notebook 0 MIT 0 7 0 Updated Aug 7, 2024
  • Research-Augmentation-Hackbook
    Python 5 0 0 0 Updated Jul 19, 2024
  • seqcont_circuits Public

    ✱ Interpreting how similar sequence continuation tasks share internal representations ✱

    Jupyter Notebook 1 MIT 0 0 0 Updated Jul 1, 2024
  • hackathon-utils Public

    😎 Code to run hackathons efficiently

    0 MIT 0 0 0 Updated May 29, 2024
  • evaluations-starter Public

    How to get started in evaluations and demonstrations research for dangerous capabilities

    5 MIT 1 1 0 Updated May 24, 2024
  • deepdecipher Public

    🦠 DeepDecipher: An open source API to MLP neurons

    Rust 9 MIT 0 46 0 Updated May 2, 2024
  • readingwhatwecan Public

    📚📚📚📚📚📚📚📚📚 Reading everything

    CSS 12 3 0 0 Updated Apr 21, 2024
  • scale-llm-24 Public Forked from apartresearch/ICML2024MI

    🌍 Website for the Scaling Laws workshop

    CSS 1 2 0 0 Updated Mar 22, 2024