---
title: "The Foundation Model Development Cheatsheet"
date: 2024-02-29
description: "Announcing a new resource, the FM Dev Cheatsheet."
author: ["EleutherAI"]
draft: false
---

The pace of foundation model releases and progress has continued to grow rapidly over the past few years, with many new models released by [organizations of all kinds worldwide](https://docs.google.com/spreadsheets/d/1gc6yse74XCwBx028HV_cvdxwXkmXejVjkO-Mz2uwE0k/edit?pli=1#gid=0). In addition to releasing models themselves, it is also important to make the tools used to create these models - [large-scale training libraries](https://github.com/EleutherAI/gpt-neox), [data processing and creation tooling](https://github.com/allenai/dolma), and more - widely available. In April 2023 we released the Pythia model suite, the first LLMs with a fully released and reproducible technical pipeline from start to finish. We are excited to see other organizations following suit, with the [LLM360](https://www.llm360.ai/) project releasing Amber later that year and AI2 releasing [OLMo](https://allenai.org/olmo), both fully transparent artifact releases spanning the entire language model development process. Additionally, many other organizations have released new tools for underserved aspects of the development pipeline. Without full-pipeline transparency, developers cannot be held accountable for undisclosed design decisions, and independent research and auditing are limited in their ability to draw robust conclusions or accurately assess harms.

As a continuation of EleutherAI’s mission to lower the [barriers to entry](https://arxiv.org/abs/2210.06413) for research and to provide mentorship and [educational](https://blog.eleuther.ai/transformer-math/) [resources](https://github.com/EleutherAI/cookbook) about large-scale AI model development, we have collaborated with researchers from MIT, AI2, Hugging Face, Stanford, Princeton, Masakhane, MLCommons, and more to release “The Foundation Model Development Cheatsheet”, a quick-start guide that familiarizes new developers with useful tools and resources for developing new open models. The topics covered span the entire model development cycle, from data collection to licensing and release practices, and aim to provide a jumping-off point and a high-level survey of the important steps for responsibly and successfully developing new models. We hope the Cheatsheet will be a useful learning resource and reference, exposing newer developers not only to the technical aspects of model creation, which rightfully receive much attention already, but also to the crucially important practices around responsible development and release management.

We hope the Cheatsheet will serve as a useful entry point into responsible and well-documented model development, and help raise awareness of these crucial issues. You can read the [paper](https://github.com/allenai/fm-cheatsheet/blob/main/app/resources/paper.pdf) for full details, or explore the collection of resources via the [interactive website](https://fmcheatsheet.org/). The Cheatsheet is intended as a living resource: all are welcome to [submit new resources](https://github.com/allenai/fm-cheatsheet#add-to-cheatsheet) and be recognized for their contributions!
