GPT-NeoX

This repository records EleutherAI's work-in-progress for training large-scale language models on GPUs. Our current framework is based on NVIDIA's Megatron Language Model and has been augmented with techniques from DeepSpeed as well as some novel optimizations. If you are looking for our TPU codebase, see GPT-Neo.
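
As a rough illustration of the DeepSpeed pattern this codebase builds on, the sketch below wraps a toy PyTorch module with deepspeed.initialize and runs one training step. It assumes a recent DeepSpeed release that accepts a config= keyword and a hypothetical ds_config.json; the toy model and training loop are assumptions for illustration only, not this repository's actual entry point.

```python
# Illustrative sketch only -- not this repository's training entry point.
# Assumes DeepSpeed is installed, a GPU is available, and a hypothetical
# ds_config.json (batch size, optimizer, fp16/ZeRO settings) sits alongside it.
import torch
import torch.nn as nn
import deepspeed


class TinyAutoregressiveLM(nn.Module):
    """Toy stand-in for the real Megatron-style transformer."""

    def __init__(self, vocab_size=1000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))


model = TinyAutoregressiveLM()

# deepspeed.initialize wraps the module in an engine that owns the optimizer,
# gradient accumulation, and any fp16/ZeRO behaviour declared in the config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical config path
)

# One toy training step: next-token prediction on random token ids.
tokens = torch.randint(0, 1000, (8, 64), device=model_engine.device)
logits = model_engine(tokens)                        # (batch, seq, vocab)
loss = nn.functional.cross_entropy(
    logits[:, :-1, :].reshape(-1, logits.size(-1)),  # predictions for t+1
    tokens[:, 1:].reshape(-1),                       # shifted targets
)
model_engine.backward(loss)  # engine-managed backward (handles loss scaling)
model_engine.step()          # optimizer step + gradient zeroing
```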

Getting Started

TO DO

Training

TO DO

Datasets

TO DO

Pretrained Models

TO DO

Downloading Checkpoints

TO DO

Inference

TO DO

Fine-Tuning

TO DO

Licensing

This repository hosts code that is part of EleutherAI's GPT-NeoX project. Copyright 2021 Stella Biderman, Sid Black, Josh Levy-Kramer, and Shivanshu Purohit.

GPT-NeoX is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.

This repository is based on code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain an NVIDIA copyright header. All files that do not contain such a header are original to EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.
