
An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.


GPT-NeoX

Follow these steps to set up GPT-NeoX:

Link to setup instructions

Set up inference server

There's a simple Flask server in run_server.py. Requirements are:

Flask-Cors
Flask-RESTful

python run_server.py runs the server at 127.0.0.1:5000 by default.

A POST endpoint is set up at 127.0.0.1:5000/text_completion.
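The actual run_server.py (which, per the requirements above, also pulls in Flask-Cors and Flask-RESTful) is not reproduced here; a minimal plain-Flask sketch of such an endpoint might look like the following, with `generate` standing in as a hypothetical placeholder for the real GPT-NeoX model call:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate(prompt):
    # Hypothetical stand-in for the actual GPT-NeoX generation call.
    return prompt + " [generated text]"

@app.route("/text_completion", methods=["POST"])
def text_completion():
    # Parse the JSON request body and read the "prompt" field.
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    return jsonify({"completion": generate(prompt)})

if __name__ == "__main__":
    # Matches the default address mentioned above.
    app.run(host="127.0.0.1", port=5000)
```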

Example Post Request:

POST /text_completion HTTP/1.1
Host: 127.0.0.1:5000
Content-Type: application/json
cache-control: no-cache

{
	"prompt": "To use the facilities at your own risk, and take responsibility for any personal injury."
}
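The request above can be built from Python with only the standard library; this is a sketch, not part of the repository, and the `build_completion_request` helper name is an assumption:

```python
import json
import urllib.request

def build_completion_request(prompt, host="127.0.0.1", port=5000):
    # Build the POST request for the /text_completion endpoint,
    # mirroring the raw HTTP example above.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/text_completion",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires the server from run_server.py to be running:
# with urllib.request.urlopen(build_completion_request("Hello")) as resp:
#     print(resp.read().decode("utf-8"))
```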
