OpenAssistant inference service

The inference service is a component that answers prompts by calling the OpenAssistant (OA) models.

System architecture

The service consists of an HTTP server and several workers.

The server is a Python application that communicates with the workers over gRPC; the workers run the model to carry out inference.
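
As an illustration of that split, here is a minimal sketch of how the server might forward a prompt to a worker over gRPC. The generated module names (inference_pb2, inference_pb2_grpc), the Generate RPC, and the worker address are assumptions for this example, not the service's actual interface.

```python
import grpc

# Hypothetical stubs generated from an inference.proto; the real proto is not
# part of this README, so these names are assumptions.
import inference_pb2
import inference_pb2_grpc


def forward_to_worker(prompt: str, worker_address: str = "localhost:50051") -> str:
    """Send a prompt to one worker and return the generated text."""
    with grpc.insecure_channel(worker_address) as channel:
        stub = inference_pb2_grpc.InferenceStub(channel)
        # One blocking unary call; a production server would stream tokens
        # and balance requests across several workers.
        reply = stub.Generate(inference_pb2.GenerateRequest(prompt=prompt))
        return reply.text
```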

The web frontend (node.js) makes API calls to this inference service, which acts as the backend.
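
Below is a minimal sketch of what such an HTTP endpoint could look like using Sanic (the web framework listed under Dependencies). The /complete route, the request shape, and the run_model helper are placeholders for illustration, not the service's actual API.

```python
from sanic import Sanic
from sanic.response import json

app = Sanic("oa_inference")


async def run_model(prompt: str) -> str:
    # Placeholder for the call to a worker (e.g. the gRPC sketch above).
    return f"echo: {prompt}"


@app.post("/complete")
async def complete(request):
    # The frontend sends a JSON body such as {"prompt": "..."}.
    prompt = request.json.get("prompt", "")
    text = await run_model(prompt)
    return json({"text": text})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```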

Refer to the OA developer docs to learn more.

What's inside

This project provides a starter kit to run a variety of OA models.

Pre-release (alpha-ish) model families and versions:

  • Chip2: version 6
  • Joi: version 5
  • Chip: version 4
  • Rosey: version 3

Dependencies

Install Python packages:

  • PyTorch
  • bitsandbytes - 8-bit optimizers
  • Sanic - Python web server and web framework
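
On a typical setup these can be installed from PyPI; the package names below are the standard ones (torch, bitsandbytes, sanic), and you may want to pin versions that match your CUDA toolkit and model checkpoints.

```bash
pip install torch bitsandbytes sanic
```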

Deployment

This starter kit currently deploys the inference service container on AWS Fargate (a serverless compute engine).

First, build the container image using the Dockerfile.
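
For example (the image name and tag below are placeholders, and the Dockerfile is assumed to sit at the repository root):

```bash
docker build -t oa-inference:latest .
```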

Then push the image to your AWS container registry (Amazon ECR).
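
A typical push looks like the following; <account-id>, <region>, and the repository name are placeholders to substitute with your own values, and the ECR repository must already exist.

```bash
aws ecr get-login-password --region <region> | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com

docker tag oa-inference:latest <account-id>.dkr.ecr.<region>.amazonaws.com/oa-inference:latest
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/oa-inference:latest
```

Fargate then pulls the image from ECR when the service's task definition is launched.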