The inference service answers prompts by calling the OpenAssistant (OA) models.
It consists of an HTTP server and several workers.
The server is a Python application that communicates with the workers over gRPC; the workers run the model to perform the actual inference.
The frontend (web/node.js) makes API calls to this inference service (backend).
Refer to the OA developer docs to learn more.
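For orientation, here is a minimal sketch of the HTTP-facing server using Sanic. The endpoint path, port, and the `call_worker` stub are illustrative assumptions; in the real service the handler would forward the prompt to a worker over gRPC rather than returning a stubbed string.

```python
# Minimal sketch of the HTTP server. Names and the endpoint are illustrative.
from sanic import Sanic
from sanic.response import json as json_response

app = Sanic("inference_server")

async def call_worker(prompt: str) -> str:
    # Placeholder for the gRPC round-trip to a worker that runs the model.
    return f"(model output for: {prompt})"

@app.post("/inference")
async def inference(request):
    prompt = (request.json or {}).get("prompt", "")
    completion = await call_worker(prompt)
    return json_response({"completion": completion})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

A frontend (or `curl`) would then POST a JSON body such as `{"prompt": "Hello"}` to `/inference`.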
This project provides a starter kit to run a variety of OA models.
Pre-release (alpha) model families and versions:
- Chip2: version 6
- Joi: version 5
- Chip: version 4
- Rosey: version 3
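Purely for illustration, the model/version pairs above could be kept as a small in-code registry (a hypothetical structure, not actual OA code):

```python
# Hypothetical registry of the pre-release models and their pinned versions.
MODEL_VERSIONS = {
    "Chip2": 6,
    "Joi": 5,
    "Chip": 4,
    "Rosey": 3,
}

def pinned_version(model_name: str) -> int:
    """Look up the pinned version for a model; raises KeyError if unknown."""
    return MODEL_VERSIONS[model_name]
```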
Install the following Python packages:
- PyTorch - deep learning framework
- bitsandbytes - 8-bit optimizers
- Sanic - Python web server and web framework
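Once these are installed (for example via `pip install torch bitsandbytes sanic`), a quick import check confirms the environment is set up:

```python
# Sanity-check that the required packages import cleanly and report versions.
import torch
import bitsandbytes  # provides 8-bit optimizers under bitsandbytes.optim
import sanic

print("torch:", torch.__version__)
print("sanic:", sanic.__version__)
```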
This starter kit currently deploys the inference service container on AWS Fargate (a serverless compute engine).
First, build the container image using the Dockerfile.
Then push the image to Amazon Elastic Container Registry (ECR).
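As one possible approach, both steps can be scripted with the Docker SDK for Python (`docker`) and `boto3`. The account ID, region, and repository name below are placeholders, and the ECR repository is assumed to already exist:

```python
# Sketch: build the image and push it to Amazon ECR.
# ACCOUNT_ID, REGION, and the repository name are illustrative placeholders.
import base64

import boto3
import docker

ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
REPO = f"{ACCOUNT_ID}.dkr.ecr.{REGION}.amazonaws.com/oa-inference"

client = docker.from_env()

# Build the image from the Dockerfile in the current directory.
image, _logs = client.images.build(path=".", tag=f"{REPO}:latest")

# Authenticate against ECR; the token decodes to "AWS:<password>".
token = boto3.client("ecr", region_name=REGION).get_authorization_token()
auth = token["authorizationData"][0]
username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
client.login(username=username, password=password, registry=auth["proxyEndpoint"])

# Push the tagged image to the registry; the Fargate task can then pull it.
client.images.push(REPO, tag="latest")
```

Equivalently, the same flow works with the `docker build` and `docker push` CLI commands after authenticating with `aws ecr get-login-password`.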