Nitro - Accelerated AI Inference Engine

Accelerated AI Inference Server written in C++. Runs on consumer-grade hardware to datacenter-grade GPUs. OpenAI compatible API. StableDiffusion compatible API.

Getting Started - Docs - Changelog - Bug reports - Discord

⚠️ Nitro is currently in development: expect breaking changes and bugs!

Features

Supported features

  • Simple HTTP web server for running inference on Triton (without the Triton client)
  • Upload inference results to S3 (txt2img)

TODO:

  • Local file server
  • Cache
  • GGML inference support (llama.cpp, etc.)
  • Plugins support

Nitro Endpoints

- /inferences/llm_models: OpenAI-compatible (streaming)
- /inferences/txt2img: POST, JSON body
- /inferences/img2img: POST, multipart form data
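
For a quick sanity check against a running server, the requests below sketch how these endpoints might be called. This is a minimal sketch only: the port (3000, as published in the Docker example below), the chat-style body for /inferences/llm_models, and the prompt field for /inferences/txt2img are assumptions, not documented request schemas.

    # Assumed OpenAI-style streaming request; the exact body schema is not documented here.
    curl http://localhost:3000/inferences/llm_models \
      -H "Content-Type: application/json" \
      -d '{"messages": [{"role": "user", "content": "Hello"}], "stream": true}'

    # Assumed JSON body for txt2img; the field name is illustrative only.
    curl http://localhost:3000/inferences/txt2img \
      -H "Content-Type: application/json" \
      -d '{"prompt": "a photo of an astronaut riding a horse"}'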

Documentation

Installation

Using Docker (Recommended)

  1. Prerequisites: Ensure you have a base Docker image with Triton Client installed.

    • Currently, only compatible with nvcr.io/nvidia/tritonserver:23.06-py3-sdk.
  2. Build Docker Image:

    docker build . -t jan_infer
  3. Configuration:

    • Download and modify the example config file from here.
    • Make sure to rename it by removing "example." from the filename.
    custom_config:
      s3_public_endpoint:  <your s3 endpoint>
      triton_endpoint: <your triton ip:port>
      s3_bucket: <your s3 bucket name>
      drogon_port: <backend deployment port>
  4. Run Docker Container:

    • Replace the placeholders with your specific configurations.
    docker run \
      -v /path/to/your/config.yaml:/workspace/workdir/janinfer_backend/config.yaml \
      -p 3000:3000 \
      -e AWS_ACCESS_KEY_ID=<your_access_key> \
      -e AWS_SECRET_ACCESS_KEY=<your_secret_key> \
      -e AWS_DEFAULT_REGION=<your_region> \
      jan_infer

Note: /path/to/your/config.yaml is the config file you created in step 3. You can place it anywhere on the host, as long as you mount it into the container as shown above.
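
For reference, a filled-in config might look like the snippet below. These values are illustrative assumptions only; substitute your own endpoints and bucket, and note that drogon_port should match the container port published with -p (3000 in the example above).

    custom_config:
      s3_public_endpoint: https://s3.us-east-1.amazonaws.com   # illustrative S3 endpoint
      triton_endpoint: 10.0.0.5:8001                           # illustrative Triton ip:port
      s3_bucket: my-inference-results                          # illustrative bucket name
      drogon_port: 3000                                        # must match the published port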

That's it! You should now have the inference backend up and running.

Information about how some parts of the backend are implemented can be found in the Developer Documentation.

About Nitro

Repo Structure

.
|-- core
|   |-- inference_backend
|   |   |-- controllers
|   |   |   |-- img2img
|   |   |   |-- llm_models
|   |   |   `-- txt2img
|   |   |-- include
|   |   |-- schemas
|   |   `-- test
|   |-- models
|   `-- scripts
`-- docs
    |-- development
    `-- openapi

Architecture

Current architecture

Contributing

Contributions are welcome! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.

Please note that Jan intends to build a sustainable business that can provide high quality jobs to its contributors. If you are excited about our mission and vision, please contact us to explore opportunities.

Contact

  • For support: please file a GitHub issue
  • For questions: join our Discord here
  • For long form inquiries: please email [email protected]
