forked from EmmS21/server

Real-Time Multimodal Pipelines for GenAI

Leda06/server

 
 

Sign Up | Documentation | Email List

Real-Time Multimodal ETL Pipelines for GenAI

Overview

Mixpeek listens for changes to your database, then processes each change (a file_url or inline content) through an inference pipeline of extraction, generation, and embedding, so your database always holds fresh multimodal data.

It removes the need to set up infrastructure that tracks database changes, extracts content, processes and embeds it, and treats each change as its own atomic unit.
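The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mixpeek's actual code: the `Change` class, the `extract`, `generate`, `embed`, and `process_change` functions, and the shape of the returned record are all hypothetical, and each stage is a placeholder where a real parser or model would go.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One database change, handled as its own atomic unit (hypothetical shape)."""
    doc_id: str
    file_url: Optional[str] = None  # content referenced by URL...
    inline: Optional[str] = None    # ...or carried inline


def extract(change: Change) -> str:
    # Placeholder: a real extractor would download and parse file_url.
    if change.inline is not None:
        return change.inline
    return f"<contents of {change.file_url}>"


def generate(text: str) -> str:
    # Placeholder for a generation step (e.g. a summary); here it only truncates.
    return text[:40]


def embed(text: str) -> list[float]:
    # Placeholder embedding built from simple character statistics, not a real model.
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


def process_change(change: Change) -> dict:
    """Run one change through extraction, generation, and embedding."""
    text = extract(change)
    return {
        "doc_id": change.doc_id,
        "text": text,
        "summary": generate(text),
        "embedding": embed(text),
    }
```

Calling `process_change(Change("1", inline="hello"))` returns one record containing the extracted text, the generated summary, and the embedding, which is the kind of result that would be written back next to the original document.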

We support every modality: documents, images, video, audio and of course text.

Integrations

Architecture

Mixpeek is structured into three main services, each designed to handle a specific part of the process:

  • API Orchestrator: Coordinates the flow between services, ensuring smooth operation and handling failures gracefully.
  • Distributed Queue:
  • Inference Service: Extraction, embedding, and generation of payloads

These services are containerized and can be deployed on separate servers for optimal performance and scalability.
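To make the division of labor concrete, here is a minimal single-process sketch of an orchestrator handing jobs to an inference worker through a queue. It is an assumption-laden stand-in: the real services run in separate containers, and the names `orchestrate` and `inference_worker`, the job format, and the use of Python's in-memory `queue.Queue` in place of a distributed queue are all hypothetical.

```python
import queue
import threading


def inference_worker(jobs: queue.Queue, results: dict) -> None:
    """Stand-in for the inference service: consume jobs until a None sentinel."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            # Placeholder "inference": measure the payload length.
            results[job["id"]] = {"status": "done", "length": len(job["payload"])}
        except Exception as exc:
            # A failed job is recorded instead of crashing the worker.
            results[job["id"]] = {"status": "failed", "error": str(exc)}
        finally:
            jobs.task_done()


def orchestrate(payloads: dict) -> dict:
    """Stand-in for the API orchestrator: enqueue jobs, then wait for results."""
    jobs: queue.Queue = queue.Queue()
    results: dict = {}
    worker = threading.Thread(target=inference_worker, args=(jobs, results))
    worker.start()
    for job_id, payload in payloads.items():
        jobs.put({"id": job_id, "payload": payload})
    jobs.put(None)  # sentinel: no more work
    jobs.join()
    worker.join()
    return results
```

For example, `orchestrate({"a": "hello", "b": None})` marks job `a` as done and job `b` as failed, without the bad payload affecting the other job.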

Getting Started

Clone the Mixpeek repository and navigate to the server directory:

git clone git@github.com:mixpeek/server.git
cd server

We use poetry for all services, but there is an optional Dockerfile in each. We'll use poetry in the setup for quick deployment.

Setup

For each service you'll do the following:

  1. Create a virtual environment
poetry env use python3.10
  2. Activate the virtual environment
poetry shell
  3. Install the requirements
poetry install

API

.env file:

SERVICES_CONTAINER_URL=https://localhost:8001
PYTHON_VERSION=3.11.6
OPENAI_KEY=
ENCRYPTION_KEY=

REDIS_URL=

MONGO_URL=
MONGODB_ATLAS_PUBLIC_KEY=
MONGODB_ATLAS_PRIVATE_KEY=
MONGODB_ATLAS_GROUP_ID=

AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
AWS_ARN_LAMBDA=

MIXPEEK_ADMIN_TOKEN=

Then run it:

poetry run python3 -m uvicorn main:app --reload
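If the API fails at startup, a missing setting is a likely cause. The sketch below is a hypothetical helper, not part of the repository. It assumes the .env values have already been exported into the process environment, and the choice of which keys count as required is a guess based on the list above.

```python
import os

# Assumption: these keys from the .env listing are treated as required here.
REQUIRED_KEYS = [
    "SERVICES_CONTAINER_URL",
    "OPENAI_KEY",
    "ENCRYPTION_KEY",
    "REDIS_URL",
    "MONGO_URL",
    "MIXPEEK_ADMIN_TOKEN",
]


def missing_settings(env=None, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in env."""
    env = os.environ if env is None else env
    return [key for key in required if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running this in the same shell as the API prints the names of any settings that still need a value.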