GitHub - Leda06/server: Real-Time Multimodal Pipelines for GenAI

Sign Up | Documentation | Email List

Real-Time Multimodal ETL Pipelines for GenAI

Overview

Mixpeek listens for changes to your database, then processes each change (file_url or inline) through an inference pipeline of extraction, generation, and embedding, so your database always holds fresh multimodal data.

It removes the need to set up infrastructure for tracking database changes, extracting content, processing and embedding it, and treating each change as its own atomic unit.

We support every modality: documents, images, video, audio and of course text.
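To make the flow described above concrete, here is a minimal sketch of one change moving through extraction, generation, and embedding as a single unit. Everything in it is an assumption made for illustration: the `ChangeEvent` shape and the `extract`, `generate`, `embed`, and `process_change` functions are hypothetical placeholders, not Mixpeek's actual API or models.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    """One database change, handled as its own atomic unit (hypothetical shape)."""
    document_id: str
    file_url: Optional[str] = None  # content referenced by URL
    inline: Optional[str] = None    # content carried in the change itself


def extract(event: ChangeEvent) -> str:
    # Placeholder: a real extractor would download and parse file_url.
    if event.inline is not None:
        return event.inline
    return f"<content fetched from {event.file_url}>"


def generate(text: str) -> str:
    # Placeholder for a generation step (e.g. a summary): truncates the text.
    return text[:40]


def embed(text: str) -> list[float]:
    # Placeholder embedding: a toy 2-dimensional vector, not a real model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def process_change(event: ChangeEvent) -> dict[str, Any]:
    """Run one change through extraction, generation, and embedding."""
    text = extract(event)
    return {
        "document_id": event.document_id,
        "extracted": text,
        "generated": generate(text),
        "embedding": embed(text),
    }


result = process_change(ChangeEvent(document_id="doc-1", inline="hello world"))
print(result["generated"], result["embedding"])
```

Because `process_change` takes one event and returns one record, a failure in any step affects only that change, which is the sense in which each change is an atomic unit.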

Integrations

MongoDB: https://docs.mixpeek.com/integrations/mongodb

Architecture

Mixpeek is structured into the following services, each designed to handle a specific part of the process:

API Orchestrator: Coordinates the flow between services, ensuring smooth operation and handling failures gracefully.
Distributed Queue:
Inference Service: Extraction, embedding, and generation of payloads

These services are containerized and can be deployed on separate servers for optimal performance and scalability.
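The division of labor above can be sketched in a single process using Python's standard `queue` and `threading` modules. This is only an illustration under stated assumptions: the in-memory queue stands in for the distributed queue, and `inference`, `worker`, and `orchestrate` are hypothetical names, not the project's real services.

```python
import queue
import threading


def inference(payload: str) -> str:
    # Hypothetical stand-in for the Inference Service.
    if not payload:
        raise ValueError("empty payload")
    return payload.upper()


def worker(jobs: queue.Queue, results: dict, failures: dict) -> None:
    """Consume jobs until a None sentinel arrives; record failures instead of crashing."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, payload = job
        try:
            results[job_id] = inference(payload)
        except Exception as exc:
            # A failed job is recorded and the worker keeps running.
            failures[job_id] = str(exc)
        finally:
            jobs.task_done()


def orchestrate(payloads: dict) -> tuple:
    """Enqueue every payload, wait for the worker, and return (results, failures)."""
    jobs = queue.Queue()
    results, failures = {}, {}
    thread = threading.Thread(target=worker, args=(jobs, results, failures))
    thread.start()
    for job_id, payload in payloads.items():
        jobs.put((job_id, payload))
    jobs.put(None)  # sentinel: no more work
    jobs.join()
    thread.join()
    return results, failures


results, failures = orchestrate({"a": "image caption", "b": ""})
print(results, failures)
```

Keeping the queue between the orchestrator and the worker means a bad payload ends up in `failures` without stopping the other jobs, which is one simple reading of "handling failures gracefully".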

Getting Started

Clone the Mixpeek repository and navigate to the server directory:

git clone git@github.com:mixpeek/server.git
cd server

We use Poetry for all services, and each service also includes an optional Dockerfile. The setup below uses Poetry for quick deployment.

Setup

For each service you'll do the following:

Create a virtual environment

poetry env use python3.10

Activate the virtual environment

poetry shell

Install the requirements

poetry install

API

.env file:

SERVICES_CONTAINER_URL=https://localhost:8001
PYTHON_VERSION=3.11.6
OPENAI_KEY=
ENCRYPTION_KEY=

REDIS_URL=

MONGO_URL=
MONGODB_ATLAS_PUBLIC_KEY=
MONGODB_ATLAS_PRIVATE_KEY=
MONGODB_ATLAS_GROUP_ID=

AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
AWS_ARN_LAMBDA=

MIXPEEK_ADMIN_TOKEN=

Then run it:

poetry run python3 -m uvicorn main:app --reload
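Since the API depends on the .env values listed above, a small pre-flight check can catch unset variables before the server starts. The sketch below is an assumption-laden example rather than part of the project: `missing_vars` is a hypothetical helper, and treating this particular subset of variables as required is a guess.

```python
import os

# Names copied from the .env listing above; treating these as required
# is an assumption made for this sketch.
REQUIRED_VARS = [
    "SERVICES_CONTAINER_URL",
    "OPENAI_KEY",
    "ENCRYPTION_KEY",
    "REDIS_URL",
    "MONGO_URL",
    "MIXPEEK_ADMIN_TOKEN",
]


def missing_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
sample_env = {"SERVICES_CONTAINER_URL": "https://localhost:8001", "OPENAI_KEY": ""}
print(missing_vars(sample_env))
```

Calling `missing_vars()` with no argument checks the real process environment, so it could be run once at startup and the result logged or turned into an error.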
