
Real-Time Multimodal ETL Pipelines for GenAI


Sign Up | Documentation | Email List

Overview

Mixpeek listens for changes to your database, then processes each change (a file_url or inline content) through an inference pipeline of extraction, generation, and embedding, so your database always holds fresh multimodal data.

It removes the need to set up your own architecture for tracking database changes, extracting content, and processing and embedding it, treating each change as its own atomic unit.

We support every modality: documents, images, video, audio and of course text.
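Conceptually, each change flows through the three pipeline stages in order. A minimal, illustrative Python sketch of that flow (the function names and record shape here are hypothetical, not Mixpeek's actual API):

```python
# Illustrative sketch of the extraction -> generation -> embedding flow.
# All function names and the record shape below are hypothetical.

def extract(change: dict) -> str:
    # Pull raw content out of the change, whether inline or via file_url.
    if "inline" in change:
        return change["inline"]
    return f"<contents fetched from {change['file_url']}>"  # placeholder fetch

def generate(text: str) -> str:
    # e.g. summarize or enrich the extracted content with an LLM.
    return f"summary({text[:20]}...)"

def embed(text: str) -> list[float]:
    # e.g. call an embedding model; here a dummy fixed-size vector.
    return [float(len(text) % 7)] * 3

def process_change(change: dict) -> dict:
    # Treat the change as one atomic unit through all three stages.
    text = extract(change)
    return {"text": text, "generated": generate(text), "embedding": embed(text)}

result = process_change({"inline": "hello multimodal world"})
```

The real pipeline runs these stages against your configured models; the sketch only shows how one change becomes one enriched record.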

Integrations

Architecture

Mixpeek is structured into three main services, each designed to handle a specific part of the process:

  • API Orchestrator: Coordinates the flow between services, ensuring smooth operation and handling failures gracefully.
  • Distributed Queue: Buffers work between the API and the inference service as asynchronous tasks (run with Celery, as shown below).
  • Inference Service: Handles extraction, embedding, and generation of payloads.

These services are containerized and can be deployed on separate servers for optimal performance and scalability.

Getting Started

Clone the Mixpeek repository and navigate into it:

git clone [email protected]:mixpeek/server.git
cd server

We use poetry for all services, though each also includes an optional Dockerfile. The setup below uses poetry for a quick start.

Setup

For each service you'll do the following:

  1. Create a virtual environment
poetry env use python3.10
  2. Activate the virtual environment
poetry shell
  3. Install the requirements
poetry install

API

.env file:

SERVICES_CONTAINER_URL=http://localhost:8001
PYTHON_VERSION=3.11.6
OPENAI_KEY=
ENCRYPTION_KEY=

REDIS_URL=

MONGO_URL=
MONGODB_ATLAS_PUBLIC_KEY=
MONGODB_ATLAS_PRIVATE_KEY=
MONGODB_ATLAS_GROUP_ID=

AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
AWS_ARN_LAMBDA=

MIXPEEK_ADMIN_TOKEN=
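
The API expects these variables at startup. One way to sanity-check your environment before launching is a small standard-library script like the following (a hedged sketch; Mixpeek's own config loading may differ, and the list below covers only the names shown above):

```python
import os

# Variables the API .env above defines; flag any that are still unset.
REQUIRED = [
    "SERVICES_CONTAINER_URL", "OPENAI_KEY", "ENCRYPTION_KEY",
    "REDIS_URL", "MONGO_URL", "MIXPEEK_ADMIN_TOKEN",
]

def missing_vars(env) -> list:
    # Return the required names that are absent or empty in the mapping.
    return [name for name in REQUIRED if not env.get(name)]

# Against the real environment you'd pass os.environ:
unset = missing_vars(os.environ)

# Demo against an empty mapping: everything is reported missing.
demo = missing_vars({})
```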

Then run it:

poetry run python3 -m uvicorn main:app --reload

Inference Service

.env file:

S3_BUCKET=
AWS_ACCESS_KEY=
AWS_SECRET_KEY=
AWS_REGION=
PYTHON_VERSION=

Then run it:

poetry run python3 -m uvicorn main:app --reload --host 0.0.0.0 --port 8001

Distributed Queue

The queue also runs from the api folder and uses the same .env file as the API.

celery -A db.service.celery_app worker --loglevel=info
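
The queue's job is to decouple the API from inference work: the API enqueues change events, and workers drain them asynchronously. As a toy standard-library analogue of that producer/worker pattern (Celery does this across processes and machines; this sketch is in-process only):

```python
import queue
import threading

# Toy in-process stand-in for the distributed queue: the API side
# enqueues change events, a worker drains them and runs inference.
tasks: queue.Queue = queue.Queue()
processed: list = []

def worker() -> None:
    while True:
        change = tasks.get()
        if change is None:  # sentinel: shut the worker down
            break
        processed.append(f"inferred:{change}")  # placeholder for inference
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()
for event in ["doc-1", "img-2"]:  # the "API" producing work
    tasks.put(event)
tasks.put(None)
t.join()
```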

You now have all 3 services running!

API Interface

All methods are exposed as HTTP endpoints.

You'll first need to generate an API key via POST /users/private, using the MIXPEEK_ADMIN_TOKEN you defined in the API .env file:

curl --location 'http://localhost:8000/users/private' \
--header 'Authorization: MIXPEEK_ADMIN_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{"email":"[email protected]"}'

Any email address will work.

Cloud Service

If you want a completely managed version of Mixpeek: https://mixpeek.com/start

We also have a transparent and predictable billing model: https://mixpeek.com/pricing

Are we missing anything?
