Multimodal Large Language Models
pip install mllm
Some features might require extra dependencies.
For example, for the Gemini models, you can install the extra dependencies like this:
pip install mllm[gemini]
Create an MLLM router with a list of preferred models
import os

from mllm import Router

# set API keys for each provider you want the router to use
os.environ["OPENAI_API_KEY"] = "..."
os.environ["ANTHROPIC_API_KEY"] = "..."
os.environ["GEMINI_API_KEY"] = "..."

router = Router(
    preference=["gpt-4-turbo", "anthropic/claude-3-opus-20240229", "gemini/gemini-1.5-pro-latest"]
)
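The preference list sets the order in which the router selects models, so later entries act as fallbacks.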
Create a new role-based chat thread
from mllm import RoleThread
thread = RoleThread(owner_id="[email protected]")
thread.post(role="user", msg="Describe the image", images=["data:image/jpeg;base64,..."])
Chat with the MLLM and store the prompt data in the namespace foo
response = router.chat(thread, namespace="foo")
thread.add_msg(response.msg)
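The reply comes back as a RoleMessage. A minimal sketch of reading it (assuming the message exposes a `text` field):

print(response.msg.text)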
Ask for a structured response
from pydantic import BaseModel

class Animal(BaseModel):
    species: str
    color: str

thread.post(
    role="user",
    msg=f"What animal is in this image? Please output as schema {Animal.model_json_schema()}",
    images=["data:image/jpeg;base64,..."]
)

# the response is parsed and validated against the expected schema
response = router.chat(thread, namespace="animal", expect=Animal)

animal_parsed = response.parsed
assert isinstance(animal_parsed, Animal)
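Since `parsed` is a validated instance of your Pydantic model, its fields can be used directly:

print(animal_parsed.species, animal_parsed.color)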
Find a saved thread or a prompt
RoleThread.find(id="123")
Prompt.find(id="456")
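These lookups presumably return a list of matches rather than a single record; a minimal sketch under that assumption (the owner_id filter is illustrative):

# hedged sketch -- assuming find() returns a list of matching threads
threads = RoleThread.find(owner_id="[email protected]")
if threads:
    thread = threads[0]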
To store a raw OpenAI prompt
from mllm import Prompt, RoleThread, RoleMessage

thread = RoleThread()

msg = {
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": "What's in this image?",
        },
        {
            "type": "image_url",
            "image_url": {"url": "data:image/jpeg;base64,..."},
        }
    ]
}

# convert from the OpenAI message format into a RoleMessage
role_message = RoleMessage.from_openai(msg)
thread.add_msg(role_message)

# call_openai is a placeholder for your own OpenAI client call
response = call_openai(thread.to_openai())
response_msg = RoleMessage.from_openai(response["choices"][0]["message"])

saved_prompt = Prompt(thread, response_msg, namespace="foo")
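The saved prompt can later be retrieved with Prompt.find(), as shown above.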
Add images of any variety to the thread. We support base64, file path, PIL, and URL
from PIL import Image

img1 = Image.open("img1.png")

thread.post(
    role="user",
    msg="What's in this image?",
    images=["data:image/jpeg;base64,...", "./img1.png", img1, "https://shorturl.at/rVyAS"]
)
Custom endpoints are supported. They can be added to a Router instance with the RouterConfig:
from mllm import RouterConfig
custom_model = RouterConfig(
    model="hosted_vllm/allenai/Molmo-7B-D-0924",  # needs to have the `hosted_vllm` prefix
    api_base="https://hosted-vllm-api.co",  # set your api base here
    api_key_name="MOLMO_API_KEY"  # add the api key name -- this will be searched for in your env
)
router = Router(custom_model)
You can also mix the models:
router = Router([custom_model, "gpt-4-turbo"])
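Chatting then works the same as with the built-in models (a minimal sketch; the namespace argument is omitted here on the assumption it is optional):

response = router.chat(thread)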
MLLM is integrated with:
- Taskara, a task management library for AI agents
- Skillpacks, a library to fine-tune AI agents on tasks
- Surfkit, a platform for AI agents
- Threadmem, a thread management library for AI agents
Come join us on Discord.
Thread and prompt storage can be backed by:
- SQLite
- PostgreSQL

SQLite will be used by default. To use Postgres, simply configure the env vars:
DB_TYPE=postgres
DB_NAME=mllm
DB_HOST=localhost
DB_USER=postgres
DB_PASS=abc123
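Equivalently, these can be set from Python with placeholder credentials (a minimal sketch; the assumption is that they must be set before mllm first connects to the database):

import os

# placeholder credentials -- point these at your own Postgres instance
os.environ["DB_TYPE"] = "postgres"
os.environ["DB_NAME"] = "mllm"
os.environ["DB_HOST"] = "localhost"
os.environ["DB_USER"] = "postgres"
os.environ["DB_PASS"] = "abc123"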
Thread image storage will use the DB by default. To configure bucket storage using GCS:
- Create a bucket with fine-grained permissions
- Create a GCP service account JSON with permissions to write to the bucket
export THREAD_STORAGE_SA_JSON='{
"type": "service_account",
...
}'
export THREAD_STORAGE_BUCKET=my-bucket