Skip to content

Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

License

Notifications You must be signed in to change notification settings

bhimrazy/chat-with-phi-3-vision

Repository files navigation


Chat with Phi 3.5/3 Vision LLMs

Open In Studio
phi-3.5-vision-demo.mp4

Overview

Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

This model enables multi-frame image understanding, image comparison, multi-image summarization/storytelling, and video summarization, which have broad applications in office scenarios.

Getting Started

Follow these steps to set up and run the project:

1. Install Dependencies

Ensure all necessary packages are installed by running:

pip install -r requirements.txt

2. Start the API Server

Launch the API server powered by LitServe:

python server.py

3. Launch the Streamlit App

Start the Streamlit application with the following command:

streamlit run app.py

About

This project is developed and maintained with ❤️ by Bhimraj Yadav.

About

Chat with Phi 3.5/3 Vision LLMs. Phi-3.5-vision is a lightweight, state-of-the-art open multimodal model built upon datasets which include - synthetic data and filtered publicly available websites - with a focus on very high-quality, reasoning dense data both on text and vision.

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published