MM-Vid: Advancing Video Understanding with GPT-4V(ision)

This repository contains an open-source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".

Overview

This project applies GPT-4V(ision) to video understanding. The implementation follows the methods and experiments described in the paper and covers four steps: scene detection, video clipping, speech recognition, and generation of coherent video descriptions.
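To make the relationship between these steps concrete, here is a minimal sketch of how scene cuts and speech recognition output could be combined into per-clip prompts. It is not taken from main.py. It assumes that a scene detector returns cut timestamps in seconds and that a speech recognizer returns (start, end, text) segments. The names `Clip`, `split_into_clips`, `attach_transcript`, and `build_prompt` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Clip:
    """A contiguous span of the video, in seconds, with its spoken text."""
    start: float
    end: float
    transcript: List[str] = field(default_factory=list)


def split_into_clips(scene_cuts: List[float], duration: float) -> List[Clip]:
    """Turn scene-cut timestamps into consecutive clips covering the video."""
    # Keep only cuts strictly inside the video, then add both endpoints.
    inner = [t for t in sorted(scene_cuts) if 0.0 < t < duration]
    edges = [0.0] + inner + [duration]
    return [Clip(start=a, end=b) for a, b in zip(edges, edges[1:])]


def attach_transcript(
    clips: List[Clip], asr_segments: List[Tuple[float, float, str]]
) -> List[Clip]:
    """Assign each speech segment to the clip that contains its midpoint."""
    for seg_start, seg_end, text in asr_segments:
        midpoint = (seg_start + seg_end) / 2
        for clip in clips:
            if clip.start <= midpoint < clip.end:
                clip.transcript.append(text)
                break
    return clips


def build_prompt(clip: Clip) -> str:
    """Build the text part of a description request for one clip (assumed wording)."""
    speech = " ".join(clip.transcript) or "(no speech)"
    return (
        f"Clip {clip.start:.1f}s-{clip.end:.1f}s. "
        f"Transcript: {speech}. Describe what happens in this clip."
    )


if __name__ == "__main__":
    cuts = [4.0, 9.5]  # hypothetical scene-detector output
    speech = [(0.5, 2.0, "hello"), (3.5, 5.5, "world"), (10.0, 11.0, "bye")]
    for clip in attach_transcript(split_into_clips(cuts, 12.0), speech):
        print(build_prompt(clip))
```

In a full pipeline, each prompt would be sent to the vision model together with frames sampled from the same clip; that call is left out here because it depends on the API client and credentials in use.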

Installation

First clone the repository and install the required dependencies:

git clone https://github.com/yongliang-wu/MM-VID.git
cd MM-VID
pip install -r requirements.txt

Then run the code:

python main.py

TODO

Supplying external information as input is not supported yet.

