This project uses the MiniCPM-V model (also known as OmniLMM-3B) to bring image recognition to a real-time camera feed. The application analyzes the scenes captured by the camera and reports what the model perceives while the feed runs. You can modify the prompt to see how the model responds to different inputs.
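The capture-and-describe cycle described above can be sketched in Python with OpenCV. This is a minimal illustration under stated assumptions, not this project's code. It assumes `opencv-python` is installed. `describe_frame` is a hypothetical stand-in for the MiniCPM-V inference call, which this README does not show. `DEFAULT_PROMPT`, `every_n_frames`, and the helper names are also illustrative.

```python
from typing import Any, Callable

# Hypothetical default prompt; edit it to see how the model's answers change.
DEFAULT_PROMPT = "Describe what you see in this image."


def is_quit_key(key: int) -> bool:
    """Return True if a key code from cv2.waitKey() is the letter 'q'."""
    # Keep only the low byte; some platforms set extra high bits in the code.
    return (key & 0xFF) == ord("q")


def should_query(frame_index: int, every_n_frames: int) -> bool:
    """Decide whether to run the (slow) model on this frame."""
    return frame_index % every_n_frames == 0


def run_camera_loop(
    describe_frame: Callable[[Any, str], str],
    prompt: str = DEFAULT_PROMPT,
    camera_index: int = 0,
    every_n_frames: int = 30,
) -> None:
    """Show the camera feed, overlay the model's latest answer, and quit on 'q'.

    describe_frame(image, prompt) is a hypothetical callable wrapping the
    MiniCPM-V inference code; it receives an RGB NumPy array and a prompt.
    """
    # Imported lazily so the helpers above can be used without OpenCV installed.
    import cv2

    cap = cv2.VideoCapture(camera_index)
    if not cap.isOpened():
        raise RuntimeError(f"Could not open camera {camera_index}")

    answer = ""
    frame_index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # camera disconnected or stream ended
            if should_query(frame_index, every_n_frames):
                # OpenCV delivers BGR frames; convert to RGB before handing off.
                rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                answer = describe_frame(rgb, prompt)
            frame_index += 1
            # Draw the most recent answer (truncated to fit) onto the preview.
            cv2.putText(frame, answer[:60], (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
            cv2.imshow("MiniCPM-V camera", frame)
            if is_quit_key(cv2.waitKey(1)):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Call it as `run_camera_loop(my_describe)`, where `my_describe(image, prompt)` returns a string. Querying the model only every few frames keeps the preview responsive, because inference is much slower than capture. Each call still blocks the loop while it runs, so a smoother variant might move inference to a background thread.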
- Download the MiniCPM-V model from the model1 and model2 links and put the files in the `MiniCPM-V` folder.
- Install the requirements by running `pip install -r requirements.txt`.
- To start the image recognition application, run the `run.sh` script with one of the following device options: `mps`, `cpu`, or `cuda`. For example:

      ./run.sh mps   # For running on Apple Silicon GPU
      ./run.sh cpu   # For running on CPU
      ./run.sh cuda  # For running on CUDA-enabled GPU

  If the script is not yet executable, give it execute permission first by running `chmod +x run.sh`.
- Quit the application by pressing `q`.
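The device option passed to `run.sh` presumably selects a PyTorch device, because `mps`, `cpu`, and `cuda` are PyTorch's device names. The contents of `run.sh` are not shown here. The following is a sketch under that assumption: a hypothetical Python entry point that accepts the same three options and checks whether PyTorch can use the requested one. `parse_device`, `device_is_available`, and `resolve_device` are illustrative names, not this project's API.

```python
import argparse
from typing import List, Optional

# The three device options listed in this README.
SUPPORTED_DEVICES = ("mps", "cpu", "cuda")


def parse_device(argv: Optional[List[str]] = None) -> str:
    """Parse a single positional device argument, rejecting anything else."""
    parser = argparse.ArgumentParser(description="MiniCPM-V camera demo (sketch)")
    parser.add_argument("device", choices=SUPPORTED_DEVICES)
    return parser.parse_args(argv).device


def device_is_available(device: str) -> bool:
    """Return True if PyTorch can run on the requested device."""
    if device == "cpu":
        return True
    try:
        import torch
    except ImportError:
        return False  # without PyTorch, only the CPU option is treated as usable
    if device == "cuda":
        return torch.cuda.is_available()
    if device == "mps":
        # torch.backends.mps is missing on old PyTorch builds, so look it up safely.
        mps = getattr(torch.backends, "mps", None)
        return mps is not None and mps.is_available()
    return False


def resolve_device(requested: str) -> str:
    """Fall back to the CPU when the requested accelerator is unavailable."""
    return requested if device_is_available(requested) else "cpu"
```

For example, `resolve_device(parse_device())` reads the option from the command line, as in `python app.py cuda`, where `app.py` is a hypothetical script name. This sketch falls back to `cpu` rather than exiting. Failing loudly when the requested accelerator is missing is an equally reasonable choice.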