Stars
Dataset for "Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution" in IJCAI 17
Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
PyTorch implementation of Deep CNN Encoder + LSTM Decoder with Attention for Image to Latex
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
✨✨Latest Advances on Multimodal Large Language Models
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
LAVIS - A One-stop Library for Language-Vision Intelligence
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An open-source framework for training large multimodal models.
PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
A collection of 1000+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML).