The datasets for image emotion computing

16 1 Updated Apr 16, 2022

Dataset for "Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution" in IJCAI 17

22 Updated May 8, 2023

Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

80 2 Updated Jun 13, 2024

PyTorch implementation of Deep CNN Encoder + LSTM Decoder with Attention for Image to LaTeX

Python 173 47 Updated Oct 3, 2023
Python 256 7 Updated Jan 27, 2024

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 41,223 7,556 Updated Jul 24, 2024

[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models

Python 30 1 Updated Mar 18, 2024
HTML 60 6 Updated May 10, 2024

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Python 8,060 567 Updated Jul 25, 2024

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

280 8 Updated Jul 3, 2024

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,180 2,900 Updated Apr 22, 2024

API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series

Python 643 19 Updated Jun 13, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,828 718 Updated Jul 25, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,421 334 Updated May 28, 2024

Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".

JavaScript 7 Updated Jun 4, 2024

mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model

Python 2,032 158 Updated Apr 5, 2024

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,148 106 Updated Jul 19, 2024

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 9,296 921 Updated Jul 17, 2024

Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).

Python 2,061 131 Updated Jun 7, 2023

[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"

Python 46 3 Updated Jun 20, 2024

(CVPR2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.

Python 283 8 Updated Jul 11, 2024

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Python 26,607 5,296 Updated Jul 25, 2024

LLaVA-Interactive-Demo

Python 339 25 Updated Jun 10, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,231 369 Updated Apr 9, 2024

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Jupyter Notebook 14,303 1,318 Updated Jul 16, 2024

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 45,745 5,415 Updated Jun 24, 2024

An open-source framework for training large multimodal models.

Python 3,593 274 Updated May 25, 2024

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Python 479 30 Updated Jan 7, 2024

A collection of 1000+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML).

1,968 237 Updated Mar 31, 2024