Stars
Dataset for "Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution" in IJCAI 17
Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
PyTorch implementation of Deep CNN Encoder + LSTM Decoder with Attention for Image to Latex
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
✨✨Latest Advances on Multimodal Large Language Models
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
LAVIS - A One-stop Library for Language-Vision Intelligence
Automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B).
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
(CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
An open-source framework for training large multimodal models.
PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
A collection of 1000+ survey papers on Natural Language Processing (NLP) and Machine Learning (ML).