ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 2,468 225 Updated Jul 21, 2024

OKC13 / General-Documents-Layout-parser

General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Python 37 6 Updated Jun 13, 2024

LynnHaDo / Document-Layout-Analysis

Object Detection Model for Scanned Documents

Jupyter Notebook 59 7 Updated Oct 4, 2023

minhnhat2001vt / Scence-Text-Recognition-With-YOLOv8-and-CRNN

An implementation of YOLOv8 and a CRNN network for the Scene Text Recognition task

Jupyter Notebook 3 Updated Apr 20, 2024

XiaoduoAILab / XmodelVLM

Python 55 2 Updated Jun 20, 2024

IDEA-Research / T-Rex

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Python 2,013 121 Updated Jun 25, 2024

WenmuZhou / PytorchOCR

A PyTorch-based OCR toolkit that supports common text detection and recognition algorithms

Python 1,333 298 Updated Sep 28, 2023

lllyasviel / IC-Light

More relighting!

Python 4,327 284 Updated Jun 27, 2024

BradyFU / Awesome-Multimodal-Large-Language-Models

✨✨Latest Advances on Multimodal Large Language Models

10,756 713 Updated Jul 11, 2024

OpenBMB / VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series built on the CPM foundation models

Python 1,039 92 Updated Jun 13, 2024

GigaAI-research / General-World-Models-Survey

210 8 Updated May 7, 2024

thobbs / color-clustering

A tool to perform K-means clustering analysis of the colors in an image.

Python 22 6 Updated Apr 13, 2021

LLM-Red-Team / kimi-free-api

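🚀 KIMI AI long-context LLM reverse-engineered API for free testing [specialty: long-text interpretation and summarization]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.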

TypeScript 3,428 540 Updated Jul 12, 2024

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 18,277 1,993 Updated Jul 14, 2024

RealAlexandreAI / json-repair

🔧 Repair JSON! Solution for JSON Anomalies from LLMs.

Go 136 6 Updated Jul 17, 2024

dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,102 277 Updated May 4, 2024

Mehdi0xC / clic

CLiC: Concept Learning in Context

6 Updated Apr 5, 2024

Shilin-LU / TF-ICON

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Python 778 100 Updated Jan 17, 2024

Reality-Editor / Composition-Stable-Diffusion

Image Composition via Stable Diffusion

Python 66 11 Updated Mar 10, 2023

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4o

Python 4,276 325 Updated Jul 21, 2024

CompVis / attribute-control

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Jupyter Notebook 101 9 Updated Jun 13, 2024

zero is not none DWCTOD

Stars

nugujeyong / ColorPalette

cambrian-mllm / cambrian

SkyworkAI / Vitron

waltonfuture / InstructionGPT-4

Yangyi-Chen / Multimodal-AND-Large-Language-Models

purnasai / Dino_V2

InternLM / lmdeploy

LLaVA-VL / LLaVA-NeXT

BradyFU / Video-MME

modelscope / swift