Stars
Generate Color Palette from your images using Kmeans and DBSCAN
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Dino V2 for Classification, PCA Visualization, Instance Retrieval: https://arxiv.org/abs/2304.07193
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
General-purpose layout analysis | Chinese document parsing | Document Layout Analysis | layout parser
Object Detection Model for Scanned Documents
An implementation of YOLOv8 and CRNN networks for the Scene Text Recognition task
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
✨✨Latest Advances on Multimodal Large Language Models
[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on CPM foundation models
A tool to perform K-means clustering analysis of the colors in an image.
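🚀 KIMI AI long-context LLM reverse-engineered API for free-usage testing [specialty: long-text interpretation and summarization]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, multi-turn conversations, zero-configuration deployment, multiple tokens, and automatic cleanup of session traces.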
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
🔧 Repair JSON! Solution for JSON Anomalies from LLMs.
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)
Image Composition via Stable Diffusion
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance
Fine-Grained Subject-Specific Attribute Expression Control in T2I Models