Generate Color Palette from your images using Kmeans and DBSCAN

Python 1 Updated Feb 16, 2021

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,571 96 Updated Jul 6, 2024

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 260 16 Updated Jul 19, 2024

InstructionGPT-4

Python 35 3 Updated Dec 29, 2023

A paper list on multimodal and large language models, used only to record papers I read from the daily arXiv listing for personal reference.

472 28 Updated Jul 21, 2024

Dino V2 for Classification, PCA Visualization, Instance Retrieval: https://arxiv.org/abs/2304.07193

Jupyter Notebook 135 10 Updated Jul 5, 2023

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 3,438 306 Updated Jul 21, 2024
Python 1,309 71 Updated Jul 19, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

313 11 Updated Jun 18, 2024

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 2,468 225 Updated Jul 21, 2024

General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser

Python 37 6 Updated Jun 13, 2024

Object Detection Model for Scanned Documents

Jupyter Notebook 59 7 Updated Oct 4, 2023

An implementation of YOLOv8 and CRNN networks for the scene text recognition task

Jupyter Notebook 3 Updated Apr 20, 2024
Python 55 2 Updated Jun 20, 2024

[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Python 2,013 121 Updated Jun 25, 2024

A PyTorch-based OCR toolkit that supports common text detection and recognition algorithms

Python 1,333 298 Updated Sep 28, 2023

More relighting!

Python 4,327 284 Updated Jun 27, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,756 713 Updated Jul 11, 2024

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series built on the CPM foundation models

Python 1,039 92 Updated Jun 13, 2024

A tool to perform K-means clustering analysis of the colors in an image.

Python 22 6 Updated Apr 13, 2021

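🚀 KIMI AI long-context LLM reverse-engineered API for free-use testing [specialty: long-text interpretation and summarization]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn conversations, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.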

TypeScript 3,428 540 Updated Jul 12, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 18,277 1,993 Updated Jul 14, 2024

🔧 Repair JSON! A solution for JSON anomalies from LLMs.

Go 136 6 Updated Jul 17, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,102 277 Updated May 4, 2024

CLiC: Concept Learning in Context

6 Updated Apr 5, 2024

[ICCV 2023] "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition" (Official Implementation)

Python 778 100 Updated Jan 17, 2024

Image Composition via Stable Diffusion

Python 66 11 Updated Mar 10, 2023

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance

Python 4,276 325 Updated Jul 21, 2024

Fine-Grained Subject-Specific Attribute Expression Control in T2I Models

Jupyter Notebook 101 9 Updated Jun 13, 2024