DSPy: The framework for programming—not prompting—foundation models

Python 17,977 1,367 Updated Oct 21, 2024

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 453 26 Updated Oct 12, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 183 9 Updated Sep 16, 2024

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Python 360 9 Updated Sep 2, 2024

The code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

Python 762 36 Updated Sep 6, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,774 1,038 Updated Oct 14, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

383 12 Updated Jun 18, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,583 129 Updated Oct 20, 2024

Long Context Transfer from Language to Vision

Python 311 16 Updated Aug 26, 2024
Python 114 15 Updated Apr 23, 2024
Python 2,686 209 Updated Oct 16, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,728 113 Updated Sep 19, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 2,063 140 Updated Sep 3, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Python 2,483 154 Updated Oct 10, 2024

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Jupyter Notebook 227 35 Updated Sep 15, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks

Python 1,201 169 Updated Oct 21, 2024

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Python 139 4 Updated Jul 1, 2024

An open-source implementation for training LLaVA-NeXT.

Python 276 12 Updated Oct 15, 2024

LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft

Python 37 3 Updated Jul 17, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 800 54 Updated Oct 17, 2024

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1,421 73 Updated Oct 9, 2024

🔎 Monitor deep learning model training and hardware usage from your mobile phone 📱

Python 2,035 135 Updated Oct 18, 2024

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 1,815 128 Updated Jul 2, 2024

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

Python 29 3 Updated Jul 12, 2024

[NeurIPS 2023] T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

Python 196 6 Updated Aug 21, 2024

⚡ Dynamically generated stats for your github readmes

JavaScript 69,066 22,772 Updated Oct 18, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Python 2,748 176 Updated Aug 1, 2024

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 6,559 672 Updated Aug 12, 2024
Jupyter Notebook 756 72 Updated Aug 7, 2024

Multimodal Models in Real World

Jupyter Notebook 393 16 Updated Sep 21, 2024