Stars (zhang-tao-whu)
Official inference repo for FLUX.1 models
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
This is the official implementation for ControlVAR.
Implementation of Autoregressive Diffusion in PyTorch
recursal / GoldFinch-paper
Forked from SmerkyG/GoldFinch-paper
GoldFinch and other hybrid transformer components
Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" (ICML 2024)
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
This is the official implementation of "GvSeg: General and Task-Oriented Video Segmentation" (Accepted at ECCV 2024).
Streamlit — A faster way to build and share data apps.
Code release for "Segment Anything without Supervision"
Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation
From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
[ECCV 2024] ControlCap: Controllable Region-level Captioning
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
IVGSZ / Flash-VStream
Forked from IVG-SZ/Flash-VStream
This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
Point-SAM: This is the official repository of "Point-SAM: Promptable 3D Segmentation Model for Point Clouds". We provide code for running our demo and links to download checkpoints.
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
The official implementation of the research paper "MotionBooth: Motion-Aware Customized Text-to-Video Generation"
Official repository for the paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).