- UIUC
- Champaign, Illinois
- https://mikewangwzhl.github.io/
- @zhenhailongW
Stars
Efficient vision foundation models for high-resolution generation and perception.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Paper list accompanying the 86-page survey "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Implementation of Autoregressive Diffusion in Pytorch
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"
Paper list about multimodal and large language models, used only to record papers I read from the daily arXiv for personal reference.
Repo for paper: https://arxiv.org/abs/2404.06479
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
Maze datasets for investigating OOD behavior of ML systems
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
[ICCV 2023] Tracking Anything with Decoupled Video Segmentation
Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (ATP).
Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"
[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
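To give a sense of the idea behind the Tree of Thoughts entry above: the method searches over a tree of partial "thoughts", using the model itself to propose candidate next steps and to value partial reasoning chains, keeping only the most promising branches at each depth. The sketch below is a minimal, breadth-first outline in plain Python under stated assumptions; `generate_thoughts` and `score_thought` are hypothetical placeholders standing in for LLM calls, not the repository's API.

```python
# Minimal breadth-first Tree-of-Thoughts sketch (hypothetical helpers, not the repo's API).
from typing import Callable, List

def tree_of_thoughts(
    question: str,
    generate_thoughts: Callable[[str, List[str]], List[str]],  # proposes candidate next thoughts
    score_thought: Callable[[str, List[str]], float],          # values a partial chain of thoughts
    depth: int = 3,        # number of reasoning steps to explore
    beam_width: int = 5,   # branches kept after each step
) -> List[str]:
    frontier: List[List[str]] = [[]]  # each entry is a partial chain of thoughts
    for _ in range(depth):
        candidates: List[List[str]] = []
        for chain in frontier:
            for thought in generate_thoughts(question, chain):
                candidates.append(chain + [thought])
        # Keep the highest-valued partial chains (beam search over the thought tree).
        candidates.sort(key=lambda c: score_thought(question, c), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0] if frontier else []
```

In the paper, an LLM fills both roles (proposing and evaluating thoughts), and depth-first search and other evaluation strategies are explored alongside this breadth-first variant.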