MikeWangWZHL


Efficient vision foundation models for high-resolution generation and perception.

Python 2,251 182 Updated Oct 29, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 279 9 Updated Oct 16, 2024

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

6,722 405 Updated Jul 28, 2024

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Python 2,930 271 Updated Sep 26, 2024

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 966 53 Updated Sep 27, 2024

Implementation of Autoregressive Diffusion in Pytorch

Python 289 8 Updated Sep 26, 2024

Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2

Jupyter Notebook 1,010 95 Updated Nov 3, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

198 6 Updated Oct 31, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,005 43 Updated Oct 27, 2024

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 493 20 Updated Aug 16, 2024

The latest address of qwqjsq.com

266 18 Updated Feb 27, 2024

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…

Python 4,068 360 Updated Nov 2, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 668 36 Updated Aug 5, 2024

Python 99 6 Updated Jun 28, 2024

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python 677 28 Updated Sep 27, 2024

This repo contains the code for our paper "An Image is Worth 32 Tokens for Reconstruction and Generation"

Jupyter Notebook 449 18 Updated Oct 16, 2024

Video datasets

1,192 93 Updated Mar 8, 2023

Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.

540 34 Updated Oct 26, 2024

Repo for paper: https://arxiv.org/abs/2404.06479

Python 25 1 Updated Oct 3, 2024

JavaScript 2,509 889 Updated Jun 21, 2024

Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"

Jupyter Notebook 23 Updated Jun 5, 2024

Python 103 10 Updated Aug 15, 2024

maze datasets for investigating OOD behavior of ML systems

Jupyter Notebook 16 3 Updated Sep 10, 2024

TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data

Python 2,727 338 Updated Nov 1, 2024

[ICCV 2023] Tracking Anything with Decoupled Video Segmentation

Python 1,254 129 Updated Aug 1, 2024

Python 5 Updated Oct 10, 2023

Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (ATP).

Python 48 2 Updated May 29, 2024

Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"

Python 313 28 Updated May 8, 2024

[NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Python 4,771 445 Updated Jun 22, 2024