- Tencent
- Shenzhen, China
- https://xinntao.github.io/
Stars
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
A native PyTorch Library for large model training
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Translate PDF, EPUB, webpages, metadata, annotations, and notes to the target language. Supports 20+ translation services.
[ECCV 2024] The official implementation of paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
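Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments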
[CVPR 2024] Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Pix2Seq codebase: multi-tasks with generative modeling (autoregressive and diffusion)
A simple HTML visualization tool for computer vision research 🛠️
Transparent Image Layer Diffusion using Latent Transparency
[ECCV 2024] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Official Code for MotionCtrl [SIGGRAPH 2024]
Official code of SmartEdit [CVPR-2024 Highlight]
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Easily create large video datasets from video URLs
A lightweight tool for camera pose visualization
[CVPR 2024] A benchmark for evaluating Multimodal LLMs using multiple-choice questions.
Official implementation of SEED-LLaMA (ICLR 2024).
Implementation of “DreamDiffusion: Generating High-Quality Images from Brain EEG Signals”
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
A unified framework for 3D content generation.
General video interaction platform based on LLMs, including Video ChatGPT
GPT4Tools is an intelligent system that can automatically decide, control, and utilize different visual foundation models, allowing the user to interact with images during a conversation.