Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Vision utilities for web interaction agents 👀
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
CosmicMan: A Text-to-Image Foundation Model for Humans (CVPR 2024)
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Code for the paper "Query-Key Normalization for Transformers"
Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
A curated list of foundation models for vision and language tasks
A cross-platform GUI automation Python module for human beings. Used to programmatically control the mouse & keyboard.
Official implementation of VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
An experiment with Transformer-XL for Chinese text generation (can write novels and classical poetry)
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Stable Diffusion with Core ML on Apple Silicon
[CVPR 2024] 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
A list of awesome papers on optical flow and related work.
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Fine-Grained Open Domain Image Animation with Motion Guidance