- Boston, USA
- https://kunpengli1994.github.io/
Stars
Kandinsky 2 — multilingual text2image latent diffusion model
[CVPR 2023] Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, arXiv 2023 / CVPR 2024
Official PyTorch Implementation for "MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation" presenting "MultiDiffusion" (ICML 2023)
Unified Controllable Visual Generation Model
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
WebUI extension for ControlNet
A Close Look at Spatial Modeling: From Attention to Convolution
This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.
This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described in the paper.
PyTorch code for ICCV'19 paper "Visual Semantic Reasoning for Image-Text Matching"
[NeurIPS-2021] Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation
📚 A collection of Deep Learning based Image Colorization and Video Colorization papers.
A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch
[TPAMI 2023] Generative Multi-Label Zero-Shot Learning
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021 (Oral)
Code accompanying EGO-TOPO: Environment Affordances from Egocentric Video (CVPR 2020)
PyTorch code for the CVPR'2020 paper "Screencast Tutorial Video Understanding"
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
Code and Resources for the Transformer Encoder Reasoning and Alignment Network (TERAN), accepted for publication in ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
A collection of papers on transformers for vision. Awesome Transformer with Computer Vision (CV)
This repo contains the official code of our work SAM-SLR which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
The official implementation of CFBI(+): Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration.
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
A series of basic algorithms that are useful for video understanding, including Single Object Tracking (SOT), Video Object Segmentation (VOS) and so on.
Global Reasoning module for visual recognition
PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection