[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model

Python 332 13 Updated Oct 7, 2024

Implementation of Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 minutes in length, in Pytorch

Python 748 78 Updated Jul 29, 2024

This repository compiles a list of papers related to the application of video technology in the field of robotics! Star⭐ the repo and follow me if you like what you see🤩.

117 6 Updated Aug 12, 2024

Training code for VideoCrafter.

Python 4 Updated May 27, 2024

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

4 Updated Aug 7, 2024

A curated list of Diffusion Model in RL resources (continually updated)

786 42 Updated Oct 10, 2024

world modeling challenge for humanoid robots

Python 329 21 Updated Aug 23, 2024

Pointcept: a codebase for point cloud perception research. Latest works: PTv3 (CVPR'24 Oral), PPT (CVPR'24), OA-CNNs (CVPR'24), MSC (CVPR'23)

Python 1,565 169 Updated Sep 7, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 21,913 2,134 Updated Aug 9, 2024

A curated list of awesome papers on Embodied AI and related research/industry-driven resources.

272 9 Updated Jul 26, 2024

AlignProp uses direct reward backpropagation for the alignment of large-scale text-to-image diffusion models. Our method is 25x more sample and compute efficient than reinforcement learning methods…

Python 234 8 Updated Oct 7, 2024

[Arxiv 2024] Official code for MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

22 1 Updated Sep 2, 2024

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Jupyter Notebook 429 16 Updated Oct 16, 2024

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Python 20 2 Updated Sep 21, 2024

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Python 380 56 Updated Sep 1, 2024

An open-source framework for training large multimodal models.

Python 3,703 281 Updated Aug 31, 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Python 471 33 Updated Sep 23, 2024
Python 522 24 Updated Oct 16, 2024

Video datasets

1,165 92 Updated Mar 8, 2023

Data pipeline code for a large video generation model

Python 7 Updated Sep 2, 2024

[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Python 123 6 Updated Jul 6, 2024

Stable Video Diffusion Training Code and Extensions.

Python 583 57 Updated Jul 25, 2024

Latte: Latent Diffusion Transformer for Video Generation.

Python 1,669 177 Updated Sep 28, 2024

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Jupyter Notebook 513 28 Updated Oct 6, 2024

The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation

Python 190 7 Updated Sep 3, 2023

Official JAX implementation of MAGVIT: Masked Generative Video Transformer

Python 947 42 Updated Jan 17, 2024

Implementation of MagViT2 Tokenizer in Pytorch

Python 554 34 Updated Oct 14, 2024

Official implementation of AnimateDiff.

Python 10,432 859 Updated Jul 31, 2024
Python 32 1 Updated Jul 5, 2024

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

HTML 326 17 Updated Oct 17, 2024