Peking University
https://jpthu17.github.io/
Stars
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
A collection of awesome video generation studies.
Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Official code release for the paper "SkillMimic: Learning Reusable Basketball Skills from Demonstrations"
Code implementation of the paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval"
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
[CVPR 2024] Code release for TransNeXt model
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Official implementation of Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Fast and memory-efficient exact attention
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"
A pipeline to improve the skills of large language models
OmniTokenizer: one model and one weight for image-video joint tokenization.
LLMBind: A Unified Modality-Task Integration Framework
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
A curated list of reinforcement learning with human feedback resources (continually updated)