Organizations

@PKU-YuanGroup

Next-Token Prediction is All You Need

Python 406 6 Updated Sep 28, 2024

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

3,878 213 Updated Sep 27, 2024

A collection of awesome video generation studies.

TeX 279 7 Updated Sep 28, 2024

Official implementation of "ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis"

Python 741 24 Updated Sep 23, 2024

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

146 1 Updated Sep 9, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 866 39 Updated Sep 26, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 2,354 131 Updated Sep 24, 2024

Official code release for the paper "SkillMimic: Learning Reusable Basketball Skills from Demonstrations"

Python 152 9 Updated Sep 17, 2024

Mainly records multimodal-related knowledge for large language model (LLM) algorithm (application) engineers

HTML 66 1 Updated May 12, 2024

Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval"

Python 10 Updated Sep 8, 2024

Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"

Python 471 19 Updated Aug 16, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 793 40 Updated Sep 22, 2024

[CVPR 2024] Code release for TransNeXt model

Python 382 15 Updated Jun 13, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,775 108 Updated Jul 29, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 11,078 940 Updated Aug 21, 2024

Official implementation of Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

173 7 Updated Aug 10, 2024

Fast and memory-efficient exact attention

Python 13,570 1,244 Updated Sep 28, 2024

Bring portraits to life!

Python 12,012 1,259 Updated Sep 6, 2024

Kolors Team

Python 3,634 237 Updated Sep 4, 2024
CSS 3 Updated Sep 28, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,695 112 Updated Sep 19, 2024

[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

Python 168 14 Updated Sep 28, 2024

Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

Python 65 3 Updated Sep 19, 2024

A pipeline to improve skills of large language models

Python 151 33 Updated Sep 29, 2024

OmniTokenizer: one model and one weight for image-video joint tokenization.

Python 231 5 Updated Jul 9, 2024

LLMBind: A Unified Modality-Task Integration Framework

Python 14 2 Updated Jun 16, 2024

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,214 48 Updated Aug 15, 2024
32 Updated Jun 19, 2024

DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support

Python 405 41 Updated Mar 22, 2024

A curated list of reinforcement learning with human feedback resources (continually updated)

3,275 202 Updated Aug 30, 2024