Stars
[CVPR 2024] Official repo for "InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model".
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, T2I-Adapter, IP-Adapter.
Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models
PartCraft: Crafting Creative Objects by Parts (ECCV2024)
Official code for paper: Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Official Pytorch Implementation for “Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation” (CVPR 2023)
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG)
Official PyTorch implementation of "A Unified Approach for Text- and Image-guided 4D Scene Generation", [CVPR 2024]
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to this project.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Open-Sora: Democratizing Efficient Video Production for All
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)
FlashInfer: Kernel Library for LLM Serving
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
A comprehensive guide to building RAG-based LLM applications for production.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
Recent LLM-based CV and related works. Welcome to comment/contribute!
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.