Stars
[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
Train transformer language models with reinforcement learning.
This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by a prompt of over 300 tokens. It is also powered by additional prompt refining featu…
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Inpaint Anything extension performs stable diffusion inpainting on a browser UI using masks from Segment Anything.
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
[NeurIPS 2024] Official Implementation of CLIPAway
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can be efficiently fine-tuned on downstream datasets.
[ICCV 2023 Oral] Text-to-Image Diffusion Models are Zero-Shot Video Generators
Official Implementation of SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior.
A Python library for self-supervised learning on images.
Refine high-quality datasets and visual AI models
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization
[ECCV'24] Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation 🔥
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Official PyTorch and Diffusers Implementation of "LinFusion: 1 GPU, 1 Minute, 16K Image"
ACM MM2024 (oral): Timeline and Boundary Guided Diffusion Network for Video Shadow Detection
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation