Starred repositories
Code for the paper "Training Diffusion Models with Reinforcement Learning"
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
A new kind of Progress Bar, with real-time throughput, ETA, and very cool animations!
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
[EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge!
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
This is the official repo for the paper "Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization" (Tang et al.)
Material for the "Probabilistic Machine Learning" Course at the University of Tübingen, Summer Term 2023
Code for ALBEF: a new vision-language pre-training method
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A state-of-the-art-level open visual language model | multimodal pre-trained model
Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024)
The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"
Simple implementation of OpenAI CLIP model in PyTorch.
An open source implementation of CLIP.
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021