- Australian National University
- Canberra, Australia
- https://1jsingh.github.io
Stars
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Enhancing AI Software Engineering with Repository-level Code Graph
Efficient vision foundation models for high-resolution generation and perception.
CLIP+MLP Aesthetic Score Predictor
Agentless🐱: an agentless approach to automatically solve software development problems
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Codebase for Aria - an Open Multimodal Native MoE
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Evaluating text-to-image/video/3D models with VQAScore
Unofficial implementation of the paper "The Chosen One: Consistent Characters in Text-to-Image Diffusion Models"
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
Mora: More like Sora for Generalist Video Generation
Official inference repo for FLUX.1 models
SEED-Story: Multimodal Long Story Generation with Large Language Model
Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Implementation of MagViT2 Tokenizer in PyTorch
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.