Dalian University of Technology
- Dalian, Liaoning, China
Stars
A general fine-tuning kit geared toward diffusion models.
An open source implementation of CLIP.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
OpenMMLab Text Detection, Recognition and Understanding Toolbox
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image genera…
OpenMMLab Pre-training Toolbox and Benchmark
Official inference repo for FLUX.1 models
Generative Models by Stability AI
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…
VMamba: Visual State Space Models; the code is based on Mamba
A collection of resources and papers on Diffusion Models
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Awesome-LLM: a curated list of Large Language Models
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Lumina-T2X is a unified framework for Text to Any Modality Generation
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
FastPillars: A Deployment-friendly Pillar-based 3D Detector
ICLR2024: LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection.
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
State-of-the-art bilingual open-sourced Math reasoning LLMs.
Official Code for Stable Cascade
Mixture-of-Experts for Large Vision-Language Models