SCUT
- Outer Ring West Road, Panyu District, Guangzhou, Guangdong Province, China
- https://www.gdut.edu.cn/
Highlights
- Pro
Stars
[EG 2023] Sketch Video Synthesis
Open-Sora: Democratizing Efficient Video Production for All
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications
IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild
SpeechGPT Series: Speech Large Language Models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Emu Series: Generative Multimodal Models from BAAI
We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tuning) together for easy use. We welcome open-source enthusiasts…
Official repository of Agent Attention (ECCV2024)
Tool Learning for Big Models, Open-Source Solutions of ChatGPT-Plugins
PyTorch implementation of RCG https://arxiv.org/abs/2312.03701
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your d…
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
A natural language interface for computers
Extension of Langchain for RAG. Easy benchmarking, multiple retrievals, reranker, time-aware RAG, and so on...
A generative and self-guided robotic agent that endlessly proposes and masters new skills.
real Transformer TeraFLOPS on various GPUs
LAVIS - A One-stop Library for Language-Vision Intelligence