x2ss
  • Shanghai


[NeurIPS 24] PromptFix: You Prompt and We Fix the Photo

Python 350 13 Updated Oct 4, 2024
Python 259 15 Updated Nov 5, 2024

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Python 1,339 417 Updated Nov 7, 2024

Bridging Large Vision-Language Models and End-to-End Autonomous Driving

147 4 Updated Oct 31, 2024

Get your documents ready for gen AI

Python 7,638 364 Updated Nov 9, 2024

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…

Jupyter Notebook 7,413 550 Updated Nov 1, 2024

[NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Python 718 36 Updated Oct 25, 2024

Fast and accurate automatic speech recognition (ASR) for edge devices

Python 2,105 90 Updated Nov 5, 2024

The official repository for paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"

Python 602 35 Updated Oct 31, 2024

D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥

Python 655 58 Updated Nov 6, 2024

The official code for "ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases"

Python 857 39 Updated Oct 26, 2024

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Python 917 41 Updated Oct 31, 2024

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Python 426 29 Updated Oct 31, 2024

Efficient vision foundation models for high-resolution generation and perception.

Python 2,323 185 Updated Nov 3, 2024

Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.

Python 556 40 Updated Oct 31, 2024

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Python 3,674 514 Updated Nov 6, 2024

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Python 421 36 Updated Nov 7, 2024

Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen

326 18 Updated Oct 19, 2024

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,515 181 Updated Nov 6, 2024

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 13,903 863 Updated Nov 7, 2024

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,812 792 Updated Nov 8, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 317 14 Updated Oct 16, 2024

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 348 14 Updated Oct 31, 2024

COCO API - Dataset @ https://cocodataset.org/

Jupyter Notebook 6,099 3,757 Updated Apr 17, 2024

Image composition toolbox: everything you want to know about image composition or object insertion

Python 531 32 Updated Oct 31, 2024

Official Pytorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think

Python 632 29 Updated Nov 8, 2024

"LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python 7,699 868 Updated Nov 7, 2024

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 3,603 236 Updated Oct 5, 2024