Stars
A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
A library for efficient similarity search and clustering of dense vectors.
🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
GPT4V-level open-source multi-modal model based on Llama3-8B
Real-time image and video processing library similar to GPUImage, with built-in beauty filters, achieving commercial-grade beauty effects. Written in C++11 and based on OpenGL/ES.
The Places365-CNNs for Scene Classification
👁️ 🖼️ 🔥PyTorch Toolbox for Image Quality Assessment, including LPIPS, FID, NIQE, NRQM(Ma), MUSIQ, TOPIQ, NIMA, DBCNN, BRISQUE, PI and more...
A Python package to stabilize videos using OpenCV
Perplexica is an AI-powered search engine. It is an open-source alternative to Perplexity AI.
Perceptual video quality assessment based on multi-method fusion.
Official code for the Goldfish model for long video understanding and MiniGPT4-video for short video understanding
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Rich is a Python library for rich text and beautiful formatting in the terminal.
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
The open collection of GL Transitions
A generative speech model for daily dialogue.
DSPy: The framework for programming—not prompting—foundation models
Generate high-definition short videos with one click using AI large language models.