Stars
Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"
StoryMaker: Towards consistent characters in text-to-image generation
📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Text-to-video and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
🔥 🔥 🔥 Open Source JIRA, Linear, Monday, and Asana Alternative. Plane helps you track your issues, epics, and product roadmaps in the simplest way possible.
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
One-click Face Swapper and Restoration powered by insightface 🔥
A high-resolution face dataset for face editing
face-to-sticker
Collection of awesome medical dataset resources.
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
The official code for the paper "Parallel Speculative Decoding with Adaptive Draft Length."
Run macOS VM in a Docker! Run near native OSX-KVM in Docker! X11 Forwarding! CI/CD for OS X Security Research! Docker mac Containers.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
FlashInfer: Kernel Library for LLM Serving
Screenshot, offline OCR, search and translate, reverse image search, pin image to screen, screen recording, omnidirectional scrolling screenshot, screen translation
SGLang is a fast serving framework for large language models and vision language models.