Stars
Streaming ASR and TTS based on FastAPI + sherpa-onnx
MemFree - Hybrid AI Search Engine & AI Page Generator
Controllable and fast Text-to-Speech for over 7000 languages!
An open source AI wearable device that captures what you say and hear in the real world and then transcribes and stores it on your own server. You can then chat with Adeus using the app, and it wil…
React app for inspecting, building and debugging with the Realtime API
ESP32-based device, mainly used for voice chat with large language models
API and WebSocket server for SenseVoice. It integrates some enhanced features, such as voice activity detection (VAD), real-time streaming recognition, and speaker verification.
ASR using the OpenAI-compatible API `v1/audio/transcriptions`, as offered by providers like Groq and SiliconFlow
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
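This project connects the ESP32 and ESP32-S3 to large models such as iFlytek Spark, Doubao, Tongyi Qianwen (agent applications), and ChatGPT to provide voice conversation and chat. It supports voice wake-up, continuous dialogue, and music playback, and an attached display shows the conversation content in real time.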
Build real-time multimodal AI applications 🤖🎙️📹
Create beautiful resumes using Claude Artifacts. AI-powered smart resumes.
Wiseflow is an agile information mining tool that extracts concise messages from various sources such as websites, WeChat official accounts, social platforms, etc. It automatically categorizes and …
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
"Photo Restoration Little Helper" (Inpaint_wechat) is a WeChat mini-program based on the WeChat AI capabilities. It removes and repairs selected regions of an image, implemented purely on the client with no server side.
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for making ID photos.
[CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation
Not only automatic, but also intelligent. An intelligent data visualization system based on LLMs.
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing. It enables customizable human image generation with flexible garment, pose, and scene control, ensuring high fidelity an…
Educational language-learning app for Hokkien, a low-resource language, featuring flashcards, quizzes, and generative AI!
ComfyUI nodes to use segment-anything-2
Real-time face swap and one-click video deepfake with only a single image
An open-source remote desktop application designed for self-hosting, as an alternative to TeamViewer.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.