GLM-4-Voice Public
Forked from THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
Python · Apache License 2.0 · Updated Nov 9, 2024
docling Public
Forked from DS4SD/docling
Get your documents ready for gen AI
Python · MIT License · Updated Nov 9, 2024
doraemon-nb Public
IPython notebooks for small sample experiments and trying out ideas
hertz-dev Public
Forked from Standard-Intelligence/hertz-dev
First base model for full-duplex conversational audio
Python · Apache License 2.0 · Updated Nov 8, 2024
Freeze-Omni Public
Forked from VITA-MLLM/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM; please hurry, I can't wait to learn from it!
Updated Nov 7, 2024
mini-omni2 Public
Forked from gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Python · MIT License · Updated Nov 6, 2024
pipecat Public
Forked from pipecat-ai/pipecat
Open Source framework for voice and multimodal conversational AI
Python · BSD 2-Clause "Simplified" License · Updated Nov 6, 2024
open-interpreter Public
Forked from OpenInterpreter/open-interpreter
A natural language interface for computers
Python · GNU Affero General Public License v3.0 · Updated Nov 6, 2024
n8n Public
Forked from n8n-io/n8n
Free and source-available fair-code licensed workflow automation tool. Easily automate tasks across different services.
TypeScript · Other license · Updated Nov 6, 2024
ichigo Public
Forked from homebrewltd/ichigo
Llama3.1 learns to Listen; reproduce the training process!
Python · Apache License 2.0 · Updated Nov 5, 2024
apipeai Public
Pipe multimodal content into new multimodal content with AI; a big idea
Updated Oct 29, 2024
VITA Public
Forked from VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM. Mainly to understand the training process.
Python · Other license · Updated Oct 24, 2024
swarm Public
Forked from openai/swarm
Framework for building, orchestrating and deploying multi-agent systems. Managed by OpenAI Solutions team. Experimental framework.
Python · MIT License · Updated Oct 12, 2024
podcastfy Public
Forked from souzatharsis/podcastfy
Transforming Multi-Sourced Text into Captivating Multi-Lingual Audio Conversations with GenAI
Python · Other license · Updated Oct 11, 2024
GOT-OCR2.0 Public
Forked from Ucas-HaoranWei/GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Python · Updated Oct 6, 2024
mediapipe Public
Forked from google-ai-edge/mediapipe
Cross-platform, customizable ML solutions for live and streaming media.
C++ · Apache License 2.0 · Updated Oct 4, 2024
sapiens Public
Forked from facebookresearch/sapiens
High-resolution models for human tasks.
Python · Other license · Updated Oct 3, 2024
RT-DETR Public
Forked from lyuwenyu/RT-DETR
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Python · Apache License 2.0 · Updated Sep 27, 2024
ML-DL-note Public
Keywords and algorithms for ML and DL applied to text, audio, and vision
Updated Sep 27, 2024
vosk-api Public
Forked from alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Jupyter Notebook · Apache License 2.0 · Updated Sep 26, 2024
Qwen-Agent Public
Forked from QwenLM/Qwen-Agent
Agent framework and applications built upon Qwen2.x, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
Python · Other license · Updated Sep 18, 2024
speech-to-speech Public
Forked from huggingface/speech-to-speech…
Python · Apache License 2.0 · Updated Sep 13, 2024
minimind Public
Forked from jingyaogong/minimind
[LLM] Train a small 26M-parameter GPT completely from scratch in 3 hours; inference and training need as little as a 2 GB GPU!
Python · Apache License 2.0 · Updated Sep 12, 2024
mini-omni Public
Forked from gpt-omni/mini-omni
Open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Python · MIT License · Updated Sep 5, 2024
Qwen2-VL Public
Forked from QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Python · Updated Sep 5, 2024
agents Public
Forked from livekit/agents
Build real-time multimodal AI applications 🤖🎙️📹
Python · Apache License 2.0 · Updated Aug 30, 2024
livekit Public
Forked from livekit/livekit
End-to-end stack for WebRTC. SFU media server and SDKs.
Go · Apache License 2.0 · Updated Aug 30, 2024