
Build real-time multimodal AI applications 🤖🎙️📹

Python 3,424 336 Updated Oct 16, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,385 150 Updated Sep 24, 2024
Python 4 Updated Oct 13, 2024

🐮📢 The first AI voice assistant that interrupts *you*

Python 127 7 Updated Sep 6, 2024

Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Python 2,658 312 Updated Aug 15, 2024

LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances human-computer interaction through real-time spoken dialogue…

Python 36 3 Updated Oct 2, 2024

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,637 251 Updated Aug 9, 2024

LivePortrait is an advanced deep learning-based system for animating portrait images. It uses a two-stage training process to create realistic and controllable animations from static portrait images.

Python 8 3 Updated Oct 13, 2024

Bring portraits to life!

Python 12,346 1,307 Updated Oct 7, 2024

The official Meta Llama 3 GitHub site

Python 26,678 3,018 Updated Aug 12, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 203 26 Updated Aug 15, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 9,324 1,281 Updated Sep 14, 2024

Inquisitive Parrots for Search

Python 177 18 Updated Feb 29, 2024

RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker

Python 104 8 Updated Sep 12, 2024

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 185 30 Updated Jun 25, 2024

[NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to improve performance of numerous vision language performances f…

Python 99 4 Updated May 30, 2024

🥷 Run AI-agents with an API

TypeScript 5,173 835 Updated Jul 22, 2024

Using ChatGPT (now Claude 3) to reverse-engineer code from the Emote white paper. WIP

Python 1 Updated Mar 29, 2024

FFmpeg libav tutorial - learn how media works from basic to transmuxing, transcoding and more. Translations: 🇺🇸 🇨🇳 🇰🇷 🇪🇸 🇻🇳 🇧🇷

C 9,951 958 Updated Oct 15, 2024

[CVPR] MARLIN: Masked Autoencoder for facial video Representation LearnINg

Python 225 20 Updated May 9, 2024

Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

7,461 902 Updated Aug 21, 2024

NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction (WACV24)

Python 13 2 Updated Jul 17, 2024

🔍 Explore Egocentric Vision: research, data, challenges, real-world apps. Stay updated & contribute to our dynamic repository! Work-in-progress; join us!

76 6 Updated Jul 8, 2024

EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset

Jupyter Notebook 52 9 Updated Nov 23, 2020

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Python 12,144 2,052 Updated Oct 12, 2024

Low-latency AI companion voice chat in 60 lines of code, using faster_whisper and ElevenLabs input streaming

Python 244 47 Updated Jun 8, 2024

A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.

Python 1,843 172 Updated Oct 15, 2024

Converts text to speech in realtime

Python 1,871 176 Updated Oct 12, 2024

vits2 backbone with multilingual-bert (Korean language support)

Python 25 1 Updated Apr 6, 2024

🎤📄 An innovative tool that transforms audio or video files into text transcripts and generates concise meeting minutes. Stay organized and efficient in your meetings, and get ready for Phase 2 wher…

Python 98 11 Updated Jun 10, 2024