LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,215 131 Updated Sep 24, 2024

zhang-tao-whu / DVIS_Plus

Python 89 6 Updated Jul 4, 2024

myshell-ai / DreamVoice

Python 59 6 Updated Aug 26, 2024

cyhuang-tw / robust-vc

Python 11 Updated May 7, 2022

myaxxxxx / speech-to-text-share-adapter

Python 1 Updated Apr 7, 2024

sarulab-speech / spatial_voice_conversion

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

Python 14 1 Updated Aug 8, 2024

qubvel-org / segmentation_models.pytorch

Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.

Python 9,490 1,657 Updated Oct 7, 2024

yuval-alaluf / SAM

Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754

Python 628 151 Updated Jan 7, 2024

HasnainRaz / Fast-AgingGAN

A deep learning model to age faces in the wild; currently runs at 60+ fps on GPUs.

Python 229 47 Updated May 18, 2024

IDRnD / ReDimNet

The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"

Python 96 5 Updated Sep 3, 2024

ictnlp / NAST-S2x

A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.

Python 60 4 Updated Aug 12, 2024

ictnlp / ComSpeech

Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".

Python 21 5 Updated Jul 2, 2024

ictnlp / CTC-S2UT

Code for ACL 2024 findings paper "CTC-based Non-autoregressive Textless Speech-to-Speech Translation"

8 Updated Jun 11, 2024

Yunusemre EmreOzkose
