Hacettepe University
- Ankara/Turkey
- emreozkose.github.io
- @yunusemreozkose
Lists (29)
action_segmentation
aud_class
basics
data
diar
emo_vc
emotional voice conversion
face transform
image-editing
loss
multi-model
nlp
other
rag-kg-llm
recipe_works
rep
s2s
scene-rec
speaker
speech_ench
sr
thesis
time-series-fin
travel
tts
vad
video_classification
video generation
video_retrieval
voice_conversion
Stars
PyTorch implementation of Universal Source Separation with Weakly Labelled Data.
DUSTED: Spoken-Term Discovery using Discrete Speech Units
Official Implementation of TSELM: Target speaker extraction using discrete tokens and language models
ICASSP 2025: Dynamic Embedding Causal Target Speech Extraction
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis
GANs for time series generation in PyTorch
Lightweight wrapper for Silero VAD using internal ONNX Runtime, with no Python package dependencies
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Official Implementation for "Only a Matter of Style: Age Transformation Using a Style-Based Regression Model" (SIGGRAPH 2021) https://arxiv.org/abs/2102.02754
A deep learning model to age faces in the wild, currently runs at 60+ fps on GPUs
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
A fast speech-to-any translation model that supports simultaneous decoding and offers 28× speedup.
Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".
Code for ACL 2024 findings paper "CTC-based Non-autoregressive Textless Speech-to-Speech Translation"