- Nomono
- Trondheim, Norway
- @iver56
Starred repositories
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
This is the repository for the speech enhancement model SyncFormer
PyTorch native quantization and sparsity for training and inference
VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural network models (and their initializations) to make them easier to…
o1-engineer is a command-line tool designed to assist developers in managing and interacting with their projects efficiently. Leveraging the power of OpenAI's API, this tool provides functionalitie…
Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
High-Fidelity Neural Phonetic Posteriorgrams
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
Less than 100 Kilobytes. Works for Android 5.1 and above
The Official PyTorch Implementation of FN-SSL & IPDnet for Sound Source Localization
Official repository for the paper "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs"
Strict separation of config from code.
[ECCV 2024 - Oral] HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Apollo audio restoration Colab fork
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
VoiceSplit: Targeted Voice Separation by Speaker-Conditioned Spectrogram
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
TS-BSmamba2: A Two-Stage Band-Split Mamba-2 Network for Music Separation
Some traditional algorithms used for sound source localization (DOA estimation) of speech signals
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.