Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Official release of InternLM2.5 base and chat models. 1M context support
🔊 Text-Prompted Generative Audio Model
SoftVC VITS Singing Voice Conversion
vits2 backbone with multilingual-bert
Hydra is a framework for elegantly configuring complex applications
Tracking the progress in end-to-end speech translation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
Foundational Models for State-of-the-Art Speech and Text Translation
formiel / fairseq
Forked from facebookresearch/fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
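Collecting, organizing, and publishing Chinese natural language processing corpora/datasets, and working with like-minded people to advance Chinese natural language processing.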
Pushing the Limits of Zero-shot End-to-End Speech Translation
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Chinese speech pretrained models
A hands-on, step-by-step practical course on Huggingface Transformers; course videos are updated in sync on Bilibili and YouTube
This is an implementation of the paper "End-to-end Speech Translation via Cross-modal Progressive Training" (Interspeech 2021)
A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
Code for the paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
Fast and memory-efficient exact attention
Dataset for lightly supervised training using the LibriVox audiobook recordings. https://librivox.org/
This repository contains demos I made with the Transformers library by HuggingFace.
Notebooks using the Hugging Face libraries 🤗
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image