AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md.

Capabilities

The tables below list the tasks AudioGPT currently supports. More models and tasks are coming soon. For prompt examples, refer to the assets folder.

Speech

Task	Supported Foundation Models	Status
Text-to-Speech	FastSpeech, SyntaSpeech, VITS	Yes (WIP)
Style Transfer	GenerSpeech	Yes
Speech Recognition	Whisper, Conformer	Yes
Speech Enhancement	ConvTasNet	Yes (WIP)
Speech Separation	TF-GridNet	Yes (WIP)
Speech Translation	Multi-decoder	WIP
Mono-to-Binaural	NeuralWarp	Yes

Sing

Task	Supported Foundation Models	Status
Text-to-Sing	DiffSinger, VISinger	Yes (WIP)

Audio

Task	Supported Foundation Models	Status
Text-to-Audio	Make-An-Audio	Yes
Audio Inpainting	Make-An-Audio	Yes
Image-to-Audio	Make-An-Audio	Yes
Sound Detection	Audio-transformer	Yes
Target Sound Detection	TSDNet	Yes
Sound Extraction	LASSNet	Yes

Talking Head

Task	Supported Foundation Models	Status
Talking Head Synthesis	GeneFace	Yes (WIP)

Acknowledgement

We thank the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion
