This repo currently supports Text-to-Audio Generation (including Music)
- Add the text-to-speech checkpoint
- Add the text-to-audio checkpoint that does not use FLAN-T5 Cross Attention
- Open-source the AudioLDM 1 & 2 training code.
- Support the generation of longer audio (> 10s)
- Optimize the inference speed of the model.
- Integration with the Diffusers library
- Prepare running environment
conda create -n audioldm python=3.8; conda activate audioldm
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
git clone https://github.com/haoheliu/AudioLDM2; cd AudioLDM2
- Start the web application (powered by Gradio)
python3 app.py
- A link will be printed out. Open it in your browser to try the demo.
Prepare running environment
# Optional
conda create -n audioldm python=3.8; conda activate audioldm
# Install AudioLDM
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
- Generate based on a text prompt
audioldm2 -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
- Generate based on a list of text
audioldm2 -tl batch.lst
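The list file is plain text with one prompt per line. A hypothetical batch.lst (these example prompts are invented for illustration) could look like:

```
A cat meowing next to a busy street.
Rain falling on a tin roof.
An orchestra tuning up before a concert.
```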
The model may sometimes perform poorly (weird or low-quality sound) when run on different hardware. In that case, adjust the random seed and find the one that works best for your hardware.
audioldm2 --seed 1234 -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
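Conceptually, the seed fixes the sampler's starting noise, so the same seed reproduces the same output while a different seed gives a different one. A minimal stand-in sketch (using Python's `random` in place of the actual diffusion sampler, which is an assumption about the mechanism, not the repo's code):

```python
import random

def generate(seed):
    # Stand-in for a diffusion sampler: the seed determines the initial
    # noise, so identical seeds reproduce identical "audio".
    rng = random.Random(seed)
    return [round(rng.gauss(0, 1), 4) for _ in range(4)]

assert generate(1234) == generate(1234)  # same seed, identical result
assert generate(1234) != generate(42)    # different seed, different result
```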
You can choose the model checkpoint by setting "model_name":
# CUDA
audioldm2 --model_name "audioldm2-full-large-650k" --device cuda -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
# MPS
audioldm2 --model_name "audioldm2-full-large-650k" --device mps -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
There are currently three checkpoints you can choose from:
- audioldm2-full (default): This checkpoint can perform both sound effect and music generation.
- audioldm2-music-665k: This checkpoint is specialized in music generation.
- audioldm2-full-large-650k: This checkpoint is the larger version of audioldm2-full.
We currently support 3 devices:
- cpu
- cuda
- mps (Apple Silicon; note that the computation requires about 20GB of RAM.)
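When `--device` is left unspecified the script chooses one automatically. The selection presumably follows the usual preference order of CUDA, then MPS, then CPU; a sketch of that logic (an assumption, not the repo's actual code):

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, fall back to Apple's MPS, and default to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```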
usage: audioldm2 [-h] [-t TEXT] [-tl TEXT_LIST] [-s SAVE_PATH]
[--model_name {audioldm2-full,audioldm2-music-665k,audioldm2-full-large-650k}] [-d DEVICE]
[-b BATCHSIZE] [--ddim_steps DDIM_STEPS] [-gs GUIDANCE_SCALE] [-n N_CANDIDATE_GEN_PER_TEXT]
[--seed SEED]
optional arguments:
-h, --help show this help message and exit
-t TEXT, --text TEXT Text prompt to the model for audio generation
-tl TEXT_LIST, --text_list TEXT_LIST
A file that contains text prompts for the model, one per line, for audio generation
-s SAVE_PATH, --save_path SAVE_PATH
The path to save model output
--model_name {audioldm2-full,audioldm2-music-665k,audioldm2-full-large-650k}
The checkpoint to use
-d DEVICE, --device DEVICE
The device for computation. If not specified, the script will automatically choose the device based on your environment. [cpu, cuda, mps, auto]
-b BATCHSIZE, --batchsize BATCHSIZE
Number of samples to generate at the same time
--ddim_steps DDIM_STEPS
The sampling step for DDIM
-gs GUIDANCE_SCALE, --guidance_scale GUIDANCE_SCALE
Guidance scale (Large => better quality and relevance to the text; Small => better diversity)
-n N_CANDIDATE_GEN_PER_TEXT, --n_candidate_gen_per_text N_CANDIDATE_GEN_PER_TEXT
Automatic quality control. This controls the number of candidates generated per prompt (e.g., generate three audios and present the best one). A larger value usually leads to better quality at the cost of heavier computation
--seed SEED Changing this value (any integer) will lead to a different generation result.
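The automatic quality control behind `-n` can be thought of as generate-then-rank: produce several candidates for one prompt and keep the one a scoring function likes best. A sketch with stand-in `generate` and `score` callables (both hypothetical; the actual ranking criterion used by the repo is not specified here):

```python
def best_of_n(generate, score, prompt, n):
    """Generate n candidates for one prompt and keep the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

# Toy demo: the stand-in generator returns 0, 1, 2, ... and the score
# simply prefers larger values, so best_of_n picks the largest candidate.
counter = iter(range(10))
pick = best_of_n(lambda p: next(counter), lambda x: x, "a prompt", 3)
```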
If you find this tool useful, please consider citing:
AudioLDM 2 paper coming soon
@article{liu2023audioldm,
title={AudioLDM: Text-to-Audio Generation with Latent Diffusion Models},
author={Liu, Haohe and Chen, Zehua and Yuan, Yi and Mei, Xinhao and Liu, Xubo and Mandic, Danilo and Wang, Wenwu and Plumbley, Mark D},
journal={arXiv preprint arXiv:2301.12503},
year={2023}
}