Commit
update readme
haoheliu committed Aug 14, 2023
1 parent d7d120c commit 1740a57
Showing 1 changed file with 18 additions and 9 deletions.
27 changes: 18 additions & 9 deletions README.md
@@ -4,6 +4,16 @@

This repo currently supports Text-to-Audio (including Music) and Text-to-Speech Generation.

* [TODO](#todo)
* [Web APP](#web-app)
* [Commandline Usage](#commandline-usage)
  + [Installation](#installation)
  + [Run the model in commandline](#run-the-model-in-commandline)
* [Random Seed Matters](#random-seed-matters)
* [Pretrained Models](#pretrained-models)
* [Other options](#other-options)
* [Cite this work](#cite-this-work)

<hr>

## TODO
@@ -39,7 +49,7 @@ conda create -n audioldm python=3.8; conda activate audioldm
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
```

-Please make sure you have installed [espeak](https://espeak.sourceforge.net/download.html). On linux you can do it by
+If you plan to play around with text-to-speech generation, please also make sure you have installed [espeak](https://espeak.sourceforge.net/download.html). On Linux, you can install it with:
```shell
sudo apt-get install espeak
```
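
Before trying text-to-speech, it can help to confirm that the `espeak` executable is visible on the `PATH`. The following is a minimal sketch, assuming a POSIX shell; the variable name `espeak_status` is purely illustrative and not part of AudioLDM2.

```shell
# Check whether the espeak executable is visible on PATH (POSIX shell).
# `espeak_status` is an illustrative variable name, not part of AudioLDM2.
if command -v espeak >/dev/null 2>&1; then
    espeak_status="found"
else
    espeak_status="missing"
fi
echo "espeak: $espeak_status"
```

If the status is `missing`, the install command above is the place to start.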
@@ -61,9 +71,7 @@ audioldm2 -tl batch.lst

```shell
audioldm2 -t "A female reporter is speaking full of emotion" --transciption "Wish you have a good day"
```

```shell
audioldm2 -t "A female reporter is speaking" --transciption "Wish you have a good day"
```

@@ -88,12 +96,13 @@ audioldm2 --model_name "audioldm2-full-large-1150k" --device cuda -t "Musical co
audioldm2 --model_name "audioldm2-full-large-1150k" --device mps -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
```

-We have three checkpoints you can choose for now:
-1. **audioldm2-full** (default): This checkpoint can perform both sound effect and music generation.
-2. **audioldm2-full-large-1150k**: This checkpoint is the larger version of audioldm2-full.
-3. **audioldm2-music-665k**: This checkpoint is specialized on music generation.
-4. **audioldm2-speech-gigaspeech**: Text-to-Speech checkpoint that is trained on GigaSpeech Dataset.
-5. **audioldm2-speech-ljspeech**: Text-to-Speech checkpoint that is trained on LJSpeech Dataset.
+We have five checkpoints you can choose from:
+
+1. **audioldm2-full** (default): Generates both sound effects and music.
+2. **audioldm2-full-large-1150k**: Larger version of audioldm2-full.
+3. **audioldm2-music-665k**: Music generation.
+4. **audioldm2-speech-gigaspeech** (default for TTS): Text-to-Speech, trained on the GigaSpeech dataset.
+5. **audioldm2-speech-ljspeech**: Text-to-Speech, trained on the LJSpeech dataset.
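
The checkpoint list can be turned into a small selection helper. This is only a sketch: the function name `choose_checkpoint` and the task labels `music` and `speech` are hypothetical, while the checkpoint names and the `--model_name`, `--device`, and `-t` flags come from the README text in this diff. The command is printed rather than executed, so the snippet runs without AudioLDM2 installed.

```shell
# Map a task label to a checkpoint name from the list above.
# The labels "music" and "speech" are hypothetical; anything else
# falls back to the default checkpoint, audioldm2-full.
choose_checkpoint() {
    case "$1" in
        music)  echo "audioldm2-music-665k" ;;
        speech) echo "audioldm2-speech-gigaspeech" ;;
        *)      echo "audioldm2-full" ;;
    esac
}

model="$(choose_checkpoint music)"
prompt="Musical constellations twinkling in the night sky, forming a cosmic melody."

# Print the command instead of running it.
printf 'audioldm2 --model_name "%s" --device cpu -t "%s"\n' "$model" "$prompt"
```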

We currently support 3 devices:
- cpu