AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.
AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, run the following:
# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft # bleeding edge
python -m pip install -e . # or if you cloned the repo locally (mandatory if you want to train).
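After installing, it can help to confirm that the environment matches the versions named above. The following is a minimal sketch, not part of AudioCraft: the helper names `installed_version` and `environment_report` are hypothetical, and treating Python 3.9 as a minimum (rather than an exact pin) is an assumption.

```python
import sys
from importlib import metadata

# Versions named in the install notes above; adjust if the requirements change.
# Assumption: Python 3.9 is treated as a minimum, not an exact pin.
REQUIRED_PYTHON = (3, 9)
REQUIRED_TORCH = "2.1.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report():
    """Collect the interpreter and package versions relevant to AudioCraft."""
    torch_version = installed_version("torch")
    return {
        "python": sys.version_info[:2],
        "python_ok": sys.version_info[:2] >= REQUIRED_PYTHON,
        "torch": torch_version,
        # Local build tags such as "2.1.0+cu121" still count as 2.1.0.
        "torch_ok": torch_version is not None
        and torch_version.split("+")[0] == REQUIRED_TORCH,
        "audiocraft": installed_version("audiocraft"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

The check reads package metadata instead of importing `torch`, so it stays fast and does not fail when PyTorch is missing.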
We also recommend having ffmpeg installed, either through your system or Anaconda:
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
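To check whether ffmpeg is visible to Python after either install route, a small sketch along these lines may be useful. The function names `find_ffmpeg` and `ffmpeg_version_line` are hypothetical helpers, and the sketch assumes the executable is called `ffmpeg` and is on `PATH`.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the path to the ffmpeg executable on PATH, or None if not found."""
    return shutil.which("ffmpeg")


def ffmpeg_version_line(executable):
    """Return the first line printed by `<executable> -version`."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH")
    else:
        print(path)
        print(ffmpeg_version_line(path))
```

The reported version line can be compared by eye against the `ffmpeg<5` constraint used in the conda command.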
At the moment, AudioCraft contains the training code and inference code for:
- MusicGen: A state-of-the-art controllable text-to-music model.
- AudioGen: A state-of-the-art text-to-sound model.
- EnCodec: A state-of-the-art high-fidelity neural audio codec.
- Multi Band Diffusion: An EnCodec-compatible decoder using diffusion.
- MAGNeT: A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
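As an illustration of what inference with one of these models can look like, here is a hedged sketch of text-to-music generation with MusicGen. It assumes AudioCraft is installed and that the pretrained checkpoint `facebook/musicgen-small` can be downloaded. The wrapper `generate_clips` and the `clip_<n>` output names are hypothetical, chosen only for this example.

```python
def generate_clips(descriptions, duration=8, model_name="facebook/musicgen-small"):
    """Generate one clip per text description and write each to a .wav file."""
    # Imported lazily so this module can be loaded without AudioCraft installed.
    from audiocraft.data.audio import audio_write
    from audiocraft.models import MusicGen

    # Assumption: the named checkpoint is available for download.
    model = MusicGen.get_pretrained(model_name)
    model.set_generation_params(duration=duration)  # clip length in seconds
    wav = model.generate(descriptions)  # tensor of shape [batch, channels, samples]

    stems = []
    for index, one_wav in enumerate(wav):
        stem = f"clip_{index}"  # hypothetical output name; audio_write adds ".wav"
        audio_write(stem, one_wav.cpu(), model.sample_rate, strategy="loudness")
        stems.append(stem)
    return stems
```

A call such as `generate_clips(["lo-fi beat with warm piano"])` would download the model weights on first use, so expect the first run to take noticeably longer than later ones.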