AI-Guru/musictransformer2023

Encoder-Decoder Transformer with Variational Bottleneck.

Acknowledgements.

This repo is based on Andrej Karpathy's nanoGPT. https://github.com/karpathy/nanoGPT

Disclaimer.

This work is still in progress. The model is not yet good. I am looking for collaborators. 🤗

So if you are curious about MIDI and transformers, do not hesitate to review the code. If you have any questions, let me know, preferably via GitHub issues.

Goal.

The goal is to create a music transformer for MIDI files that has a variational bottleneck. Use cases include musical interpolation and generation of long sequences.
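To give an idea of the interpolation use case: one could blend the latent representations of two pieces and decode each intermediate point. A minimal NumPy sketch, where the function name and latent shapes are hypothetical (the real encoder and decoder live in this repo's source):

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps):
    """Linearly blend two latent matrices produced by the encoder.

    Returns `steps` latent matrices morphing from z_a to z_b;
    each intermediate latent would be fed to the decoder to
    generate a sequence partway between the two pieces.
    """
    return [(1.0 - a) * z_a + a * z_b for a in np.linspace(0.0, 1.0, steps)]
```

The endpoints reproduce the original latents exactly, so decoding the first and last steps should recover (approximations of) the two input pieces.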

Current state.

  • The model is able to learn to reconstruct the input sequence.
  • So far the bottleneck, i.e. the latent-space vectors generated by the encoder, is not yet used by the decoder.

Notes.

  • This is an early work in progress. The essential pieces are there, but the model is not yet good.
  • The architecture is a classic encoder-decoder transformer. See "Attention Is All You Need" for details.
  • The variational bottleneck is a fully convolutional network. Instead of a vector, the latent space is a matrix. This is inspired by latent diffusion.
  • I am looking for a way to weaken the decoder. If the decoder is too strong, the latent space is not taken into account.
  • Currently I am experimenting with token dropout to weaken the decoder. VAE beta warmup is also an option, but it is not implemented yet.
  • Currently I am experimenting with 500 MIDI files from the js-fakes dataset. This allows me to train on a single GPU.
  • Moving to a bigger dataset is planned soon.
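For readers new to variational bottlenecks: the encoder predicts a mean and log-variance per latent entry, a sample is drawn via the reparameterization trick, and a KL term regularizes the latent toward a standard normal. A minimal NumPy sketch over a matrix-shaped latent; all names and shapes here are illustrative assumptions, not this repo's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps; in a real framework, gradients
    flow through mu and logvar while eps stays stochastic."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    """KL(q(z|x) || N(0, I)), summed over the latent matrix
    (last two axes) and averaged over the batch."""
    return -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=(-2, -1)))
```

With mu = 0 and logvar = 0 the posterior equals the prior and the KL term is zero; any deviation makes it positive, which is the pressure the beta weight below scales.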

Details.

Regularization: Data Augmentation with Token Dropout.

You can apply token dropout to the decoder input and, if you like, to the encoder input as well. This is a form of data augmentation and a way to weaken the decoder: if the decoder is too strong, the latent space is not taken into account. See source/dataset.py for details.
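The idea can be sketched in a few lines. This is a minimal, hypothetical version (names like mask_id are assumptions; the actual implementation is in source/dataset.py):

```python
import random

def token_dropout(tokens, drop_prob, mask_id, seed=None):
    """Replace each token with mask_id with probability drop_prob.

    tokens: a sequence of token ids. mask_id is a hypothetical
    placeholder id. Applied during training only, so the decoder
    cannot rely on a complete input and must use the latent.
    """
    rng = random.Random(seed)
    return [mask_id if rng.random() < drop_prob else t for t in tokens]
```

With drop_prob = 0 the sequence passes through unchanged; with drop_prob = 1 every token is masked, forcing the decoder to work from the latent alone.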

Regularization: VAE Beta Warmup.

It is possible to put a weight term, called beta, on the KL-divergence loss and ramp it up over the course of training. This is called beta warmup. It is a way to weaken or strengthen the decoder: if the decoder is too strong, the latent space is not taken into account. See source/trainer.py for details.
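A common variant is a linear schedule. The following is a sketch under that assumption; the function names are hypothetical and the actual training loop is in source/trainer.py:

```python
def beta_warmup(step, warmup_steps, beta_max=1.0):
    """Ramp the KL weight linearly from 0 to beta_max over warmup_steps."""
    return beta_max * min(1.0, step / max(1, warmup_steps))

def vae_loss(reconstruction_loss, kl_loss, step, warmup_steps):
    """Total loss = reconstruction + beta * KL, with beta warmed up.

    Early in training beta is near 0, so the model focuses on
    reconstruction and actually uses the latent; later the KL
    pressure grows toward beta_max.
    """
    return reconstruction_loss + beta_warmup(step, warmup_steps) * kl_loss
```

For example, at the halfway point of a 100-step warmup, beta is 0.5, so only half of the KL penalty is applied.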

Getting started.

  • Prepare to submit GitHub issues. I am looking for collaborators. 🤗 https://github.com/AI-Guru/musictransformer2023/issues
  • Get a decent GPU. I am using an A100, which is clearly overkill. Way smaller GPUs will do just fine.
  • Create a dataset by running python source/preprocess.py. This will download the js-fakes dataset and prepare it for training.
  • Ideally, set up Weights & Biases: create an account and run wandb login. This will allow you to track your experiments. https://wandb.ai/
  • Run python runtraining.py. This will train the model. You can track the progress on Weights & Biases.
  • Note: python runtraininggrid.py is an example of grid search.

Thanks!
