This repo is based on Andrej Karpathy's nanoGPT. https://github.com/karpathy/nanoGPT
This work is still in progress. The model is not yet good. I am looking for collaborators. 🤗
So if you are curious about MIDI and transformers, do not hesitate to review the code. If you have any questions, let me know, preferably via GitHub issues.
The goal is to create a music transformer for MIDI files that has a variational bottleneck. Use cases would be musical interpolation and the generation of long sequences.
- The model is able to learn to reconstruct the input sequence.
- So far, the latent-space vectors generated by the encoder (i.e., the bottleneck) are not yet used by the decoder.
- This is an early work in progress. The essential pieces are there, but the model is not yet good.
- The architecture is a classic encoder-decoder transformer. See "Attention Is All You Need" for details.
- The variational bottleneck is a fully convolutional network. Instead of a single vector, the latent space is a matrix; this is inspired by latent diffusion. A sketch follows this list.
- I am looking for a way to weaken the decoder. If the decoder is too strong, the latent space is not taken into account.
- Currently I am experimenting with token dropout to weaken the decoder. VAE beta warmup is also an option, but it is not implemented yet.
- Currently I am experimenting with 500 MIDI files from the js-fakes dataset. This allows me to train on a single GPU.
- Moving to a bigger dataset is planned soon.
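To make the matrix-shaped latent concrete, here is a minimal PyTorch sketch of a fully convolutional variational bottleneck. Everything in it (class name, channel sizes, downsampling factor) is illustrative and does not claim to mirror the actual module in this repo; the point is that downsampling happens only along the time axis, so the latent stays a sequence of vectors, i.e. a matrix.

```python
import torch
import torch.nn as nn

class ConvVariationalBottleneck(nn.Module):
    """Illustrative sketch: maps encoder outputs (batch, time, d_model) to a
    matrix-shaped latent (batch, time/2, latent_channels) via 1D convolutions."""

    def __init__(self, d_model=256, hidden_channels=128, latent_channels=64):
        super().__init__()
        # Downsample the time axis; the feature dimension becomes the channel axis.
        self.down = nn.Sequential(
            nn.Conv1d(d_model, hidden_channels, kernel_size=4, stride=2, padding=1),
            nn.GELU(),
        )
        # Separate heads predict mean and log-variance per latent position.
        self.to_mu = nn.Conv1d(hidden_channels, latent_channels, kernel_size=1)
        self.to_logvar = nn.Conv1d(hidden_channels, latent_channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, time, d_model) -> (batch, d_model, time) for Conv1d.
        h = self.down(x.transpose(1, 2))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Return the latent as (batch, time/2, latent_channels) plus stats for the KL term.
        return z.transpose(1, 2), mu, logvar
```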
You can apply token dropout to the decoder and, if you like, to the encoder as well. This is a form of data augmentation and a way to weaken the decoder so that it cannot simply ignore the latent space. See source/dataset.py for details; a sketch follows below.
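For illustration, here is a minimal sketch of token dropout on integer token sequences, assuming a dedicated mask token ID; the actual implementation in source/dataset.py may differ.

```python
import torch

def token_dropout(tokens, drop_prob=0.2, mask_token_id=0):
    """Randomly replace a fraction of tokens with a mask token (illustrative).

    Applied to the decoder inputs (and optionally the encoder inputs), this
    corrupts the context the decoder can copy from, nudging it to rely on
    the latent code instead.
    """
    drop = torch.rand(tokens.shape, device=tokens.device) < drop_prob
    return torch.where(drop, torch.full_like(tokens, mask_token_id), tokens)

# Example: corrupt a batch of decoder inputs before the forward pass.
batch = torch.randint(1, 100, (4, 16))
corrupted = token_dropout(batch, drop_prob=0.3)
```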
It is possible to put a weight term, called beta, on the KL divergence loss; ramping this weight up during training is called beta warmup. It is a way to weaken or strengthen the decoder relative to the latent space. See source/trainer.py for details; a sketch follows below.
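As a sketch, this is what a beta-weighted VAE loss with a linear warmup schedule can look like. The function names and the linear schedule are assumptions for illustration, not the exact code in source/trainer.py.

```python
import torch
import torch.nn.functional as F

def beta_schedule(step, warmup_steps=10_000, beta_max=1.0):
    # Linear beta warmup (assumed schedule): start near 0 so the decoder
    # learns to use the latent code, then ramp the KL weight up to beta_max.
    return beta_max * min(step / warmup_steps, 1.0)

def vae_loss(logits, targets, mu, logvar, step):
    # Reconstruction term: cross-entropy over the token vocabulary.
    recon = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta_schedule(step) * kl
```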
- Prepare to submit GitHub issues. I am looking for collaborators. 🤗 https://github.com/AI-Guru/musictransformer2023/issues
- Get a decent GPU. I am using an A100, which is clearly overkill. Way smaller GPUs will do just fine.
- Create a dataset by running `python source/preprocess.py`. This will download the js-fakes dataset and prepare it for training.
- You should also set up Weights and Biases: create an account and run `wandb login`. This will allow you to track your experiments. https://wandb.ai/
- Run `python runtraining.py`. This will train the model. You can track the progress on Weights and Biases.
- Note that `python runtraininggrid.py` is an example for grid search.