Songs are divided into segments roughly 10 seconds long, each containing a number of beats (the exact count varies per song with BPM). The raw audio for these songs is then processed through a neural audio codec (Descript Audio Codec) to produce the initial audio embeddings.
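As a rough sketch of this step, the encoding could be done with the `descript-audio-codec` package as shown below; the 44 kHz model variant, file name, and plain 10-second split are illustrative assumptions rather than the project's exact preprocessing.

```python
# Minimal sketch: encode a song with the Descript Audio Codec (DAC).
import dac
import torch
from audiotools import AudioSignal

codec = dac.DAC.load(dac.utils.download(model_type="44khz")).eval()

signal = AudioSignal("song.ogg")
with torch.no_grad():
    x = codec.preprocess(signal.audio_data, signal.sample_rate)
    z, codes, latents, _, _ = codec.encode(x)   # z: (batch, latent_dim, frames)

# Illustrative split of the codec frames into ~10-second chunks;
# the project's segments are beat-based and vary in beat count with BPM.
duration_s = signal.audio_data.shape[-1] / signal.sample_rate
frames_per_second = z.shape[-1] / duration_s
segments = torch.split(z, int(10 * frames_per_second), dim=-1)
```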
These codec embeddings are further processed by a Conformer followed by a Perceiver resampler (as in BLIP-3); the latter converts each variable-length segment into a fixed number of LLM embeddings that can be used in place of tokens.
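To illustrate the resampling idea (not the repository's exact module): a fixed set of learned latent queries cross-attends over the variable-length segment features, yielding a constant number of embeddings per segment. The dimensions and layer counts below are placeholders.

```python
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    """Maps a variable-length sequence of audio features to a fixed number
    of output embeddings via cross-attention with learned latent queries.
    All sizes here are illustrative, not the project's configuration."""

    def __init__(self, dim=1024, num_latents=16, num_layers=2, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "attn": nn.MultiheadAttention(dim, num_heads, batch_first=True),
                "ff": nn.Sequential(nn.LayerNorm(dim),
                                    nn.Linear(dim, 4 * dim),
                                    nn.GELU(),
                                    nn.Linear(4 * dim, dim)),
                "norm": nn.LayerNorm(dim),
            })
            for _ in range(num_layers)
        ])

    def forward(self, features):            # features: (batch, seq_len, dim)
        batch = features.shape[0]
        x = self.latents.unsqueeze(0).expand(batch, -1, -1)
        for layer in self.layers:
            q = layer["norm"](x)
            attended, _ = layer["attn"](q, features, features)
            x = x + attended
            x = x + layer["ff"](x)
        return x                            # (batch, num_latents, dim)
```

However long the segment's feature sequence is, the output is always `num_latents` embeddings, which is what lets each segment occupy a fixed number of slots in the LLM prompt.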
Llama-3-8b is finetuned on these songs with their tokenized beatmaps using LoRA. The prompt is composed of the full list of audio embeddings, a header, and then the embeddings for each segment interleaved with the segment's note tokens. Phrased in code:
all_audio_embeddings + header_tokens + audio_embeddings[0] + segment_tokens[0] + audio_embeddings[1] + segment_tokens[1] ...
Or in tokens:
AUDIO_0 AUDIO_1 ... AUDIO_N <header> Difficulty: expert-plus | BPM level: 3 | Rating: 9 | walls </header> AUDIO_0 [red middle far-left down] [blue bottom left down-left] [12% blue bottom far-right right] [25% blue bottom left left] ... end AUDIO_1 start [red bottom right right] [12% red middle far-left up-left] ...
Notes are in the format [percent along segment, color, row, col, cut direction].
Code is also present for experimenting with spectrograms instead of the codec, and a Q-former (BLIP-2) in place of the Perceiver.
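For the spectrogram variant, a minimal torchaudio front end might look like this; the hyperparameters and file name are placeholders, not the repository's settings.

```python
import torchaudio

# Illustrative mel-spectrogram features in place of codec embeddings.
waveform, sample_rate = torchaudio.load("song.ogg")
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    hop_length=256,
    n_mels=128,
)(waveform)                                   # (channels, n_mels, frames)
```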
- InfernoSaber, which separates responsibilities out into individual convolutional/FFN models
- Beat Sage, based on the paper Dance Dance Convolution
- An Embarrassingly Simple Approach for LLM with Strong ASR Capacity - helpful overview of recent audio+LLM papers in Table 1
- Connecting Speech Encoder and Large Language Model for ASR - frozen encoder + trainable Q-former + frozen LLM for ASR