Inference of SunoAI's bark model in pure C/C++.
With bark.cpp, my goal is to bring real-time, realistic, multilingual text-to-speech generation to the community. Currently, I am focused on porting the Bark model to C++.
- Plain C/C++ implementation without dependencies
- AVX, AVX2 and AVX512 for x86 architectures
- CPU and GPU compatible backends
- Mixed F16 / F32 precision
- 4-bit, 5-bit and 8-bit integer quantization
- Metal and CUDA backends
The original implementation of bark.cpp is Bark's 24 kHz English model. We expect to support multiple encoders in the future (see this and this), as well as a music generation model (see this). This project is for educational purposes.
Demo on Google Colab (#95)
Here is a typical run using bark.cpp:
make -j && ./main -p "This is an audio generated by bark.cpp"
__ __
/ /_ ____ ______/ /__ _________ ____
/ __ \/ __ `/ ___/ //_/ / ___/ __ \/ __ \
/ /_/ / /_/ / / / ,< _ / /__/ /_/ / /_/ /
/_.___/\__,_/_/ /_/|_| (_) \___/ .___/ .___/
/_/ /_/
bark_tokenize_input: prompt: 'this is a dog barking.'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 10217 27883 28169 25677 10167 129595
Generating semantic tokens: [========> ] (17%)
bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 9.90 ms / 138 tokens
bark_print_statistics: predict time = 3163.78 ms / 22.92 ms per token
bark_print_statistics: total time = 3188.37 ms
Generating coarse tokens: [==================================================>] (100%)
bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 3.96 ms / 410 tokens
bark_print_statistics: predict time = 14303.32 ms / 34.89 ms per token
bark_print_statistics: total time = 14315.52 ms
Generating fine tokens: [==================================================>] (100%)
bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 41.93 ms / 6144 tokens
bark_print_statistics: predict time = 15234.38 ms / 2.48 ms per token
bark_print_statistics: total time = 15282.15 ms
Number of frames written = 51840.
main: load time = 1436.36 ms
main: eval time = 34520.53 ms
main: total time = 32786.04 ms
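The numbers in this log can be cross-checked with a little arithmetic. The sketch below copies values from the run above and assumes that each frame written is one mono sample at 24 kHz (an assumption based on the model's stated rate; the log itself does not print the sample rate). It derives the clip length, how far this run was from real time, and the per-token prediction cost of each stage.

```python
# Values copied from the log above.
frames_written = 51840
total_time_ms = 32786.04          # "main: total time"
stages = {                        # stage -> (predict time in ms, tokens)
    "semantic": (3163.78, 138),
    "coarse": (14303.32, 410),
    "fine": (15234.38, 6144),
}

# Assumption: one frame is one mono sample at 24 kHz.
SAMPLE_RATE_HZ = 24_000

audio_seconds = frames_written / SAMPLE_RATE_HZ
real_time_factor = (total_time_ms / 1000) / audio_seconds

print(f"audio length     : {audio_seconds:.2f} s")
print(f"real-time factor : {real_time_factor:.1f}x slower than real time")
for name, (predict_ms, tokens) in stages.items():
    print(f"{name:9s}: {predict_ms / tokens:.2f} ms per token")
```

Under the sample-rate assumption, this run produced about 2.2 s of audio in about 33 s of wall-clock time. The per-token figures should land close to the ones printed in the log.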
Here are typical audio pieces generated by bark.cpp:
audio1.mp4
audio2.mp4
Here are the steps to use bark.cpp:
git clone --recursive https://github.com/PABannier/bark.cpp.git
cd bark.cpp
git submodule update --init --recursive
In order to build bark.cpp you must use CMake:
mkdir bark/build
cd bark/build
cmake ..
cmake --build . --config Release
# install Python dependencies
python3 -m pip install -r bark/requirements.txt
# obtain the original bark and encodec weights and place them in ./models
python3 bark/download_weights.py --download-dir ./models
# convert the model to ggml format
python3 bark/convert.py \
--dir-model ./models \
--vocab-path ./ggml_weights/ \
--out-dir ./ggml_weights/
# convert the encodec model to ggml format
python3 ./encodec.cpp/convert.py \
--dir-model ./models/ \
--out-dir ./ggml_weights/ \
--use-f16
# run the inference
./bark/build/examples/main/main -m ./ggml_weights/ -p "this is an audio"
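Before running the inference, it can help to check that the conversion steps produced the files you expect. The sketch below is a hypothetical helper, not part of bark.cpp. It looks for the three GPT weight files named in the quantization commands of this README and for any file matching `*vocab*`. The name of the converted encodec file is not given here, so it is not checked.

```python
from pathlib import Path

# File names taken from the commands in this README. The helper itself is
# hypothetical and not shipped with bark.cpp.
EXPECTED_FILES = (
    "ggml_weights_text.bin",
    "ggml_weights_coarse.bin",
    "ggml_weights_fine.bin",
)


def missing_weights(weights_dir):
    """Return the names of expected files that are absent from weights_dir."""
    root = Path(weights_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # The vocabulary file is only known here through the "*vocab*" pattern.
    has_vocab = root.is_dir() and any(root.glob("*vocab*"))
    if not has_vocab:
        missing.append("*vocab*")
    return missing


if __name__ == "__main__":
    import sys

    weights_dir = sys.argv[1] if len(sys.argv) > 1 else "./ggml_weights"
    gaps = missing_weights(weights_dir)
    print("all files present" if not gaps else f"missing: {', '.join(gaps)}")
```

Saved under a name of your choice, the script takes the weights directory as its only argument and defaults to `./ggml_weights`.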
Weights can be quantized using one of the following strategies: q4_0, q4_1, q5_0, q5_1, q8_0.
Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
mkdir ggml_weights_q4
cp ggml_weights/*vocab* ggml_weights_q4
./bark/build/examples/quantize/quantize ./ggml_weights/ggml_weights_text.bin ./ggml_weights_q4/ggml_weights_text.bin q4_0
./bark/build/examples/quantize/quantize ./ggml_weights/ggml_weights_coarse.bin ./ggml_weights_q4/ggml_weights_coarse.bin q4_0
./bark/build/examples/quantize/quantize ./ggml_weights/ggml_weights_fine.bin ./ggml_weights_q4/ggml_weights_fine.bin q4_0
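To give an intuition for what a strategy such as q4_0 does, here is a simplified sketch of 4-bit block quantization in pure Python. It assumes the layout commonly used by ggml for q4_0: blocks of 32 weights sharing one scale, with each weight stored as a 4-bit code. The function names are hypothetical, the scale is kept as a Python float rather than a 16-bit value, and the real implementation lives in ggml's C code, which may differ in details.

```python
BLOCK_SIZE = 32  # assumption: weights per block, as in ggml's q4_0


def quantize_block(values):
    """Map one block of floats to (scale, list of 4-bit codes in 0..15)."""
    assert len(values) == BLOCK_SIZE
    # Pick the entry with the largest magnitude, keeping its sign.
    extreme = max(values, key=abs)
    scale = extreme / -8.0  # the extreme value maps to code 0
    if scale == 0.0:
        return 0.0, [8] * BLOCK_SIZE  # all-zero block
    codes = [min(15, max(0, int(v / scale + 8.5))) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats from a scale and its 4-bit codes."""
    return [(c - 8) * scale for c in codes]


# Toy block of 32 weights in [-0.8, 0.775].
weights = [((i * 37) % 64 - 32) / 40.0 for i in range(BLOCK_SIZE)]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage under the assumed layout: one 16-bit scale plus 32 four-bit codes.
bits_per_weight = (16 + 4 * BLOCK_SIZE) / BLOCK_SIZE

print(f"max round-trip error: {max_error:.4f} (scale magnitude {abs(scale):.4f})")
print(f"bits per weight: {bits_per_weight}")
```

Under this layout a block costs 4.5 bits per weight, roughly 28% of the same weights stored in F16. The price is a per-weight error bounded by the block's scale, which is why the largest value in a block determines how coarse the grid is.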
- Bark
- Encodec
- GPT-3
bark.cpp is a continuous endeavour that relies on community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be:
- bug report: you may encounter a bug while using bark.cpp. Don't hesitate to report it in the issue section.
- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
- pull request: you may have fixed a bug, added a feature, or even fixed a small typo in the documentation. You can submit a pull request and a reviewer will reach out to you.
- Avoid adding third-party dependencies, extra files, extra headers, etc.
- Always consider cross-compatibility with other operating systems and architectures