sync: use encodec's latest version as a submodule #124

Merged (69 commits) on Feb 13, 2024
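
This PR drops the vendored encodec sources and tracks encodec.cpp as a git submodule instead (see the `rm encodec`, `add git submodules` and `added encodec submodule` commits below). As a hedged sketch only, the usual submodule workflow looks like the following; the exact upstream URL and checkout path are assumptions, not stated on this page:

```bash
# Hypothetical sketch of the submodule workflow; the URL and the
# encodec.cpp/ path are assumptions, check .gitmodules for the real values.
git submodule add https://github.com/PABannier/encodec.cpp encodec.cpp
git submodule update --init --recursive

# Syncing the submodule to the latest upstream commit later on:
git submodule update --remote encodec.cpp
git add encodec.cpp
git commit -m "sync: bump encodec.cpp submodule"
```

Downstream users then need `git clone --recursive`, or `git submodule update --init --recursive` after a plain clone, for the encodec sources to be present at build time.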

Commits
2302881  rm encodec (PABannier, Oct 26, 2023)
450a606  add git submodules (PABannier, Oct 26, 2023)
4ece5fa  removed bark util (PABannier, Oct 26, 2023)
9c9f7e8  updated CMakeLists (PABannier, Oct 26, 2023)
e2b350a  rm build scripts (PABannier, Oct 26, 2023)
d491cc7  mv dr_wav in examples (PABannier, Oct 26, 2023)
38c2e49  common cpp (PABannier, Oct 26, 2023)
6b32b3b  moved def constants (PABannier, Oct 26, 2023)
ab9b528  text encoder loaded with the latest ggml API (PABannier, Oct 27, 2023)
6a9b50a  pulled ggml upstream (PABannier, Oct 27, 2023)
efbdd56  temporarily removed subdirectory encodec.cpp (PABannier, Oct 27, 2023)
753d5cf  clean forward pass text encoder (PABannier, Oct 27, 2023)
e4e712f  compiling (PABannier, Oct 27, 2023)
d8fc378  fix issue definition (PABannier, Oct 27, 2023)
b258c08  clean (PABannier, Oct 28, 2023)
6642e75  remove codec parsing functions (PABannier, Oct 28, 2023)
33d186e  kinda works (PABannier, Oct 28, 2023)
83a21ec  bias is stored in hparams (PABannier, Oct 28, 2023)
6cad888  working text encoder (PABannier, Oct 28, 2023)
242e7c5  cln tests (PABannier, Oct 28, 2023)
c1d0edd  coarse working? (PABannier, Oct 28, 2023)
94cd5e2  override bias (PABannier, Oct 29, 2023)
acf9dfa  working fine encoder (PABannier, Oct 29, 2023)
c1def75  rename quantize.cpp into main.cpp (PABannier, Oct 29, 2023)
cfaa59c  included quantize as a target (PABannier, Oct 29, 2023)
05ef89d  exposed quantization function (PABannier, Oct 29, 2023)
6172381  minor (PABannier, Oct 29, 2023)
6d0db93  update CIs (PABannier, Oct 29, 2023)
a978908  updated CIs (PABannier, Oct 29, 2023)
8ae7dc5  passing tokenizer test (PABannier, Oct 29, 2023)
7ad8cd5  Merge branch 'main' of https://github.com/PABannier/bark.cpp into enc… (PABannier, Oct 29, 2023)
7c2ae84  fast text encoder (PABannier, Dec 11, 2023)
d3971c2  Merge branch 'main' of https://github.com/PABannier/bark.cpp into enc… (PABannier, Dec 11, 2023)
5874a87  `bark.cpp` -> `bark` (PABannier, Dec 11, 2023)
5312577  server abides by latest API (PABannier, Dec 12, 2023)
e7b7d75  rm fast-text-encoder example (PABannier, Dec 12, 2023)
2aaf7b2  pass `-O3` release flag (PABannier, Dec 12, 2023)
79ed551  rm fast_text_encoder from CMakeLists (PABannier, Dec 12, 2023)
4f72d56  restructured (PABannier, Dec 12, 2023)
f13498a  CMakeLists arranged (PABannier, Dec 13, 2023)
f517570  update CIs (PABannier, Dec 13, 2023)
b8bdd76  add encodec.cpp in the loop (PABannier, Dec 14, 2023)
5319d26  add verbosity level (PABannier, Dec 15, 2023)
11c3f9a  Fix CIs (#128) (AlexHayton, Dec 30, 2023)
da3cc56  Merge branch 'encodec_as_submodule' of https://github.com/PABannier/b… (PABannier, Jan 2, 2024)
07a322c  fix coarse encoder internal pass (PABannier, Jan 3, 2024)
3002698  `VerbosityLevel` -> `bark_verbosity_level` (PABannier, Jan 5, 2024)
747345c  updated examples (PABannier, Jan 5, 2024)
a3e3e92  populated time per token (PABannier, Jan 5, 2024)
19e1683  remove whitespace (PABannier, Jan 5, 2024)
fa6975c  BarkProgressBar implemented (PABannier, Jan 6, 2024)
b9e2109  verbosity level controlled for cleaner output (PABannier, Jan 6, 2024)
4401975  removed params as macros and moved them into default constructor (PABannier, Jan 6, 2024)
38846ec  updated README (PABannier, Jan 6, 2024)
59d5352  removed useless `n_predict` in params (PABannier, Jan 6, 2024)
07e92de  removed old tests (PABannier, Jan 6, 2024)
ec677fb  fix wrong return type, quantization works again (PABannier, Jan 6, 2024)
035ef16  Added Metal and CUDA backend (PABannier, Jan 6, 2024)
6e4ac9a  updated docs (PABannier, Jan 6, 2024)
ac327a9  cosmit (PABannier, Jan 6, 2024)
d347134  rm submodule (PABannier, Jan 7, 2024)
d7e9661  added encodec submodule (PABannier, Jan 7, 2024)
1fbe29d  remove mem_per_token (PABannier, Jan 7, 2024)
b3d9179  more verbose errors (PABannier, Jan 7, 2024)
94fea82  clean (PABannier, Jan 7, 2024)
bec8547  reset allocr to reduce memory footprint (PABannier, Jan 7, 2024)
df7c22a  add tests (PABannier, Jan 7, 2024)
6fbc184  expose forward passes (PABannier, Jan 7, 2024)
87a102b  enhanced README.md (PABannier, Feb 12, 2024)

Changes from 1 commit: updated README
PABannier committed Jan 6, 2024
commit 38846ecfba9da6c81a575da2a215fd798e044d7f
README.md: 142 changes, 41 additions and 101 deletions
@@ -9,13 +9,10 @@

Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.

**Disclaimer: there remain bugs in the inference code. bark is able to generate audio for some prompts or seeds, but it does not work for most prompts. The community's current effort is to fix those bugs in order to release v0.0.2.**

## Description

The main goal of `bark.cpp` is to synthesize audio from a textual input with the [Bark](https://github.com/suno-ai/bark) model efficiently, using only the CPU.
With `bark.cpp`, our goal is to bring **real-time realistic** text-to-speech generation to the community.
Currently, we are focused on porting the [Bark](https://github.com/suno-ai/bark) model to C++.

- [X] Plain C/C++ implementation without dependencies
- [X] AVX, AVX2 and AVX512 for x86 architectures
@@ -42,113 +39,56 @@ Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd

---

Here are typical audio pieces generated by `bark.cpp`:
Here is a typical run using `bark.cpp`:

https://github.com/PABannier/bark.cpp/assets/12958149/f9f240fd-975f-4d69-9bb3-b295a61daaff
```java
make -j && ./main -p "This is an audio generated by bark.cpp"

https://github.com/PABannier/bark.cpp/assets/12958149/c0caadfd-bed9-4a48-8c17-3215963facc1
__ __
/ /_ ____ ______/ /__ _________ ____
/ __ \/ __ `/ ___/ //_/ / ___/ __ \/ __ \
/ /_/ / /_/ / / / ,< _ / /__/ /_/ / /_/ /
/_.___/\__,_/_/ /_/|_| (_) \___/ .___/ .___/
/_/ /_/

Here is a typical run using Bark:

```java
make -j && ./main -p "this is an audio"
I bark.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 14.0.0 (clang-1400.0.29.202)
I CXX: Apple clang version 14.0.0 (clang-1400.0.29.202)

bark_model_load: loading model from './ggml_weights'
bark_model_load: reading bark text model
gpt_model_load: n_in_vocab = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1701.69 MB
bark_model_load: reading bark vocab

bark_model_load: reading bark coarse model
gpt_model_load: n_in_vocab = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 1
gpt_model_load: n_wtes = 1
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1250.69 MB

bark_model_load: reading bark fine model
gpt_model_load: n_in_vocab = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size = 1024
gpt_model_load: n_embd = 1024
gpt_model_load: n_head = 16
gpt_model_load: n_layer = 24
gpt_model_load: n_lm_heads = 7
gpt_model_load: n_wtes = 8
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size = 192.00 MB, n_mem = 24576
gpt_model_load: model size = 1218.26 MB

bark_model_load: reading bark codec model
encodec_model_load: model size = 44.32 MB

bark_model_load: total model size = 74.64 MB

bark_generate_audio: prompt: 'this is an audio'
bark_generate_audio: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595
bark_forward_text_encoder: ...........................................................................................................

bark_forward_text_encoder: mem per token = 4.80 MB
bark_forward_text_encoder: sample time = 7.91 ms
bark_forward_text_encoder: predict time = 2779.49 ms / 7.62 ms per token
bark_forward_text_encoder: total time = 2829.35 ms

bark_forward_coarse_encoder: .................................................................................................................................................................
..................................................................................................................................................................

bark_forward_coarse_encoder: mem per token = 8.51 MB
bark_forward_coarse_encoder: sample time = 3.08 ms
bark_forward_coarse_encoder: predict time = 10997.70 ms / 33.94 ms per token
bark_forward_coarse_encoder: total time = 11036.88 ms

bark_forward_fine_encoder: .....

bark_forward_fine_encoder: mem per token = 5.11 MB
bark_forward_fine_encoder: sample time = 39.85 ms
bark_forward_fine_encoder: predict time = 19773.94 ms
bark_forward_fine_encoder: total time = 19873.72 ms



bark_forward_encodec: mem per token = 760209 bytes
bark_forward_encodec: predict time = 528.46 ms / 528.46 ms per token
bark_forward_encodec: total time = 663.63 ms
bark_tokenize_input: prompt: 'this is a dog barking.'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 10217 27883 28169 25677 10167 129595

Number of frames written = 51840.
Generating semantic tokens: [========> ] (17%)

bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 9.90 ms / 138 tokens
bark_print_statistics: predict time = 3163.78 ms / 22.92 ms per token
bark_print_statistics: total time = 3188.37 ms

Generating coarse tokens: [==================================================>] (100%)

bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 3.96 ms / 410 tokens
bark_print_statistics: predict time = 14303.32 ms / 34.89 ms per token
bark_print_statistics: total time = 14315.52 ms

Generating fine tokens: [==================================================>] (100%)

bark_print_statistics: mem per token = 0.00 MB
bark_print_statistics: sample time = 41.93 ms / 6144 tokens
bark_print_statistics: predict time = 15234.38 ms / 2.48 ms per token
bark_print_statistics: total time = 15282.15 ms

Number of frames written = 51840.

main: load time = 1436.36 ms
main: eval time = 34520.53 ms
main: total time = 35956.92 ms
main: total time = 32786.04 ms
```

Here are typical audio pieces generated by `bark.cpp`:

https://github.com/PABannier/bark.cpp/assets/12958149/f9f240fd-975f-4d69-9bb3-b295a61daaff

https://github.com/PABannier/bark.cpp/assets/12958149/c0caadfd-bed9-4a48-8c17-3215963facc1

## Usage

Here are the steps for the bark model.
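
The concrete steps are not shown in this truncated diff view. Purely as a hedged sketch (not the README's actual instructions), a run consistent with the commands and paths that do appear above might look like this; the recursive clone is an assumption that follows from the new encodec submodule, and `./ggml_weights` is simply the path printed by `bark_model_load` in the log above:

```bash
# Sketch only, not the README's actual steps.
# Clone with submodules (encodec.cpp is now a git submodule) and build.
git clone --recursive https://github.com/PABannier/bark.cpp
cd bark.cpp
make -j

# Run generation; ./main -p is the invocation shown in the example run above.
# Model weights are assumed to already be in ./ggml_weights,
# the directory printed by bark_model_load.
./main -p "This is an audio generated by bark.cpp"
```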