
FEAT: Complete implementation of GGML_OP_CONV_1D #523

Merged · 6 commits merged into ggerganov:master on Sep 28, 2023

Conversation

@PABannier (Contributor)

Currently, the 1d convolution is only implemented for half padding and strides 1 and 2. Yet the 1d convolution is a crucial operation, needed for instance in bark.cpp and encodec.cpp.

This PR completes the implementation of the 1d convolution (for f32 and f16 src types). It also updates the computation of the size needed for the work buffer.
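
With this change, the generalized call takes explicit stride, padding, and dilation. A minimal usage sketch (conv_w and inp are hypothetical tensor names; the output length follows the usual convolution formula):

// a (kernel): ne = [K, C_in, C_out], b (input): ne = [L, C_in]
// output length: L_out = (L + 2*p0 - d0*(K - 1) - 1)/s0 + 1
struct ggml_tensor * out = ggml_conv_1d(
        ctx,
        conv_w,        // f16 or f32 kernel tensor (hypothetical name)
        inp,           // f32 input tensor (hypothetical name)
        /*s0 =*/ 2,    // stride
        /*p0 =*/ 1,    // padding
        /*d0 =*/ 1);   // dilation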

@ggerganov (Owner)

Thanks!

I just tested this by merging master into this branch and running the whisper example.
The transcription results are now wrong.

Whisper uses the ggml_conv_1d operator:

// convolution + gelu
{
    cur = ggml_conv_1d_ph(ctx0, model.e_conv_1_w, mel, 1, 1);
    cur = ggml_add(ctx0,
            ggml_repeat(ctx0,
                model.e_conv_1_b,
                cur),
            cur);
    cur = ggml_gelu(ctx0, cur);

    cur = ggml_conv_1d_ph(ctx0, model.e_conv_2_w, cur, 2, 1);
    cur = ggml_add(ctx0,
            ggml_repeat(ctx0,
                model.e_conv_2_b,
                cur),
            cur);
    cur = ggml_gelu(ctx0, cur);
}
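
(For context: ggml_conv_1d_ph is the "half padding" convenience wrapper; as far as I recall it is equivalent to calling ggml_conv_1d with p0 = a->ne[0]/2:)

// ggml_conv_1d_ph(ctx, a, b, s, d) should be equivalent to:
ggml_conv_1d(ctx, a, b, s, /*p0 =*/ a->ne[0]/2, /*d0 =*/ d);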

Repro:

./bin/whisper -m ../../whisper.cpp/models/ggml-small.en.bin -f ../../whisper.cpp/samples/gb0.wav 

@ggerganov (Owner)

If you rebase onto the latest master, you can simply run the following command in the ggml root directory:

bash ./ci/run.sh ./tmp/results ./tmp/mnt

It will run the CI locally; the Whisper test comes at the end of the run.
If everything works correctly, you should see output like this:

https://github.com/ggml-org/ci/tree/results/ggml/a1/f6ca42699228b0b4223240a2cf507732a1e716/ggml-0-x86-cpu-low-perf#whisper

whisper_init_from_file_no_state: loading model from '../models-mnt/whisper//ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   14.10 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB

system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 

main: processing '../models-mnt/whisper//jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =    87.16 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    21.10 ms
whisper_print_timings:   sample time =    16.29 ms /    27 runs (    0.60 ms per run)
whisper_print_timings:   encode time =  1974.43 ms /     1 runs ( 1974.43 ms per run)
whisper_print_timings:   decode time =   126.99 ms /    27 runs (    4.70 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  2262.84 ms

@PABannier (Contributor, Author)

PABannier commented Sep 15, 2023

@ggerganov Thanks for pushing a way to test conv_1d.

I took the code from ggml_conv_2d and essentially accounted for one fewer spatial dimension for the kernel and the input. The test is still not passing. Is there any documentation available on how the 2d convolution is implemented in ggml?
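
What I am trying to express is the direct form, i.e. conv_2d with one spatial dimension removed. Conceptually it looks like this (an illustrative sketch over plain float buffers, not the actual ggml code):

// Naive 1d convolution over row-major float buffers:
//   input : [C_in][L], kernel: [C_out][C_in][K], output: [C_out][L_out]
//   L_out = (L + 2*p - d*(K - 1) - 1)/s + 1
static void conv_1d_naive(
        const float * input, const float * kernel, float * output,
        int C_in, int C_out, int L, int K, int s, int p, int d) {
    const int L_out = (L + 2*p - d*(K - 1) - 1)/s + 1;
    for (int oc = 0; oc < C_out; ++oc) {
        for (int ol = 0; ol < L_out; ++ol) {
            float sum = 0.0f;
            for (int ic = 0; ic < C_in; ++ic) {
                for (int k = 0; k < K; ++k) {
                    const int il = ol*s + k*d - p; // may land in the zero padding
                    if (il >= 0 && il < L) {
                        sum += kernel[(oc*C_in + ic)*K + k] * input[ic*L + il];
                    }
                }
            }
            output[oc*L_out + ol] = sum;
        }
    }
}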

whisper_init_from_file_no_state: loading model from '../models-mnt/whisper//ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   14.10 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '../models-mnt/whisper//jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:17.000]   [Music]

whisper_print_timings:     load time =   161.08 ms
whisper_print_timings:     fallbacks =   2 p /   0 h
whisper_print_timings:      mel time =    17.59 ms
whisper_print_timings:   sample time =   449.60 ms /   455 runs (    0.99 ms per run)
whisper_print_timings:   encode time =  3536.56 ms /     1 runs ( 3536.56 ms per run)
whisper_print_timings:   decode time =  1822.40 ms /   453 runs (    4.02 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  6079.51 ms

@PABannier (Contributor, Author)

Works for me now! @ggerganov
Inspired by the fast Conv2D implementation in #483.
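
The idea borrowed from #483 is to lower the convolution to a matrix multiplication: an im2col pass unrolls the (padded, dilated) input windows into a matrix, and a single mat-mul against the flattened kernel then yields all output channels at once. A rough sketch of the 1d case (illustrative only, not the actual ggml kernels):

// im2col for 1d conv: turn input [C_in][L] into cols [L_out][C_in*K],
// so that output [C_out][L_out] = kernel [C_out][C_in*K] x cols^T.
static void im2col_1d(
        const float * input, float * cols,
        int C_in, int L, int K, int s, int p, int d) {
    const int L_out = (L + 2*p - d*(K - 1) - 1)/s + 1;
    for (int ol = 0; ol < L_out; ++ol) {
        for (int ic = 0; ic < C_in; ++ic) {
            for (int k = 0; k < K; ++k) {
                const int il = ol*s + k*d - p;
                cols[(ol*C_in + ic)*K + k] =
                    (il >= 0 && il < L) ? input[ic*L + il] : 0.0f; // zeros for padding
            }
        }
    }
}
// After im2col, each output element is a plain dot product:
//   output[oc][ol] = dot(kernel[oc][:], cols[ol][:])  -- one mat-mul overall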

whisper_init_from_file_no_state: loading model from '../models-mnt/whisper//ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.66 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   15.48 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB

system_info: n_threads = 4 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing '../models-mnt/whisper//jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]   And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =   162.24 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    17.83 ms
whisper_print_timings:   sample time =    19.15 ms /    27 runs (    0.71 ms per run)
whisper_print_timings:   encode time =  1666.83 ms /     1 runs ( 1666.83 ms per run)
whisper_print_timings:   decode time =   117.93 ms /    27 runs (    4.37 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  2076.42 ms

@ggerganov (Owner)

Awesome - will take a look in the next few days.

@PABannier (Contributor, Author)

@ggerganov Can somebody have a look please? I need it to complete bark.cpp and the implementation of other TTS models :) This would greatly help me. Thanks!

@ggerganov (Owner)

@PABannier Yes, sorry for the delay - was travelling for the past week. I'm back now and will catch up with everything today and tomorrow

ggerganov merged commit a706d68 into ggerganov:master on Sep 28, 2023
4 checks passed
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023