Execution speed varies greatly #670

Open · astariul opened this issue Dec 28, 2023 · 0 comments

astariul (Contributor) commented Dec 28, 2023

I'm using ggml to run single-step inference (one token) with a small custom GPT-2 model.

Everything is working great, but I noticed a large gap between the fastest and slowest runs. Most of the time the inference takes between 500 μs and 800 μs, but sometimes it takes much longer, around 3,000–5,000 μs (my biggest spike so far was 67,000 μs).

All of these runs are on the same computer, using the same code, with the same inputs, and under the same conditions (number of tabs open, applications running, etc.).
I also set the number of threads to 1 to rule out fluctuations from threading.
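For reference, this is roughly how I pin the thread count (a minimal sketch; it assumes the CPU backend, `cpu_backend`, was created with `ggml_backend_cpu_init()` earlier in my setup code):

```cpp
// Sketch: force the CPU backend to a single thread so that thread
// spawning / contention cannot be the source of the variability.
// `cpu_backend` is the ggml_backend_t returned by ggml_backend_cpu_init().
ggml_backend_cpu_set_n_threads(cpu_backend, 1);
```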


When investigating, the variation always comes from the computation itself (ggml_backend_sched_graph_compute(sched, gf)).
Tokenization, graph creation, etc. always take the same amount of time.
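The measurement is essentially a timer wrapped around the compute call, something like this (a rough sketch; `sched` and `gf` are the scheduler and graph built earlier in my code):

```cpp
#include <chrono>
#include <cstdio>

// Sketch of the timing: only the graph computation is measured here;
// tokenization and graph construction are timed separately.
const auto t_start = std::chrono::high_resolution_clock::now();

ggml_backend_sched_graph_compute(sched, gf);  // the call whose runtime fluctuates

const auto t_end = std::chrono::high_resolution_clock::now();
const auto us = std::chrono::duration_cast<std::chrono::microseconds>(t_end - t_start).count();
printf("graph compute: %lld us\n", (long long) us);
```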

Is this expected? What is the underlying reason for this huge variability?
