I use ggml to do single-step inference (one token) with a small custom GPT-2 model.
Everything is working great, but I noticed a large gap between the fastest and slowest runs. Most of the time the inference takes between 500 μs and 800 μs, but sometimes it takes much longer, around 3,000–5,000 μs (my biggest spike so far was 67,000 μs).
All of these runs are on the same computer, using the same code, with the same inputs, and under the same conditions (number of tabs open, applications running, etc.).
I also set the number of threads to 1 to rule out fluctuations from threading.
When investigating, the variation always comes from the computation itself (`ggml_backend_sched_graph_compute(sched, gf)`).
Tokenization, graph creation, etc. always take the same time. A sketch of how I isolate the timing is shown below.
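
For reference, here is a minimal sketch of how the compute call is timed in isolation. It assumes `sched` and `gf` are already built elsewhere; the helper name and `n_runs` are placeholders for illustration, not part of my actual code.

```c
#include <stdio.h>
#include <stdint.h>

#include "ggml.h"
#include "ggml-backend.h"

// Times only the graph computation, which is where the spikes appear.
static void time_graph_compute(ggml_backend_sched_t sched,
                               struct ggml_cgraph * gf,
                               int n_runs) {
    ggml_time_init(); // call once before using ggml_time_us()

    for (int i = 0; i < n_runs; ++i) {
        const int64_t t0 = ggml_time_us();
        ggml_backend_sched_graph_compute(sched, gf);
        const int64_t t1 = ggml_time_us();
        printf("run %d: %lld us\n", i, (long long)(t1 - t0));
    }
}
```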
Is this expected? What is the underlying reason for this large variability?