Execution speed varies greatly #670

Open · astariul opened this issue Dec 28, 2023 · 0 comments

astariul (Contributor) commented Dec 28, 2023

I'm using ggml to run single-step inference (one token) with a small custom GPT-2 model.

Everything is working great, but I noticed a large gap between the fastest and slowest runs. Most of the time the inference takes between 500 μs and 800 μs, but sometimes it takes much longer, around 3,000–5,000 μs (my biggest spike so far was 67,000 μs).

All of these runs are on the same computer, using the same code, with the same inputs, and under the same conditions (number of tabs open, applications running, etc.).
I also set the number of threads to 1 to rule out fluctuations from threading.
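For reference, this is roughly how I pin the thread count (a minimal sketch; it assumes the CPU backend, `cpu_backend`, was created with `ggml_backend_cpu_init()` earlier in my setup code):

```cpp
// Sketch: force the CPU backend to a single thread so that thread
// spawning / contention cannot be the source of the variability.
// `cpu_backend` is the ggml_backend_t returned by ggml_backend_cpu_init().
ggml_backend_cpu_set_n_threads(cpu_backend, 1);
```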


When investigating, the variation always comes from the computation itself (ggml_backend_sched_graph_compute(sched, gf)).
Tokenization, graph creation, etc. always take the same amount of time.
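The measurement is essentially a timer wrapped around the compute call, something like this (a rough sketch; `sched` and `gf` are the scheduler and graph built earlier in my code):

```cpp
#include <chrono>
#include <cstdio>

// Sketch of the timing: only the graph computation is measured here;
// tokenization and graph construction are timed separately.
const auto t_start = std::chrono::high_resolution_clock::now();

ggml_backend_sched_graph_compute(sched, gf);  // the call whose runtime fluctuates

const auto t_end = std::chrono::high_resolution_clock::now();
const auto us = std::chrono::duration_cast<std::chrono::microseconds>(t_end - t_start).count();
printf("graph compute: %lld us\n", (long long) us);
```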

Is this expected? What is the underlying reason for this huge variability?
