GPT Benchmarks #2
In all benchmarks I generated 200 tokens, starting from a prompt consisting of a single token. My implementation does support KV caching; I used the term "memory":

Lines 259 to 276 in e2f39f4

Here we store new values into the memory:

Lines 448 to 455 in e2f39f4

And here we use the cached data:

Lines 466 to 473 in e2f39f4

Lines 507 to 514 in e2f39f4
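For readers without the code open, here is a minimal sketch of the pattern those lines implement, assuming NumPy. The names (`mem_k`, `mem_v`, `attend`, `n_past`) are illustrative placeholders, not the actual identifiers in the implementation:

```python
import numpy as np

# Hypothetical shapes: n_ctx = max context length, n_head, d_head.
n_ctx, n_head, d_head = 1024, 12, 64

# Preallocated "memory" for keys and values, one slot per position.
mem_k = np.zeros((n_ctx, n_head, d_head), dtype=np.float32)
mem_v = np.zeros((n_ctx, n_head, d_head), dtype=np.float32)

def attend(q, k_new, v_new, n_past):
    """One decoding step: cache the new token's K/V, then attend
    over everything cached so far (positions 0..n_past inclusive)."""
    # Store this token's keys/values into the memory.
    mem_k[n_past] = k_new
    mem_v[n_past] = v_new
    # Reuse the cached keys/values instead of recomputing them.
    k = mem_k[: n_past + 1]          # (n_past+1, n_head, d_head)
    v = mem_v[: n_past + 1]
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(d_head)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("ht,thd->hd", weights, v)
```

Each decoding step calls `attend` once per layer with `n_past` equal to the number of tokens already processed; only the new token's K/V are computed, everything earlier is read back from the memory.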
Even with caching, the processing time increases as more tokens accumulate, since each new token still attends over the entire cached context. The reported numbers are the average time per token across generating the 200 tokens.
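As a rough illustration of that methodology, a minimal timing sketch, where `generate_token` is a hypothetical stand-in for one decoding step of the implementation:

```python
import time

def average_token_time(generate_token, n_tokens=200):
    # Time a full 200-token generation and report the mean per-token
    # latency; later tokens are slower, so the average smooths that out.
    start = time.perf_counter()
    for _ in range(n_tokens):
        generate_token()
    return (time.perf_counter() - start) / n_tokens
```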
Thanks for the insight. Also, very impressive work.
GPT models without a KV cache have to recompute attention over the entire prefix at every step, so the per-token cost grows quadratically with the input length rather than linearly.
So, for your benchmarks: how many tokens were generated, and from how long a prompt? Does your implementation support a caching scheme? (A rough cost comparison follows below.)
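For reference, a back-of-the-envelope cost comparison (attention only, head dimension $d$; a sketch under these assumptions, not a measurement of this implementation). Without a cache, step $t$ reruns attention over all $t$ tokens, costing on the order of $t^2 d$ operations, so generating $n$ tokens costs

$$\sum_{t=1}^{n} t^2 d = O(n^3 d),$$

whereas with a KV cache each step only computes the new token's attention against the cached prefix, $O(t d)$, for $O(n^2 d)$ total.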