add hipBLAS for windows #135
Conversation
Although it compiled successfully, I saw that the model was not offloaded to the GPU.
That's why I'd like to request benchmark results. At minimum, please provide per-token latencies on your machine for CPU-only and GPU-only modes; GPU latency should be significantly lower if the new backend works. You can use the existing script measure_pexplexity.py for measuring.
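For reference, per-token latency can be measured generically like this (a minimal sketch; `eval_fn` is a stand-in for a model's per-token evaluation call, not the rwkv.cpp API):

```python
import time

def measure_per_token_latency(eval_fn, tokens):
    """Return average wall-clock latency per token, in milliseconds."""
    start = time.perf_counter()
    for token in tokens:
        eval_fn(token)  # stand-in for the model's eval call
    elapsed = time.perf_counter() - start
    return (elapsed / len(tokens)) * 1000.0
```

Run the same model and data once CPU-only and once with GPU offload, and compare the two averages.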
ggml_init_cublas: found 1 ROCm devices
Model: RWKV-novel-4-World-7B-20230810-ctx128k-ggml-f16.bin
Data: test.txt, 273 tokens (2 skipped)
Averages: loss 1.859, perplexity 6.419, latency 447 ms per token
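As a sanity check, the two reported numbers are internally consistent: perplexity is the exponential of the average cross-entropy loss, so exp(1.859) should be close to 6.419.

```python
import math

# Perplexity is exp(average cross-entropy loss).
loss = 1.859
perplexity = math.exp(loss)
print(round(perplexity, 2))  # close to the reported 6.419
```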
It was my mistake: I just found out that I needed to manually offload the context onto the GPU.
Is this result for CPU or GPU? In any case, a second number is needed for comparison.
This is a GPU test, but the model is not offloaded to the GPU correctly. Now by setting
@saharNooby I think this PR has been completed. |
Support hipBLAS #133