Implementing all operators on CUDA #463
I am not sure if you are already doing this, but the CUDA backend currently requires a lot of manual changes to move the tensors to VRAM. The only example of how to do this currently, AFAIK, is in
Yes, I have copied the necessary tensors to VRAM. It seems I overlooked that some CUDA operations are asynchronous. I will reprofile using Nsight.
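For anyone else tripped up by the same thing: kernel launches and some memcpy variants return to the host immediately, so a CPU wall-clock timer around a launch measures almost nothing. A minimal sketch (not ggml code; the kernel and sizes are made up for illustration) of timing correctly with CUDA events:

```cuda
// Hypothetical example: CUDA kernel launches are asynchronous, so timing
// must synchronize with the device, e.g. via cudaEvent_t.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n);  // returns immediately
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has actually finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // elapsed GPU time, in ms
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```

Nsight avoids this pitfall entirely, since it records device-side timestamps.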
I performed a simple profile on ggml_cuda_op and found that the time spent on memory copies is several times the computation time. This is because not all operators have CUDA implementations, so data is frequently copied back and forth between the GPU and CPU during computation, which consumes a lot of time. Here's the data:
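The fallback pattern being described can be sketched as follows (a hypothetical stand-in, not the actual ggml_cuda_op code): an operator without a CUDA kernel forces a device-to-host copy, a CPU computation, and a host-to-device copy, and CUDA events can time each phase separately to show the copies dominating:

```cuda
// Hypothetical sketch of the CPU-fallback pattern: D2H copy, CPU op,
// H2D copy, with each transfer timed via CUDA events.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

static float elapsed_ms(cudaEvent_t a, cudaEvent_t b) {
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, a, b);
    return ms;
}

int main() {
    const int n = 1 << 24;
    const size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    float *d;
    cudaMalloc(&d, bytes);

    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    cudaEventRecord(t0);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);  // op has no CUDA kernel
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    for (int i = 0; i < n; i++) h[i] = h[i] * 2.0f + 1.0f;  // CPU fallback op

    cudaEventRecord(t2);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // result back to VRAM
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    printf("D2H: %.2f ms, H2D: %.2f ms\n",
           elapsed_ms(t0, t1), elapsed_ms(t2, t3));

    free(h); cudaFree(d);
    return 0;
}
```

With pageable host memory each direction crosses PCIe, so for ops cheap enough to run in microseconds the two transfers can easily be several times the compute time, which matches the profile above.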