Introduction of CUDA Graphs to LLama.cpp #6766
Merged
Commits on Apr 19, 2024
- cec409a
Commits on Apr 22, 2024
- c8dd0e7
- 800f4fe
- c2691d9
Commits on Apr 23, 2024
- df4719e
Commits on Apr 24, 2024
- c3d4ead
- d403b18
- 4087596
Commits on Apr 25, 2024
- 0640427
Commits on Apr 29, 2024
- 9c57861
Commits on Apr 30, 2024
- d44e0fb
- eb9f15f
- 909e4c6: Revert "With mechanism to fall back if graph capture fails" (this reverts commit eb9f15f)
Commits on May 1, 2024
- 5819950
Commits on May 2, 2024
- 44af096
Commits on May 7, 2024
- 4e1f2a0
- e830949:
  - Renamed GGML_ALLOW_CUDA_GRAPHS to GGML_CUDA_USE_GRAPHS.
  - Renamed the environment variable that disables CUDA graphs to GGML_CUDA_DISABLE_GRAPHS.
  - Updated the Makefile build to enable CUDA graphs.
  - Removed graph-capture failure checking in ggml_cuda_error: using a global variable to track this is not thread safe, and I am also not satisfied with checking an error by string. If this is necessary to work around issues with graph capture in e.g. cuBLAS, we can pass the ggml_backend_cuda_context to the error-checking macro and store the result in the context.
  - Fixed several resource leaks.
  - Fixed an issue with zero-node graphs.
  - Changed fixed-size arrays to vectors.
  - Removed the count of evaluations before starting capture, and instead changed the capture mode to relaxed.
  - Removed the check for multiple devices so that it is still possible to use a single device; instead checks for split buffers to disable CUDA graphs with -sm row.
  - Changed the op used for checking batch size to GGML_OP_ADD, which should be more reliable than GGML_OP_SOFT_MAX.
  - Code style fixes.
  - Things to look into:
    - VRAM usage of the cudaGraphExec_t; if it is significant we may need to make it optional.
    - Possibility of using cudaStreamBeginCaptureToGraph to keep track of which ggml graph nodes correspond to which CUDA graph nodes.
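The relaxed capture mode and capture-failure fallback described in that commit message can be sketched with the plain CUDA runtime API. This is a minimal illustration, not the actual llama.cpp code: the `scale` kernel, sizes, and error handling are hypothetical, and it assumes a CUDA 12-style `cudaGraphInstantiate` signature.

```cpp
// Sketch: capture stream work into a CUDA graph in relaxed mode,
// falling back to a normal launch if capture fails. Requires a GPU.
#include <cuda_runtime.h>

__global__ void scale(float *x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

int main() {
    const int n = 1024;
    float *d_x = nullptr;
    cudaMalloc(&d_x, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Relaxed mode (as the commit above switches to) does not restrict
    // unrelated host threads while capture is in progress.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeRelaxed);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, 2.0f, n);
    cudaGraph_t graph;
    cudaError_t err = cudaStreamEndCapture(stream, &graph);

    if (err != cudaSuccess) {
        // Fallback path: capture failed, so run the work directly.
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d_x, 2.0f, n);
    } else {
        cudaGraphExec_t exec;
        cudaGraphInstantiate(&exec, graph, 0);  // CUDA 12 signature
        cudaGraphLaunch(exec, stream);          // replay the captured work
        cudaGraphExecDestroy(exec);
        cudaGraphDestroy(graph);
    }

    cudaStreamSynchronize(stream);
    cudaFree(d_x);
    cudaStreamDestroy(stream);
    return 0;
}
```

The payoff of this pattern is that once instantiated, `cudaGraphLaunch` replays the whole captured sequence with a single launch, which is what makes graphs attractive for the many small kernels of a token-generation step.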
Commits on May 8, 2024
- a4c9b90
- ab40e66
- f42312e