Introduction of CUDA Graphs to llama.cpp #6766

Merged (20 commits, May 8, 2024)

Commits on Apr 19, 2024

  1. cec409a

Commits on Apr 22, 2024

  1. Fix issues raised in comments

    agray3 committed Apr 22, 2024 (c8dd0e7)
  2. 800f4fe
  3. c2691d9

Commits on Apr 23, 2024

  1. df4719e

Commits on Apr 24, 2024

  1. added missing CUDA_CHECKs

    agray3 committed Apr 24, 2024 (c3d4ead)
  2. Addressed comments

    agray3 committed Apr 24, 2024 (d403b18)
  3. further addressed comments

    agray3 committed Apr 24, 2024 (4087596)

Commits on Apr 25, 2024

  1. 0640427

Commits on Apr 29, 2024

  1. 9c57861

Commits on Apr 30, 2024

  1. d44e0fb
  2. eb9f15f
  3. Revert "With mechanism to fall back if graph capture fails"

    This reverts commit eb9f15f.
    agray3 committed Apr 30, 2024 (909e4c6)

Commits on May 1, 2024

  1. 5819950

Commits on May 2, 2024

  1. 44af096

Commits on May 7, 2024

  1. 4e1f2a0
  2. - renamed GGML_ALLOW_CUDA_GRAPHS to GGML_CUDA_USE_GRAPHS

    - rename env variable to disable CUDA graphs to GGML_CUDA_DISABLE_GRAPHS
    
    - updated Makefile build to enable CUDA graphs
    
    - removed graph capture failure checking in ggml_cuda_error
      using a global variable to track this is not thread safe, and I am also not satisfied with checking an error by its string;
      if this is necessary to work around some issues with graph capture with e.g. cuBLAS, we can pass the ggml_backend_cuda_context to the error-checking macro and store the result in the context
    
    - fixed several resource leaks
    
    - fixed issue with zero node graphs
    
    - changed fixed size arrays to vectors
    
    - removed the count of number of evaluations before start capturing, and instead changed the capture mode to relaxed
    
    - removed the check for multiple devices so that it is still possible to use a single device, instead checks for split buffers to disable cuda graphs with -sm row
    
    - changed the op for checking batch size to GGML_OP_ADD, should be more reliable than GGML_OP_SOFT_MAX
    
    - code style fixes
    
    - things to look into
      - VRAM usage of the cudaGraphExec_t, if it is significant we may need to make it optional
      - possibility of using cudaStreamBeginCaptureToGraph to keep track of which ggml graph nodes correspond to which cuda graph nodes
    slaren committed May 7, 2024 (e830949)
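
    The commit message above describes switching stream capture to relaxed mode and (per the earlier reverted commit) falling back to eager kernel launches when capture fails. A minimal sketch of that general pattern follows; it is illustrative only, not the PR's actual code, and the `add_one` kernel and `run` helper are hypothetical names:

    ```cuda
    // Sketch: capture launches on a stream into a CUDA graph using relaxed
    // capture mode, instantiate and replay it, and fall back to plain
    // launches if capture is unavailable or fails.
    #include <cuda_runtime.h>

    __global__ void add_one(float *x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] += 1.0f;
    }

    void run(float *d_x, int n, cudaStream_t stream) {
        cudaGraph_t     graph    = nullptr;
        cudaGraphExec_t instance = nullptr;

        // Relaxed mode does not forbid potentially unsafe API calls from
        // other threads during capture, unlike the global/thread-local modes.
        cudaError_t err = cudaStreamBeginCapture(stream, cudaStreamCaptureModeRelaxed);
        if (err == cudaSuccess) {
            add_one<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
            err = cudaStreamEndCapture(stream, &graph);
        }

        if (err == cudaSuccess && graph != nullptr) {
            // Replaying the instantiated graph avoids per-launch CPU overhead.
            cudaGraphInstantiate(&instance, graph, nullptr, nullptr, 0);
            cudaGraphLaunch(instance, stream);
            cudaGraphExecDestroy(instance);
            cudaGraphDestroy(graph);
        } else {
            // Fallback path: launch the kernel eagerly.
            add_one<<<(n + 255) / 256, 256, 0, stream>>>(d_x, n);
        }
        cudaStreamSynchronize(stream);
    }
    ```

    The fallback branch mirrors the commit's intent that graph capture be an optimization, never a correctness requirement: the same work runs either way.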

Commits on May 8, 2024

  1. fix build without cuda graphs

    slaren committed May 8, 2024 (a4c9b90)
  2. remove outdated comment

    slaren committed May 8, 2024 (ab40e66)
  3. f42312e