metal : wrap each operation in debug group #690

jmousseau · 2024-01-10T02:29:47Z

The screenshot below shows a Metal debug capture of gpt-2-backend with the addition of debug groups.
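For readers without the screenshot, the idea of wrapping each operation in a debug group can be sketched as below. This is a minimal illustration, not the code from this PR: `MockEncoder` and `encode_graph` are hypothetical names, and the mock only records events. In real Metal code the corresponding calls are the command encoder's `pushDebugGroup:` and `popDebugGroup` methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Metal command encoder: it only records events.
struct MockEncoder {
    int depth = 0;                    // number of currently open debug groups
    std::vector<std::string> events;  // ordered log of what was encoded

    void push_debug_group(const std::string & label) {
        ++depth;
        events.push_back("push:" + label);
    }

    void pop_debug_group() {
        assert(depth > 0);            // every pop must match an earlier push
        --depth;
        events.push_back("pop");
    }

    void dispatch(const std::string & kernel) {
        events.push_back("dispatch:" + kernel);
    }
};

// Encode every operation inside its own debug group, so that a GPU capture
// can show one labeled group per graph node.
inline void encode_graph(MockEncoder & enc, const std::vector<std::string> & ops) {
    for (const std::string & op : ops) {
        enc.push_debug_group(op);
        enc.dispatch("kernel_" + op);
        enc.pop_debug_group();
    }
}
```

Calling `encode_graph` with two operations records a push, a dispatch and a pop for each, in that order, which is the grouping the debugger would display.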

ggerganov

Hm this is very interesting and potentially useful! If you know more ways to improve the metal code for debugging and profiling purposes (e.g. using instruments) - please share.

P.S. This does not have an effect on performance in Release builds - correct?

jmousseau · 2024-01-10T14:44:44Z

> If you know more ways to improve the metal code for debugging and profiling purposes (e.g. using instruments) - please share.

There are some next steps I'd like to explore:

- Add a ggml_backend_metal_capture_next_compute function that would insert the necessary capture boundaries.
- Consolidate constant kernel arguments into a single struct (see "metal : simplify kernel arguments using a struct", llama.cpp#3229).
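As a rough illustration of the second item, consolidating constant kernel arguments means binding one struct instead of many individual values. The sketch below is an assumption about what that could look like, not the change proposed in llama.cpp#3229: `KernelArgsAdd`, `MockEncoder` and `bind_args` are hypothetical, and the field names only imitate ggml's `ne`/`nb` naming. In real Metal code the binding call would be the compute encoder's `setBytes:length:atIndex:`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical argument block for an element-wise kernel. The shader side
// would need a struct with the same layout.
struct KernelArgsAdd {
    int64_t  ne00, ne01;  // element counts per dimension
    uint64_t nb00, nb01;  // byte strides per dimension
};

// Stand-in for an encoder that binds constant bytes to a buffer slot.
struct MockEncoder {
    int calls = 0;                    // how many bind calls were issued
    std::vector<unsigned char> last;  // copy of the most recently bound bytes

    void set_bytes(const void * data, std::size_t len, int index) {
        assert(data != nullptr);
        (void) index;                 // the slot is ignored by this mock
        ++calls;
        last.resize(len);
        std::memcpy(last.data(), data, len);
    }
};

// One bind call carries all constants, instead of one call per field.
inline void bind_args(MockEncoder & enc, const KernelArgsAdd & args) {
    enc.set_bytes(&args, sizeof(args), 0);
}
```

With four fields, the struct version issues a single bind call where the per-field version would issue four, and adding a field no longer shifts the argument indices.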

> P.S. This does not have an effect on performance in Release builds - correct?

As far as I know, performance should be unaffected in release builds. In my local testing, I didn't see any change in the timings. Would wrapping the capture and debug logic in GGML_METAL_NDEBUG be preferable?

ggerganov · 2024-01-11T16:06:56Z

In llama.cpp we've decided to guard the debug calls with GGML_METAL_NDEBUG:

ggerganov/llama.cpp@2a7c94d

Will sync the changes here soon
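A guard of this kind can be sketched as follows. Only the GGML_METAL_NDEBUG name comes from the thread; the `METAL_DEBUG_PUSH` / `METAL_DEBUG_POP` macros and the `encode_op` function are hypothetical and stand in for the real Metal debug calls, so treat this as an illustration of the pattern rather than the contents of the referenced commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build with -DGGML_METAL_NDEBUG to compile the debug calls out entirely.
#ifndef GGML_METAL_NDEBUG
#define METAL_DEBUG_PUSH(log, label) (log).push_back(std::string("push:") + (label))
#define METAL_DEBUG_POP(log)         (log).push_back("pop")
#else
#define METAL_DEBUG_PUSH(log, label) ((void) 0)
#define METAL_DEBUG_POP(log)         ((void) 0)
#endif

// Encodes one operation. The dispatch is always recorded; the surrounding
// debug group is recorded only when GGML_METAL_NDEBUG is not defined.
inline std::vector<std::string> encode_op(const char * op) {
    assert(op != nullptr);
    std::vector<std::string> log;
    METAL_DEBUG_PUSH(log, op);
    log.push_back(std::string("dispatch:") + op);
    METAL_DEBUG_POP(log);
    return log;
}
```

Because the macros expand to nothing when the flag is defined, the guarded calls cost nothing at run time in such builds.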

ggerganov · 2024-01-16T14:56:29Z

@jmousseau How do you create these Metal debug captures that you've shown in the screenshot? I'm not very familiar with Xcode - been trying to figure it out, but no luck so far. Would appreciate it if you can share some step-by-step instructions

jmousseau · 2024-01-18T02:32:09Z

@ggerganov Here are the steps I use, starting with Xcode project generation.

cmake -DGGML_METAL=ON -DBUILD_SHARED_LIBS=Off -G Xcode ..

open ggml.xcodeproj

Select the gpt-2-backend scheme at the top, to the right of the git info.

Click on the gpt-2-backend scheme again and choose Edit Scheme.... Configure the desired launch arguments and environment variables.

Traditionally, the easiest way to produce a Metal capture is with the "M" button above the debug console as shown below.

However, this will only capture GPU work enqueued after the button is pressed. For traditional graphics programs, this isn't a problem as the next rendered frame will be captured. In our case, the work will be queued (command buffers and encoders created) before you're able to initiate the capture.

Therefore, setting up the capture boundary programmatically is necessary. In main-backend.cpp, you'll want to initiate a capture before ggml_backend_graph_compute is called (requires #694).

if (ggml_backend_is_metal(model.backend)) {
    ggml_backend_metal_capture_next_compute(model.backend);
}

Run the program by pressing the play button. Once the GPU work completes, Xcode will automatically open the Metal debugger.
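The reason a "capture next compute" call solves the timing problem can be sketched with a one-shot flag: the request is made before any GPU work is queued, and the capture is started and stopped around the next compute only. The code below is a mock under stated assumptions, not the implementation from #694: `MockBackend`, `capture_next_compute` and `graph_compute` are hypothetical, and the string events stand in for the real Metal calls (for the capture itself, MTLCaptureManager's start and stop methods).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical backend state: a one-shot capture request plus an event log.
struct MockBackend {
    bool should_capture_next_compute = false;
    std::vector<std::string> events;
};

// Request that the next graph compute be wrapped in a GPU capture.
inline void capture_next_compute(MockBackend & backend) {
    backend.should_capture_next_compute = true;
}

// Compute one graph. The capture boundary surrounds the encoding and the
// wait, so all of the queued GPU work falls inside the capture.
inline void graph_compute(MockBackend & backend) {
    const bool capture = backend.should_capture_next_compute;
    backend.should_capture_next_compute = false;  // one-shot: reset right away

    if (capture) {
        backend.events.push_back("start_capture");
    }
    backend.events.push_back("encode+commit");
    backend.events.push_back("wait_until_completed");
    if (capture) {
        backend.events.push_back("stop_capture");
    }

    assert(!backend.should_capture_next_compute);
}
```

After one request, only the first of two consecutive computes is captured; the second runs without any capture events.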

metal : wrap each operation in debug group

52d957e

ggerganov approved these changes Jan 10, 2024


ggerganov merged commit 2f3b12f into ggerganov:master Jan 10, 2024
4 checks passed

jmousseau deleted the metal-debug-groups branch January 10, 2024 14:49

ggerganov added a commit that referenced this pull request Jan 11, 2024

metal : fix deprecation warning (#690)

979cc23
