Concerns about ggml_graph_compute's threading #324

CCLDArjun · 2023-06-29T17:18:24Z

Hi, I'm new to ggml and I've been looking at ggml_graph_compute, and more specifically at the function it calls, ggml_graph_compute_thread. I think some threads are simultaneously computing the same result.

Here's a scenario I've drawn out from ggml_graph_compute_thread:

There are 4 threads (a, b, c and d) and 4 nodes in the graph.
Each thread's local node_n begins at -1 and shared->n_active is 4.
a, b and c reduce shared->n_active to 1.
d starts computing the first node while a, b and c spin.
d sets shared->node_n = 1 because cgraph->nodes[1]->n_tasks > 1.
Since shared->node_n is updated, a, b and c stop spinning and set their own node_n = 1.
Doesn't this mean that a, b, c and d compute the same thing simultaneously?

Here's the sample program that I'm using:

#include "ggml.h"
#include <stdio.h>
#include <unistd.h>

int main() {
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
    };

    printf("pid: %d\n", getpid());
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);

    ggml_set_param(ctx, x);

    struct ggml_tensor * a  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
    struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b);
    struct ggml_cgraph gf = ggml_build_forward(f);
    gf.n_threads = 1;

    ggml_set_f32(x, 2.0f);
    ggml_set_f32(a, 3.0f);
    ggml_set_f32(b, 4.0f);

    ggml_graph_compute(ctx, &gf);

    printf("f = %f\n", ggml_get_f32_1d(f, 0));

    ggml_free(ctx);
    return 0;
}

I used gdb to count the number of calls to ggml_compute_forward. When n_threads is set to 1, there are 12 calls and when it's set to 4, there are 21:

1 thread

(gdb) b ggml_compute_forward
Breakpoint 1 at 0x10001ec50: file /Users/ccldarjun/Python/ggml/src/ggml.c, line 15370.
(gdb) ignore 1 100000
Will ignore next 100000 crossings of breakpoint 1.
(gdb) r
Starting program: /Users/ccldarjun/Python/ggml/testing/a.out 
[New Thread 0x1603 of process 22165]
^C[New Thread 0x2003 of process 22165]
warning: unhandled dyld version (17)
pid: 22165
ggml_init: GELU, Quick GELU, SILU and EXP tables initialized in 5.782000 ms
ggml_init: g_state initialized in 0.049000 ms
ggml_init: found unused context 0
ggml_init: context initialized
ggml_build_forward_impl: visited 4 new nodes
ggml_graph_compute_thread: 0/4 pthread id: 1143273024 n_tasks: 1 temp: 1
ggml_graph_compute_thread: 1/4 pthread id: 1143273024 n_tasks: 1 temp: 1
ggml_graph_compute_thread: 2/4 pthread id: 1143273024 n_tasks: 1 temp: 1
ggml_graph_compute_thread: 3/4 pthread id: 1143273024 n_tasks: 1 temp: 1
ggml_graph_compute: perf (1) - cpu = 0.000 / 0.000 ms, wall = 0.000 / 0.000 ms
f = 16.000000
[Inferior 1 (process 22165) exited normally]
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000010001ec50 in ggml_compute_forward at /Users/ccldarjun/Python/ggml/src/ggml.c:15370
	breakpoint already hit 12 times
	ignore next 99988 hits

4 threads

(gdb) b ggml_compute_forward
Breakpoint 1 at 0x10001ec50: file /Users/ccldarjun/Python/ggml/src/ggml.c, line 15370.
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000010001ec50 in ggml_compute_forward at /Users/ccldarjun/Python/ggml/src/ggml.c:15370
(gdb) ignore 
Argument required (a breakpoint number).
(gdb) ignore 1 100000
Will ignore next 100000 crossings of breakpoint 1.
(gdb) r
Starting program: /Users/ccldarjun/Python/ggml/testing/a.out 
[New Thread 0x1203 of process 22148]
[New Thread 0x2003 of process 22148]
warning: unhandled dyld version (17)
pid: 22148
ggml_init: GELU, Quick GELU, SILU and EXP tables initialized in 3.603000 ms
ggml_init: g_state initialized in 0.031000 ms
ggml_init: found unused context 0
ggml_init: context initialized
ggml_build_forward_impl: visited 4 new nodes
[New Thread 0x1307 of process 22148]
[New Thread 0x2103 of process 22148]
[New Thread 0x2903 of process 22148]
ggml_graph_compute_thread: 0/4 pthread id: 63651840 n_tasks: 1 temp: 1
ggml_graph_compute_thread: 1/4 pthread id: 63651840 n_tasks: 4 temp: 1
ggml_graph_compute_thread: 2/4 pthread id: 1143273024 n_tasks: 4 temp: 1
ggml_graph_compute_thread: 3/4 pthread id: 63651840 n_tasks: 4 temp: 1
ggml_graph_compute: perf (1) - cpu = 0.000 / 0.000 ms, wall = 0.000 / 0.000 ms
f = 16.000000
[Inferior 1 (process 22148) exited normally]
(gdb) info breakpoints
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000010001ec50 in ggml_compute_forward at /Users/ccldarjun/Python/ggml/src/ggml.c:15370
	breakpoint already hit 21 times
	ignore next 99979 hits

I think the scenario above and the extra calls with 4 threads are connected.


slaren · 2023-06-29T17:50:44Z

There is one call to ggml_compute_forward for GGML_TASK_INIT, another for GGML_TASK_FINALIZE, and for parallelizable tasks, as many GGML_TASK_COMPUTE as there are threads. You have 3 ops here, all parallelizable, so with n_threads=1 you should see 9 calls to ggml_compute_forward (3 init, 3 finalize, 3 compute). With n_threads=4, you should see 18 calls (3 init, 3 finalize, 4*3 compute).

That's what I see in my tests. Are you seeing something different?

CCLDArjun · 2023-06-30T08:49:52Z

Ohh ok, I see: computing a single op is itself parallelizable.
I thought it was supposed to compute multiple nodes at once, so I was confused.

> With n_threads=4, you should see 18 calls (3 init, 3 finalize, 4*3 compute).
>
> That's what I see in my tests. Are you seeing something different?

I'm getting 21 calls for 4 threads and 12 calls for 1 thread (there's a nop node at the start of the graph, which accounts for 3 of the calls).

CCLDArjun closed this as completed Jun 30, 2023
