apple silicon GPU thread count correct? #257

jac-cbi · 2023-06-14T13:56:18Z

I've read with extreme interest the efforts to use the Apple Silicon GPUs with ggml. However, I think there might be some misunderstanding. I hope it's me, but if it's not, there's a HUGE performance gain possible.

Several of the ggml benchmarks floating around seem to indicate 1 thread per GPU core, which is nice and all, but we can do orders of magnitude better:

Each GPU core is split into 16 execution units, which each contain eight arithmetic logic units (ALUs).

Wikipedia: Apple M2 GPU

Which, if I understand correctly, means we should be able to launch 128 threads per GPU core (16 execution units × 8 ALUs). And with Apple Silicon's unified memory, they'll all have direct access to all of system RAM.
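For anyone who wants to check the arithmetic, here is a minimal sketch. It uses only the two figures from the Wikipedia quote above. The `gpu_cores` argument and the function names are hypothetical, chosen for illustration, and the result counts ALU lanes, which is not necessarily the same thing as the number of threads the hardware keeps resident.

```python
# Figures taken from the Wikipedia quote above (Apple M2 GPU).
EXECUTION_UNITS_PER_CORE = 16
ALUS_PER_EXECUTION_UNIT = 8


def alus_per_core() -> int:
    """ALU lanes in one GPU core: execution units times ALUs per unit."""
    return EXECUTION_UNITS_PER_CORE * ALUS_PER_EXECUTION_UNIT


def total_alus(gpu_cores: int) -> int:
    """ALU lanes across the whole GPU.

    gpu_cores is a hypothetical input; the real core count depends on the chip.
    """
    if gpu_cores < 0:
        raise ValueError("gpu_cores must be non-negative")
    return gpu_cores * alus_per_core()


print(alus_per_core())  # 128
print(total_alus(10))   # 1280, for an assumed 10-core GPU
```

So "128 per core" is just 16 × 8; whether one thread per ALU lane is the right launch size is exactly the open question here.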

Am I missing something here? I'm really conflicted... I want to be right, but I highly doubt this was missed.

Cc: @ggerganov

jac-cbi · 2023-06-14T14:34:42Z

This is the slide for the M1 Ultra. It seems marketing refers to the ALU count as the "execution units". I don't yet grok how that gets to "196,608 concurrent threads"... Well, other than multiplying by 24 :-P
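The factor of 24 can be reproduced with a few lines. This is only a sketch: the slide itself is not reproduced in the thread, so the core count (64) and execution-unit count (8192) below are assumed values for the M1 Ultra, and only the 196,608 figure comes from the comment above. The variable names are made up for illustration.

```python
# Assumed M1 Ultra figures; only the thread count is quoted in this thread.
M1_ULTRA_GPU_CORES = 64                # assumption
M1_ULTRA_EXECUTION_UNITS = 8192        # assumption; marketing's "execution units" (ALUs)
M1_ULTRA_CONCURRENT_THREADS = 196_608  # quoted above

# ALUs per core under these assumptions: matches the 16 x 8 = 128 estimate.
alus_per_core = M1_ULTRA_EXECUTION_UNITS // M1_ULTRA_GPU_CORES

# Concurrent threads per ALU: the mysterious multiplier.
threads_per_alu = M1_ULTRA_CONCURRENT_THREADS // M1_ULTRA_EXECUTION_UNITS

# Concurrent threads per GPU core.
threads_per_core = M1_ULTRA_CONCURRENT_THREADS // M1_ULTRA_GPU_CORES

print(alus_per_core)     # 128
print(threads_per_alu)   # 24
print(threads_per_core)  # 3072
```

If those assumed figures are right, the slide implies about 3,072 concurrent threads per core rather than 128. One possible reading is that each ALU lane can have many threads in flight at once, but nothing in this thread confirms that.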

SaraiQX · 2023-07-13T02:35:55Z

Wow. As an Apple lover without a strong CS background, I hope master Georgi can work on more projects that let Apple users tap into the M2 Ultra (maybe with 192 GB) to run real LLMs (over 100B parameters). Is that possible in the near future? Thanks to any masterminds!
