Will M2 ultra be supported to use LLM of 100B paras? #381

Open
SaraiQX opened this issue Jul 13, 2023 · 8 comments


@SaraiQX

SaraiQX commented Jul 13, 2023

As titled. As an Apple lover without a strong CS background, I hope master Georgi could work on more projects that let Apple users put the M2 Ultra (maybe the 192GB version) to work and tap into the potential of real LLMs (over 100B parameters). Is this possible in the near future? Thanks to any masterminds!
Best,
Sarai

@ggerganov
Owner

My 192GB M2 Ultra should arrive in 1-2 weeks.
Let me know what 100B models are interesting

@SaraiQX
Author

SaraiQX commented Jul 15, 2023

@ggerganov Thrilled to have your kind response! Many thanks in advance 😄~ My first choice for a 100B+ model is TigerBot-180B, which is based on the BLOOM architecture, multilingual, and free for commercial use. I've got some ideas for a family-centered LLM application and really look forward to ggml's update~~
https://github.com/TigerResearch/TigerBot
TigerBot-180B (research version)

Best,
Sarai
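For a sense of scale, here is a rough back-of-envelope (my own sketch, not from this thread; the bits-per-weight figures are approximate averages for ggml-style quantization formats, which carry some extra block/scale overhead on top of the raw weights):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (ignores KV cache and activations)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 180B model: F16 is far beyond 192GB, but ~4-5 bit quantization fits
for name, bits in [("F16", 16), ("Q8_0", 8.5), ("Q5_K", 5.5), ("Q4_0", 4.5)]:
    print(f"{name:5s} 180B -> ~{model_size_gb(180e9, bits):.0f} GB")
```

At roughly 4.5 bits per weight, 180B lands around 100GB of weights, which is why a 192GB M2 Ultra is interesting for this class of model.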

@LoganDark
Contributor

My 192GB M2 Ultra should arrive in 1-2 weeks.

that's insane, where do you get such cash?

@JohnnyOpcode

that's insane, where do you get such cash?

GGML has been funded so it makes sense that a wider array (tensor) of hardware will become available. I've been pondering how far the M2 Ultra can be pushed. AMD has some interesting GPU(s) coming out with large amounts of HBM3 memory too.

https://www.techpowerup.com/gpu-specs/radeon-instinct-mi300.c4019

Interesting times ahead..

@LoganDark
Contributor

LoganDark commented Aug 3, 2023

https://www.techpowerup.com/gpu-specs/radeon-instinct-mi300.c4019

This is amazing. Despite it having already been released, it's hard to find anything about it, even the MSRP or where to buy one. However, assuming you want to be in the market for one:

  • It reaches the 3090's boost clock for breakfast
  • It has 14080 "pipelines" vs the 3090's 10496 CUDA cores
  • It has approximately 5.8x as many transistors as the 3090
  • Die process is 5 nm vs the 3090's 8 nm
  • 600 watt power draw, as opposed to the 3090's 350W
  • Physically smaller (267mm double-slot as opposed to 313mm triple-slot)
  • 128GB RAM as opposed to the 3090's 24GB
  • 8192-bit (!!) memory bus as opposed to the 3090's 384-bit
  • Memory clock at 3.2GHz (HBM3) rather than the 3090's 19.5GHz effective (nvidia coping...)
  • 3277GB/s data transfer vs the 3090's 936.2GB/s

It seems to be almost on par with the 4090, but the 4090 makes annoying compromises that aren't present here. For example, the 4090 has much less memory capacity, a far narrower bus width, a far lower memory frequency, and a far lower transfer rate as a result.

I don't quite know how this card flew under the radar, but it looks like an extremely good piece of home-inference hardware that may even be capable of training.

Now all that's left is... the cost... good luck finding one. If you do, fat chance it'll be the 128GB version, and fat chance it won't cost you thousands of dollars. Sigh.
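The last two bullet points are consistent with each other: peak bandwidth is just bus width times per-pin data rate. A quick sanity check (my own sketch; the "memory clock" figures above are per-pin effective data rates in Gb/s):

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes moved per cycle * per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps

# MI300: 8192-bit HBM3 bus at 3.2 Gb/s per pin
print(bandwidth_gb_s(8192, 3.2))   # → 3276.8
# RTX 3090: 384-bit GDDR6X bus at 19.5 Gb/s per pin
print(bandwidth_gb_s(384, 19.5))   # → 936.0
```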

@SaraiQX
Author

SaraiQX commented Aug 7, 2023

@LoganDark Thank you so much for your professional advice; I need to do more homework to fully understand it. Yes, I've heard about the new AMD card. Lately I noticed that Llama 2 70B can run on a MacBook with an M2 Max (12-core CPU, 64GB RAM), whose price sounds somewhat reasonable (in China it's lower than a dual-3090 build)...
I guess 128GB of RAM could be more cost-effective if 70B open-source models become mainstream with admirable performance.
Given text-to-image/video needs as well, maybe the M2 Mac Studio has advantages beyond text inference. 😄
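One way to reason about this trade-off (my own rule of thumb, not from this thread): single-stream LLM decoding is usually memory-bandwidth-bound, since every generated token streams all the weights once, so bandwidth divided by quantized model size gives a rough upper bound on tokens per second. The model size and bandwidth figures below are assumed for illustration:

```python
def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed for memory-bound inference:
    each token requires reading all weights once from memory."""
    return bandwidth_gb_s / model_size_gb

# Assumed numbers: 70B at ~4.5 bits/weight ≈ 39.4 GB of weights,
# M2 Max unified memory bandwidth ≈ 400 GB/s
print(max_tokens_per_sec(39.4, 400.0))   # roughly a 10 tokens/s ceiling
```

Real throughput lands below this bound once compute, KV-cache reads, and software overhead are counted, but it explains why memory bandwidth matters as much as capacity for this workload.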

@bobqianic
Contributor

Now all that's left is... the cost... good luck finding one.

lol. This MI300 card costs more than 20K USD.

@LoganDark
Contributor

This MI300 card costs more than 20K USD.

Basically called it lol. Guess it's useless then
