Will M2 ultra be supported to use LLM of 100B paras? #381

Open
SaraiQX opened this issue Jul 13, 2023 · 8 comments


@SaraiQX

SaraiQX commented Jul 13, 2023

As titled. As an Apple lover without a strong CS background, I hope master Georgi could work on more projects that let Apple users put the M2 Ultra (maybe the 192GB version) to work and tap into the potential of real LLMs (over 100B parameters). Is this possible in the near future? Thanks to any masterminds!
Best,
Sarai

@ggerganov
Owner

My 192GB M2 Ultra should arrive in 1-2 weeks.
Let me know what 100B models are interesting

@SaraiQX
Author

SaraiQX commented Jul 15, 2023

@ggerganov Thrilled to have your kind response! Many thanks in advance 😄~ My first choice for a 100B+ model is TigerBot-180B, which is based on the BLOOM architecture, multilingual, and free for commercial use. I've got some ideas for a family-centered LLM application and really look forward to ggml's update~~
https://github.com/TigerResearch/TigerBot
TigerBot-180B (research version)

Best,
Sarai
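For a sense of scale, here is a rough back-of-envelope (my own sketch, not from this thread; the bits-per-weight figures are approximate averages for ggml-style quantization formats, which carry some extra block/scale overhead on top of the raw weights):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (ignores KV cache and activations)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 180B model: F16 is far beyond 192GB, but ~4-5 bit quantization fits
for name, bits in [("F16", 16), ("Q8_0", 8.5), ("Q5_K", 5.5), ("Q4_0", 4.5)]:
    print(f"{name:5s} 180B -> ~{model_size_gb(180e9, bits):.0f} GB")
```

At roughly 4.5 bits per weight, 180B lands around 100GB of weights, which is why a 192GB M2 Ultra is interesting for this class of model.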

@LoganDark
Contributor

My 192GB M2 Ultra should arrive in 1-2 weeks.

that's insane, where do you get such cash?

@JohnnyOpcode

that's insane, where do you get such cash?

GGML has been funded so it makes sense that a wider array (tensor) of hardware will become available. I've been pondering how far the M2 Ultra can be pushed. AMD has some interesting GPU(s) coming out with large amounts of HBM3 memory too.

https://www.techpowerup.com/gpu-specs/radeon-instinct-mi300.c4019

Interesting times ahead..

@LoganDark
Contributor

LoganDark commented Aug 3, 2023

https://www.techpowerup.com/gpu-specs/radeon-instinct-mi300.c4019

This is amazing. Despite it having already been released, it's hard to find anything about it, even the MSRP or where to buy one. However, assuming you want to be in the market for one:

  • It reaches the 3090's boost clock for breakfast
  • It has 14080 "pipelines" vs the 3090's 10496 CUDA cores
  • It has approximately 5.8x as many transistors as the 3090
  • Die process is 5 nm vs the 3090's 8 nm
  • 600 watt power draw, as opposed to the 3090's 350W
  • Physically smaller (267mm double-slot as opposed to 313mm triple-slot)
  • 128GB RAM as opposed to the 3090's 24GB
  • 8192-bit (!!) memory bus as opposed to the 3090's 384-bit
  • Memory clock at 3.2GHz (HBM3) rather than the 3090's 19.5GHz effective (nvidia coping...)
  • 3277GB/s data transfer vs the 3090's 936.2GB/s

It seems to be almost on par with the 4090, but the 4090 makes annoying compromises that aren't present here. For example, the 4090 has much less memory capacity, a far narrower bus width, a far lower memory frequency, and a far lower transfer rate as a result.

I don't quite know how this card flew under the radar, but it looks like an extremely good piece of home-inference hardware that may even be capable of training.

Now all that's left is... the cost... good luck finding one. If you do, fat chance it'll be the 128GB version, and fat chance it won't cost you thousands of dollars. Sigh.
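The last two bullet points are consistent with each other: peak bandwidth is just bus width times per-pin data rate. A quick sanity check (my own sketch; the "memory clock" figures above are per-pin effective data rates in Gb/s):

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes moved per cycle * per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps

# MI300: 8192-bit HBM3 bus at 3.2 Gb/s per pin
print(bandwidth_gb_s(8192, 3.2))   # → 3276.8
# RTX 3090: 384-bit GDDR6X bus at 19.5 Gb/s per pin
print(bandwidth_gb_s(384, 19.5))   # → 936.0
```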

@SaraiQX
Author

SaraiQX commented Aug 7, 2023

@LoganDark Thank you so much for your professional advice; I need to do more homework to fully understand it. Yes, I've heard about the new AMD card. Lately I noticed that Llama 2 70B can run on a MacBook with an M2 Max (12-core CPU, 64GB RAM), whose price sounds somewhat reasonable (in China it's lower than a dual-3090 build)...
I guess 128GB of RAM could be more cost-effective if 70B open-source models become mainstream with admirable performance.
Given text-to-image/video needs as well, maybe the M2 Mac Studio has advantages beyond text inference. 😄
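One way to reason about this trade-off (my own rule of thumb, not from this thread): single-stream LLM decoding is usually memory-bandwidth-bound, since every generated token streams all the weights once, so bandwidth divided by quantized model size gives a rough upper bound on tokens per second. The model size and bandwidth figures below are assumed for illustration:

```python
def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed for memory-bound inference:
    each token requires reading all weights once from memory."""
    return bandwidth_gb_s / model_size_gb

# Assumed numbers: 70B at ~4.5 bits/weight ≈ 39.4 GB of weights,
# M2 Max unified memory bandwidth ≈ 400 GB/s
print(max_tokens_per_sec(39.4, 400.0))   # roughly a 10 tokens/s ceiling
```

Real throughput lands below this bound once compute, KV-cache reads, and software overhead are counted, but it explains why memory bandwidth matters as much as capacity for this workload.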

@bobqianic
Contributor

Now all that's left is... the cost... good luck finding one.

lol. This MI300 card costs more than 20K USD.

@LoganDark
Contributor

This MI300 card costs more than 20K USD.

Basically called it lol. Guess it's useless then
