
Problems with integrating GPT-J and llama.cpp in one binary and supporting both with current repo / forks of ggml #86

Open
manyoso opened this issue Apr 14, 2023 · 4 comments

Comments

@manyoso
Contributor

manyoso commented Apr 14, 2023

Hello.

We've recently released gpt4all-j, which comes with a new Qt-based GUI and three installers, one for each major OS platform. You can see it here: https://gpt4all.io/index.html

This new GUI uses this repo as a submodule and builds installers for all three OSes. However, because of the way this library is built, it is going to be very hard to support the different hardware lines we'd like to target on each OS with one universal installer.

Furthermore, the fork of ggml into the llama.cpp repo makes it very hard to support both GPT-J and llama.cpp based models in the same GUI. I would like to fix this without having to fork ggml and/or llama.cpp.

You can find the new GUI and code here: https://github.com/nomic-ai/gpt4all-chat

So here are the problems in a nutshell:

  1. ggml/src/CMakeLists.txt is the CMake file that detects the system architecture and which intrinsics are supported, for all of Windows, Linux, and macOS. The question is how to make a single installer per major operating system that supports all of these arch/intrinsics combinations. The only solution I can come up with is to make a standalone shared lib, compile a different version for each OS/architecture, and have the installer be smart about which one it installs by detecting the architecture of the system it is installed on. This will require big changes to the CMakeLists.txt.

  2. Resolving the fork of ggml and llama.cpp so that I can use one repo as a submodule and support both GPT-J and LLaMA based models in one GUI interface.

@ggerganov what do you think about these problems and the best way forward for solving them? I really love your work, and so does everyone at nomic.ai; we really don't want to fork anything. Please let us know your thoughts!

@ggerganov
Owner

Hi @manyoso and congrats on the new release!

  1. You need runtime detection of CPU capabilities and dynamic selection of which SIMD intrinsics to use (see the sketch after this list). This requires significant changes to ggml.c. However, it is of very low priority for me, since shipping pre-compiled binaries is of little interest to me. The alternative is, as you suggested, to have multiple libs and choose the appropriate one during install.

  2. Please clarify. llama.cpp is hardly a fork because I regularly synchronize the ggml code between this repo and llama.cpp. For example, at the moment, they both have the latest version of ggml.h and ggml.c
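To give an idea of what the runtime dispatch mentioned in point 1 could look like, here is a minimal, hypothetical C sketch using GCC/Clang's __builtin_cpu_supports. The vec_dot_* functions are placeholders, not real ggml symbols, and in a real build the AVX variants would be compiled from intrinsics in separate translation units:

/* Hypothetical sketch of runtime SIMD dispatch (not actual ggml code). */
#include <stdio.h>

typedef float (*vec_dot_t)(int n, const float *x, const float *y);

/* Portable fallback. */
static float vec_dot_scalar(int n, const float *x, const float *y) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += x[i]*y[i];
    return sum;
}

/* Stand-ins for kernels that would really be built with -mavx / -mavx2. */
static float vec_dot_avx (int n, const float *x, const float *y) { return vec_dot_scalar(n, x, y); }
static float vec_dot_avx2(int n, const float *x, const float *y) { return vec_dot_scalar(n, x, y); }

/* Pick the best available kernel once, at startup. */
static vec_dot_t select_vec_dot(void) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) return vec_dot_avx2;
    if (__builtin_cpu_supports("avx"))  return vec_dot_avx;
#endif
    return vec_dot_scalar;
}

int main(void) {
    const float x[4] = {1, 2, 3, 4};
    const float y[4] = {1, 1, 1, 1};
    vec_dot_t vec_dot = select_vec_dot();
    printf("dot = %f\n", vec_dot(4, x, y));
    return 0;
}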

Hope this helps

@manyoso
Contributor Author

manyoso commented Apr 15, 2023

Yes, I see the synchronization, but some of the code I need for GPT-J is in your ggml repo and not in your llama.cpp repo, and I want both GPT-J and LLaMA support. So I can either fork ggml/examples/utils.* and use the llama.cpp repo, or continue using the ggml repo and fork llama.cpp to add LLaMA support.

As for runtime detection, I could add this, or I could add multiple-lib support. Both would require modifications to the CMake build, and one would require modifications to the source. I understand this is not a priority for you, but I am willing to do the work if you'll accept the changes in your repo. I really don't want to fork anything of yours, because you're doing great work, and we really appreciate it and want to take advantage of any advances you make.

Please advise on the best course of action.

@ggerganov
Owner

> Yes, I see the synchronization, but some of the code I need for GPT-J is in your ggml repo and not in your llama.cpp repo, and I want both GPT-J and LLaMA support. So I can either fork ggml/examples/utils.* and use the llama.cpp repo, or continue using the ggml repo and fork llama.cpp to add LLaMA support.

We have to add a llama example to the ggml repo that demonstrates the bare-minimum usage of the LLaMA model.
Same thing as the whisper example. The llama example will be synchronized regularly with llama.cpp just by copying the llama.h and llama.cpp files, just as I am currently doing with whisper. This way, the ggml repo will always be compatible with GPT-2, GPT-J, LLaMA and Whisper and provide all necessary utils and build instructions for these models.

In general, I would recommend that all forks add a bare-minimum example to the ggml repo - this way I can help keep them synchronized with the latest ggml and provide better support in the future. (cc @saharNooby (rwkv.cpp), @NouamaneTazi (bloomz.cpp))

> As for runtime detection, I could add this, or I could add multiple-lib support. Both would require modifications to the CMake build, and one would require modifications to the source. I understand this is not a priority for you, but I am willing to do the work if you'll accept the changes in your repo. I really don't want to fork anything of yours, because you're doing great work, and we really appreciate it and want to take advantage of any advances you make.

Thinking about this further, I think building multiple libs is the way to go. It's actually not that difficult:

# no SIMD
gcc -c ggml.c -o ggml.o

# AVX
gcc -c -mf16c -mavx ggml.c -o ggml-avx.o

# AVX2
gcc -c -mf16c -mavx -mavx2 ggml.c -o ggml-avx2.o

# AVX2 + OpenBLAS
gcc -c -mf16c -mavx -mavx2 ggml.c -DGGML_USE_OPENBLAS -o ggml-avx2-blas.o

# Apple Silicon
clang -c -DGGML_USE_ACCELERATE ggml.c -o ggml-arm.o

# etc ..

And now you just link each object file into its respective dynamic library.
Upon install, all you need is a simple tool / script that determines the CPU architecture and available features and, based on that, tells you which dynamic library to install.
Unless I am missing something, this is not difficult to implement and deploy.
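As a rough sketch of such an install-time detection helper (assuming a GCC/Clang toolchain; the library file names below are placeholders for whatever variants the installer actually ships), it could be as small as:

/* Hypothetical install-time helper: print which pre-built ggml library
 * variant matches the CPU of this machine. */
#include <stdio.h>

int main(void) {
#if defined(__APPLE__) && defined(__aarch64__)
    puts("libggml-arm.dylib");          /* Apple Silicon / Accelerate build */
#elif defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        puts("libggml-avx2.so");
    } else if (__builtin_cpu_supports("avx")) {
        puts("libggml-avx.so");
    } else {
        puts("libggml.so");             /* no SIMD */
    }
#else
    puts("libggml.so");
#endif
    return 0;
}

The installer would run this helper on the target machine and install only the reported library variant.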

@manyoso
Contributor Author

manyoso commented Apr 17, 2023

Great! This looks like exactly the solution I was hoping for.
