
Problems with integrating GPT-J and llama.cpp in one binary and supporting both with current repo / forks of ggml #86

Open
manyoso opened this issue Apr 14, 2023 · 4 comments

Comments

@manyoso
Contributor

manyoso commented Apr 14, 2023

Hello.

We've recently released gpt4all-j, which comes with a new Qt-based GUI and three installers, one for each major OS platform. You can see it here: https://gpt4all.io/index.html

This new GUI uses this repo as a submodule and builds installers for all three OSes. However, because of the way this library is built, it is going to be very hard to support the different hardware lines we'd like to target on each OS with one universal installer.

Furthermore, the fork of ggml into the llama.cpp repo makes it very hard to support both GPT-J and llama.cpp based models in the same GUI. I would like to fix this without having to fork ggml and/or llama.cpp.

You can find the new GUI and code here: https://github.com/nomic-ai/gpt4all-chat

So here are the problems in a nutshell:

  1. ggml/src/CMakeLists.txt is the CMake file that detects the system architecture and which intrinsics are supported, for all of Windows, Linux, and macOS. The question is how to make a single installer per major operating system that supports all of these arch/intrinsics combinations. The only solution I can come up with is to make a standalone shared lib, compile a different version for each OS/architecture, and have the installer be smart about which one it installs by detecting the architecture of the system it is installed on. This will require big changes to the CMakeLists.txt.

  2. Resolving the fork of ggml and llama.cpp so that I can use one repo as a submodule and support both GPT-J and LLaMA based models in one GUI interface.

@ggerganov what do you think about these problems and the best way forward for solving them? I really love your work, and so does everyone at nomic.ai; we really don't want to fork anything. Please let us know your thoughts!

@ggerganov
Owner

Hi @manyoso and congrats on the new release!

  1. You need runtime detection of CPU capabilities and dynamic selection of which SIMD intrinsics to use (see the sketch after this list). This requires significant changes to ggml.c. However, it is of very low priority for me, since shipping pre-compiled binaries is of little interest to me. The alternative is, as you suggested, to have multiple libs and choose the appropriate one during install.

  2. Please clarify. llama.cpp is hardly a fork because I regularly synchronize the ggml code between this repo and llama.cpp. For example, at the moment, they both have the latest version of ggml.h and ggml.c
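To give an idea of what the runtime dispatch mentioned in point 1 could look like, here is a minimal, hypothetical C sketch using GCC/Clang's __builtin_cpu_supports. The vec_dot_* functions are placeholders, not real ggml symbols, and in a real build the AVX variants would be compiled from intrinsics in separate translation units:

/* Hypothetical sketch of runtime SIMD dispatch (not actual ggml code). */
#include <stdio.h>

typedef float (*vec_dot_t)(int n, const float *x, const float *y);

/* Portable fallback. */
static float vec_dot_scalar(int n, const float *x, const float *y) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++) sum += x[i]*y[i];
    return sum;
}

/* Stand-ins for kernels that would really be built with -mavx / -mavx2. */
static float vec_dot_avx (int n, const float *x, const float *y) { return vec_dot_scalar(n, x, y); }
static float vec_dot_avx2(int n, const float *x, const float *y) { return vec_dot_scalar(n, x, y); }

/* Pick the best available kernel once, at startup. */
static vec_dot_t select_vec_dot(void) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) return vec_dot_avx2;
    if (__builtin_cpu_supports("avx"))  return vec_dot_avx;
#endif
    return vec_dot_scalar;
}

int main(void) {
    const float x[4] = {1, 2, 3, 4};
    const float y[4] = {1, 1, 1, 1};
    vec_dot_t vec_dot = select_vec_dot();
    printf("dot = %f\n", vec_dot(4, x, y));
    return 0;
}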

Hope this helps

@manyoso
Contributor Author

manyoso commented Apr 15, 2023

Yes, I see the synchronization, but some of the code I need for GPT-J is in your ggml repo and not in your llama.cpp repo, and I want both GPT-J and LLaMA support. So I can either fork ggml/examples/utils.* and use the llama.cpp repo, or continue using the ggml repo and fork llama.cpp to add LLaMA support.

As for runtime detection, I could add this, or I could add multiple-lib support. Both would require modifications to the CMake build, and one would require modifications to the source. I understand this is not a priority for you, but I am willing to do the work if you'll accept the changes in your repo. I really don't want to fork anything of yours, because you're doing great work, and we really appreciate it and want to take advantage of any advances you make.

Please advise on the best course of action.

@ggerganov
Owner

> Yes, I see the synchronization, but some of the code I need for GPT-J is in your ggml repo and not in your llama.cpp repo, and I want both GPT-J and LLaMA support. So I can either fork ggml/examples/utils.* and use the llama.cpp repo, or continue using the ggml repo and fork llama.cpp to add LLaMA support.

We have to add a llama example to the ggml repo that demonstrates the bare-minimum usage of the LLaMA model.
Same thing as the whisper example. The llama example will be synchronized regularly with llama.cpp just by copying the llama.h and llama.cpp files, just as I am currently doing with whisper. This way, the ggml repo will always be compatible with GPT-2, GPT-J, LLaMA and Whisper and provide all necessary utils and build instructions for these models.

In general, I would recommend that all forks add a bare-minimum example to the ggml repo - this way I can help keep them synchronized with the latest ggml and provide better support in the future. (cc @saharNooby (rwkv.cpp), @NouamaneTazi (bloomz.cpp))

> As for runtime detection, I could add this, or I could add multiple-lib support. Both would require modifications to the CMake build, and one would require modifications to the source. I understand this is not a priority for you, but I am willing to do the work if you'll accept the changes in your repo. I really don't want to fork anything of yours, because you're doing great work, and we really appreciate it and want to take advantage of any advances you make.

Thinking about this further, I think building multiple libs is the way to go. It's actually not that difficult:

# no SIMD
gcc -c ggml.c -o ggml.o

# AVX
gcc -c -mf16c -mavx ggml.c -o ggml-avx.o

# AVX2
gcc -c -mf16c -mavx -mavx2 ggml.c -o ggml-avx2.o

# AVX2 + OpenBLAS
gcc -c -mf16c -mavx -mavx2 ggml.c -DGGML_USE_OPENBLAS -o ggml-avx2-blas.o

# Apple Silicon
clang -c -DGGML_USE_ACCELERATE ggml.c -o ggml-arm.o

# etc ..

And now you just link each object file into its respective dynamic library.
Upon install, all you need is a simple tool / script that determines the CPU architecture and available features and, based on that, tells you which dynamic library to install.
Unless I am missing something, this is not difficult to implement and deploy.
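As a rough sketch of such an install-time detection helper (assuming a GCC/Clang toolchain; the library file names below are placeholders for whatever variants the installer actually ships), it could be as small as:

/* Hypothetical install-time helper: print which pre-built ggml library
 * variant matches the CPU of this machine. */
#include <stdio.h>

int main(void) {
#if defined(__APPLE__) && defined(__aarch64__)
    puts("libggml-arm.dylib");          /* Apple Silicon / Accelerate build */
#elif defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        puts("libggml-avx2.so");
    } else if (__builtin_cpu_supports("avx")) {
        puts("libggml-avx.so");
    } else {
        puts("libggml.so");             /* no SIMD */
    }
#else
    puts("libggml.so");
#endif
    return 0;
}

The installer would run this helper on the target machine and install only the reported library variant.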

@manyoso
Contributor Author

manyoso commented Apr 17, 2023

Great! This looks like exactly the solution I was hoping for.
