# Build with AMD AOCC + AOCL (CPU only) #5005
Hi, on my Debian 11 system (AMD EPYC 75F3 32-core processor, 64 GB RAM) I've just installed AMD AOCC and AOCL:

https://www.amd.com/en/developer/aocc.html
https://www.amd.com/en/developer/aocl.html

How can I build llama.cpp with them, and which arguments give the best optimization (BLAS, LAPACK, ...)?

Any suggestions are super appreciated!

## Comments
I haven't tried AOCC, but AOCL works fine and produces a small speedup versus no BLAS library, though I haven't compared it to other BLAS libraries. You can compile like this, more or less (based on my bash history from November):

```bash
# Assuming you downloaded AOCL from https://www.amd.com/en/developer/aocl.html and put it in ~
cd
tar xf aocl-linux-gcc-4.1.0.tar.gz
cd aocl-linux-gcc-4.1.0
./install.sh
cd
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
source ~/aocl/4.1.0/gcc/amd-libs.cfg
cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=AOCL -DBLAS_INCLUDE_DIRS=~/aocl/4.1.0/gcc/include
make -j
```

This is assuming Linux.
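A quick way to confirm the BLAS backend was actually picked up (a sketch; the model path is a placeholder and the binary location depends on your build):

```bash
# From the build directory: llama.cpp prints a system_info line at startup;
# "BLAS = 1" indicates a BLAS backend (here AOCL) was linked in.
./bin/main -m /path/to/model.gguf -p "hello" -n 8 2>&1 | grep "BLAS"
```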
---

Thank you, Nigel! Now I'll try to complete the commands with AOCC's clang++ and Zen 3 specific optimizations; if everything works, I'll share the results here.
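For reference, a minimal sketch of what such a build might look like, assuming AOCC was installed with its setenv_AOCC.sh script and that AOCL was built for AOCC under ~/aocl/4.1.0/aocc (the paths, versions, and AOCC-flavoured AOCL directory are assumptions, not tested):

```bash
# Hypothetical AOCC (clang/clang++) build targeting Zen 3 (EPYC 75F3)
source ~/aocc-compiler-4.1.0/setenv_AOCC.sh   # sets up PATH/LD_LIBRARY_PATH for AOCC
source ~/aocl/4.1.0/aocc/amd-libs.cfg         # AOCL libraries built with AOCC
cd ~/llama.cpp && mkdir -p build && cd build
cmake .. -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
         -DCMAKE_C_FLAGS="-march=znver3" -DCMAKE_CXX_FLAGS="-march=znver3" \
         -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=AOCL \
         -DBLAS_INCLUDE_DIRS=~/aocl/4.1.0/aocc/include
make -j
```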
---

Exactly. If I set ENV cmake_cxx_flags="-march=znver2" in the Dockerfile, the logs show that make later appends -march=native, which overrides my -march=znver2 directive. The end result is a SIGILL on the cloud Linux host. How do you cross-compile? How do you adapt the Dockerfile so that gcc compiles for the target AMD EPYC processor rather than for my Intel i7 CPU?
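One possible workaround, sketched under the assumption that the image builds llama.cpp with CMake and that the LLAMA_NATIVE option is what injects -march=native (worth verifying against the llama.cpp revision you build):

```bash
# Hypothetical Docker build step: disable the native-arch flag and pin Zen 2
cmake .. -DLLAMA_NATIVE=OFF \
         -DCMAKE_C_FLAGS="-march=znver2" -DCMAKE_CXX_FLAGS="-march=znver2"
make -j
```

With LLAMA_NATIVE off, CMake should no longer add -march=native, so the znver2 flags survive and the binary targets the EPYC host rather than being tuned for the build machine.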
---

This issue is stale because it has been open for 30 days with no activity.

This issue was closed because it has been inactive for 14 days since being marked as stale.