nix: add cuda to the flake #3202

Merged 1 commit into ggerganov:master on Sep 25, 2023

Conversation

@Green-Sky (Collaborator) commented Sep 15, 2023

This is very much a makeshift solution, but it works. It follows the pattern of other CMake-based Nix packages.

I will open another PR when NixOS/nixpkgs#224291 gets resolved (hopefully this year).
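
For context, here is roughly what the CUDA variant amounts to: a minimal sketch in the style of other CMake-based nixpkgs packages. Names and arguments are illustrative, not the actual flake.nix; the cmake flags match what the build log below reports.

# Sketch only: cudatoolkit as a build input plus the usual cmakeFlags.
{ stdenv, cmake, ninja, mpi, cudatoolkit }:

stdenv.mkDerivation {
  pname = "llama.cpp-cuda";
  version = "unstable";
  src = ./.;

  nativeBuildInputs = [ cmake ninja ];
  buildInputs = [ mpi cudatoolkit ];

  cmakeFlags = [
    "-DLLAMA_BUILD_SERVER=ON"
    "-DLLAMA_MPI=ON"
    "-DBUILD_SHARED_LIBS=ON"
    "-DLLAMA_CUBLAS=ON"
    "-DCUDAToolkit_ROOT=${cudatoolkit}"
  ];
}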

Build
$ nix build --impure '.#cuda' --print-build-logs --rebuild
llama.cpp> Sourcing setup-cuda-hook
llama.cpp> Executing setupCUDAToolkitCompilers
llama.cpp> unpacking sources
llama.cpp> unpacking source archive /nix/store/yvpd0qq10p6jhdhj569db06fr2nd2w28-c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source
llama.cpp> source root is c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source
llama.cpp> patching sources
llama.cpp> updateAutotoolsGnuConfigScriptsPhase
llama.cpp> configuring
llama.cpp> fixing cmake files...
llama.cpp> cmake flags: -GNinja -DCMAKE_FIND_USE_SYSTEM_PACKAGE_REGISTRY=OFF -DCMAKE_FIND_USE_PACKAGE_REGISTRY=OFF -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=OFF -DCMAKE_INSTALL_LOCALEDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/share/locale -DCMAKE_INSTALL_LIBEXECDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/libexec -DCMAKE_INSTALL_LIBDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib -DCMAKE_INSTALL_DOCDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/share/doc/llama.cpp -DCMAKE_INSTALL_INFODIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/share/info -DCMAKE_INSTALL_MANDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/share/man -DCMAKE_INSTALL_OLDINCLUDEDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include -DCMAKE_INSTALL_INCLUDEDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include -DCMAKE_INSTALL_SBINDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/sbin -DCMAKE_INSTALL_BINDIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin -DCMAKE_INSTALL_NAME_DIR=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib -DCMAKE_POLICY_DEFAULT_CMP0025=NEW -DCMAKE_OSX_SYSROOT= -DCMAKE_FIND_FRAMEWORK=LAST -DCMAKE_STRIP=/nix/store/n847wr4vj9f3nszbgnqz9n8w3vnnfmcd-gcc-wrapper-12.3.0/bin/strip -DCMAKE_RANLIB=/nix/store/n847wr4vj9f3nszbgnqz9n8w3vnnfmcd-gcc-wrapper-12.3.0/bin/ranlib -DCMAKE_AR=/nix/store/n847wr4vj9f3nszbgnqz9n8w3vnnfmcd-gcc-wrapper-12.3.0/bin/ar -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_INSTALL_PREFIX=/nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp -DLLAMA_BUILD_SERVER=ON -DLLAMA_MPI=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_SKIP_BUILD_RPATH=ON -DLLAMA_CUBLAS=ON -DCUDA_TOOLKIT_ROOT_DIR=/nix/store/8a92d7ym9z3ms6rvcfbbgaw2b7zn3zz8-cudatoolkit-11.8.0 -DCUDA_HOST_COMPILER=/nix/store/0p0nzv56lq5gg3fr4l22dav7i91pbfdh-gcc-wrapper-11.4.0/bin/c++ -DCMAKE_CUDA_HOST_COMPILER=/nix/store/0p0nzv56lq5gg3fr4l22dav7i91pbfdh-gcc-wrapper-11.4.0/bin/c++ -DCUDAToolkit_INCLUDE_DIR=/nix/store/480xk3ahss2n2l1dqgrwhfkhdwld8rzw-cudatoolkit-11.8.0-merged/include -DCUDAToolkit_ROOT=/nix/store/480xk3ahss2n2l1dqgrwhfkhdwld8rzw-cudatoolkit-11.8.0-merged;/nix/store/6qz06h6c17bfj3a9qx4d5n7f00h06diz-cudatoolkit-11.8.0-lib
llama.cpp> -- The C compiler identification is GNU 12.3.0
llama.cpp> -- The CXX compiler identification is GNU 12.3.0
llama.cpp> -- Detecting C compiler ABI info
llama.cpp> -- Detecting C compiler ABI info - done
llama.cpp> -- Check for working C compiler: /nix/store/n847wr4vj9f3nszbgnqz9n8w3vnnfmcd-gcc-wrapper-12.3.0/bin/gcc - skipped
llama.cpp> -- Detecting C compile features
llama.cpp> -- Detecting C compile features - done
llama.cpp> -- Detecting CXX compiler ABI info
llama.cpp> -- Detecting CXX compiler ABI info - done
llama.cpp> -- Check for working CXX compiler: /nix/store/n847wr4vj9f3nszbgnqz9n8w3vnnfmcd-gcc-wrapper-12.3.0/bin/g++ - skipped
llama.cpp> -- Detecting CXX compile features
llama.cpp> -- Detecting CXX compile features - done
llama.cpp> -- Could NOT find Git (missing: GIT_EXECUTABLE)
llama.cpp> CMake Warning at scripts/build-info.cmake:20 (message):
llama.cpp>   Git not found using 'find_package' or 'which'.  Build info will not be
llama.cpp>   accurate.  Consider installing Git or ensuring it is in the PATH.
llama.cpp> Call Stack (most recent call first):
llama.cpp>   CMakeLists.txt:102 (include)
llama.cpp>
llama.cpp> CMake Warning at CMakeLists.txt:127 (message):
llama.cpp>   Git repository not found; to enable automatic generation of build info,
llama.cpp>   make sure Git is installed and the project is a Git repository.
llama.cpp>
llama.cpp> -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
llama.cpp> -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
llama.cpp> -- Found Threads: TRUE
llama.cpp> -- Found CUDAToolkit: /nix/store/480xk3ahss2n2l1dqgrwhfkhdwld8rzw-cudatoolkit-11.8.0-merged/include (found version "11.8.89")
llama.cpp> -- cuBLAS found
llama.cpp> -- The CUDA compiler identification is NVIDIA 11.8.89
llama.cpp> -- Detecting CUDA compiler ABI info
llama.cpp> -- Detecting CUDA compiler ABI info - done
llama.cpp> -- Check for working CUDA compiler: /nix/store/480xk3ahss2n2l1dqgrwhfkhdwld8rzw-cudatoolkit-11.8.0-merged/bin/nvcc - skipped
llama.cpp> -- Detecting CUDA compile features
llama.cpp> -- Detecting CUDA compile features - done
llama.cpp> -- Using CUDA architectures: 52;61;70
llama.cpp> -- Found MPI_C: /nix/store/b3n9qh9lxssxril6knkav5lnwrlaq9nx-openmpi-4.1.5/lib/libmpi.so (found version "3.1")
llama.cpp> -- Found MPI_CXX: /nix/store/b3n9qh9lxssxril6knkav5lnwrlaq9nx-openmpi-4.1.5/lib/libmpi_cxx.so (found version "3.1")
llama.cpp> -- Found MPI: TRUE (found version "3.1")
llama.cpp> -- MPI found
llama.cpp> -- CMAKE_SYSTEM_PROCESSOR: x86_64
llama.cpp> -- x86 detected
llama.cpp> -- Configuring done (5.8s)
llama.cpp> -- Generating done (0.0s)
llama.cpp> CMake Warning:
llama.cpp>   Manually-specified variables were not used by the project:
llama.cpp>     CMAKE_EXPORT_NO_PACKAGE_REGISTRY
llama.cpp>     CMAKE_POLICY_DEFAULT_CMP0025
llama.cpp>     CUDA_HOST_COMPILER
llama.cpp>     CUDA_TOOLKIT_ROOT_DIR
llama.cpp>
llama.cpp> -- Build files have been written to: /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/build
llama.cpp> cmake: enabled parallel building
llama.cpp> cmake: enabled parallel installing
llama.cpp> building
llama.cpp> build flags: -j24
llama.cpp> [1/70] Building C object tests/CMakeFiles/test-c.dir/test-c.c.o
llama.cpp> [2/70] Building C object CMakeFiles/ggml.dir/ggml-mpi.c.o
llama.cpp> [3/70] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
llama.cpp> [4/70] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
llama.cpp> [5/70] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
llama.cpp> [6/70] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
llama.cpp> [7/70] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
llama.cpp> [8/70] Building CXX object tests/CMakeFiles/test-tokenizer-1-llama.dir/test-tokenizer-1-llama.cpp.o
llama.cpp> [9/70] Building CXX object examples/embedding/CMakeFiles/embedding.dir/embedding.cpp.o
llama.cpp> [10/70] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
llama.cpp> [11/70] Building CXX object examples/benchmark/CMakeFiles/benchmark.dir/benchmark-matmult.cpp.o
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/benchmark/benchmark-matmult.cpp:23:6: warning: no previous declaration for 'void ggml_graph_compute_helper(std::vector<unsigned char>&, ggml_cgraph*, int)' [-Wmissing-declarations]
llama.cpp>    23 | void ggml_graph_compute_helper(std::vector<uint8_t> & buf, ggml_cgraph * graph, int n_threads) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/benchmark/benchmark-matmult.cpp:34:7: warning: no previous declaration for 'float tensor_sum_elements(const ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>    34 | float tensor_sum_elements(const ggml_tensor * tensor) {
llama.cpp>       |       ^~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/benchmark/benchmark-matmult.cpp:46:6: warning: no previous declaration for 'void tensor_dump(const ggml_tensor*, const char*)' [-Wmissing-declarations]
llama.cpp>    46 | void tensor_dump(const ggml_tensor * tensor, const char * name) {
llama.cpp>       |      ^~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/benchmark/benchmark-matmult.cpp:61:6: warning: no previous declaration for 'void print_usage(int, char**, benchmark_params_struct)' [-Wmissing-declarations]
llama.cpp>    61 | void print_usage(int /*argc*/, char ** argv, struct benchmark_params_struct params) {
llama.cpp>       |      ^~~~~~~~~~~
llama.cpp> [12/70] Building CXX object examples/save-load-state/CMakeFiles/save-load-state.dir/save-load-state.cpp.o
llama.cpp> [13/70] Building CXX object tests/CMakeFiles/test-grad0.dir/test-grad0.cpp.o
llama.cpp> [14/70] Building CXX object examples/baby-llama/CMakeFiles/baby-llama.dir/baby-llama.cpp.o
llama.cpp> In function 'ggml_tensor* forward(llama_model*, llama_kv_cache*, ggml_context*, ggml_cgraph*, ggml_tensor*, int, int)',
llama.cpp>     inlined from 'int main(int, char**)' at /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:1683:50:
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:611:78: warning: 'kv_self.llama_kv_cache::v' may be used uninitialized [-Wmaybe-uninitialized]
llama.cpp>   611 |                 vc = ggml_set_2d(ctx0, vc, Vcur, (   n_ctx)*ggml_element_size(kv_self.v),
llama.cpp>       |                                                             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp: In function 'int main(int, char**)':
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:1582:27: note: 'kv_self.llama_kv_cache::v' was declared here
llama.cpp>  1582 |     struct llama_kv_cache kv_self;
llama.cpp>       |                           ^~~~~~~
llama.cpp> In function 'ggml_tensor* forward(llama_model*, llama_kv_cache*, ggml_context*, ggml_cgraph*, ggml_tensor*, int, int)',
llama.cpp>     inlined from 'int main(int, char**)' at /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:1683:50:
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:610:101: warning: 'kv_self.llama_kv_cache::k' may be used uninitialized [-Wmaybe-uninitialized]
llama.cpp>   610 |                 kc = ggml_set_1d(ctx0, kc, ggml_reshape_1d(ctx0, Kcur, n_embd*N), (ggml_element_size(kv_self.k)*n_embd)*(il*n_ctx + n_past));
llama.cpp>       |                                                                                    ~~~~~~~~~~~~~~~~~^~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp: In function 'int main(int, char**)':
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/baby-llama/baby-llama.cpp:1582:27: note: 'kv_self.llama_kv_cache::k' was declared here
llama.cpp>  1582 |     struct llama_kv_cache kv_self;
llama.cpp>       |                           ^~~~~~~
llama.cpp> [15/70] Building CXX object tests/CMakeFiles/test-tokenizer-0-falcon.dir/test-tokenizer-0-falcon.cpp.o
llama.cpp> [16/70] Building CXX object tests/CMakeFiles/test-tokenizer-0-llama.dir/test-tokenizer-0-llama.cpp.o
llama.cpp> [17/70] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
llama.cpp> [18/70] Building CXX object examples/simple/CMakeFiles/simple.dir/simple.cpp.o
llama.cpp> [19/70] Building CXX object examples/embd-input/CMakeFiles/embd-input-test.dir/embd-input-test.cpp.o
llama.cpp> [20/70] Building C object CMakeFiles/ggml.dir/k_quants.c.o
llama.cpp> [21/70] Building CXX object examples/perplexity/CMakeFiles/perplexity.dir/perplexity.cpp.o
llama.cpp> [22/70] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/pocs/vdot/q8dot.cpp:57:6: warning: no previous declaration for 'void fillQ80blocks(std::vector<block_q8_0>&, std::mt19937&)' [-Wmissing-declarations]
llama.cpp>    57 | void fillQ80blocks(std::vector<block_q8_0>& blocks, std::mt19937& rndm) {
llama.cpp>       |      ^~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/pocs/vdot/q8dot.cpp:69:7: warning: no previous declaration for 'float simpleDot(const block_q4_0&, const block_q8_0&)' [-Wmissing-declarations]
llama.cpp>    69 | float simpleDot(const block_q4_0& x, const block_q8_0& y) {
llama.cpp>       |       ^~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/pocs/vdot/q8dot.cpp:84:7: warning: no previous declaration for 'float simpleDot(const block_q4_1&, const block_q8_0&)' [-Wmissing-declarations]
llama.cpp>    84 | float simpleDot(const block_q4_1& x, const block_q8_0& y) {
llama.cpp>       |       ^~~~~~~~~
llama.cpp> [23/70] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
llama.cpp> [24/70] Building CXX object pocs/vdot/CMakeFiles/vdot.dir/vdot.cpp.o
llama.cpp> [25/70] Building CXX object examples/beam-search/CMakeFiles/beam-search.dir/beam-search.cpp.o
llama.cpp> [26/70] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
llama.cpp> [27/70] Building CXX object examples/embd-input/CMakeFiles/embdinput.dir/embd-input-lib.cpp.o
llama.cpp> [28/70] Building CXX object examples/speculative/CMakeFiles/speculative.dir/speculative.cpp.o
llama.cpp> [29/70] Building CXX object examples/train-text-from-scratch/CMakeFiles/train-text-from-scratch.dir/train-text-from-scratch.cpp.o
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:33:6: warning: no previous declaration for 'void init_random_normal_distribution(random_normal_distribution*, int, float, float, float, float)' [-Wmissing-declarations]
llama.cpp>    33 | void init_random_normal_distribution(struct random_normal_distribution * rnd, int seed, float mean, float std, float min, float max) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:40:6: warning: no previous declaration for 'void init_random_uniform_distribution(random_uniform_distribution*, int, float, float)' [-Wmissing-declarations]
llama.cpp>    40 | void init_random_uniform_distribution(struct random_uniform_distribution * rnd, int seed, float min, float max) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:45:5: warning: no previous declaration for 'int clamp(int, int, int)' [-Wmissing-declarations]
llama.cpp>    45 | int clamp(const int v, const int min, const int max) {
llama.cpp>       |     ^~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:49:7: warning: no previous declaration for 'float fclamp(float, float, float)' [-Wmissing-declarations]
llama.cpp>    49 | float fclamp(const float v, const float min, const float max) {
llama.cpp>       |       ^~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:53:7: warning: no previous declaration for 'float frand()' [-Wmissing-declarations]
llama.cpp>    53 | float frand() {
llama.cpp>       |       ^~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:57:7: warning: no previous declaration for 'float frand_normal(random_normal_distribution*)' [-Wmissing-declarations]
llama.cpp>    57 | float frand_normal(struct random_normal_distribution * rnd) {
llama.cpp>       |       ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:61:7: warning: no previous declaration for 'float frand_uniform(random_uniform_distribution*)' [-Wmissing-declarations]
llama.cpp>    61 | float frand_uniform(struct random_uniform_distribution * rnd) {
llama.cpp>       |       ^~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:65:22: warning: no previous declaration for 'ggml_tensor* randomize_tensor_normal(ggml_tensor*, random_normal_distribution*)' [-Wmissing-declarations]
llama.cpp>    65 | struct ggml_tensor * randomize_tensor_normal(struct ggml_tensor * tensor, struct random_normal_distribution * rnd) {
llama.cpp>       |                      ^~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:114:22: warning: no previous declaration for 'ggml_tensor* randomize_tensor_uniform(ggml_tensor*, random_uniform_distribution*)' [-Wmissing-declarations]
llama.cpp>   114 | struct ggml_tensor * randomize_tensor_uniform(struct ggml_tensor * tensor, struct random_uniform_distribution * rnd) {
llama.cpp>       |                      ^~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:289:6: warning: no previous declaration for 'void print_params(my_llama_hparams*)' [-Wmissing-declarations]
llama.cpp>   289 | void print_params(struct my_llama_hparams * params) {
llama.cpp>       |      ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:299:6: warning: no previous declaration for 'void init_model(my_llama_model*)' [-Wmissing-declarations]
llama.cpp>   299 | void init_model(struct my_llama_model * model) {
llama.cpp>       |      ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:366:6: warning: no previous declaration for 'void set_param_model(my_llama_model*)' [-Wmissing-declarations]
llama.cpp>   366 | void set_param_model(struct my_llama_model * model) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:392:6: warning: no previous declaration for 'void randomize_model(my_llama_model*, int, float, float, float, float)' [-Wmissing-declarations]
llama.cpp>   392 | void randomize_model(struct my_llama_model * model, int seed, float mean, float std, float min, float max) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:421:6: warning: no previous declaration for 'void assert_shape_1d(ggml_tensor*, int64_t)' [-Wmissing-declarations]
llama.cpp>   421 | void assert_shape_1d(struct ggml_tensor * tensor, int64_t ne0) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:426:6: warning: no previous declaration for 'void assert_shape_2d(ggml_tensor*, int64_t, int64_t)' [-Wmissing-declarations]
llama.cpp>   426 | void assert_shape_2d(struct ggml_tensor * tensor, int64_t ne0, int64_t ne1) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:432:6: warning: no previous declaration for 'void assert_shape_3d(ggml_tensor*, int64_t, int64_t, int64_t)' [-Wmissing-declarations]
llama.cpp>   432 | void assert_shape_3d(struct ggml_tensor * tensor, int64_t ne0, int64_t ne1, int64_t ne2) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:439:6: warning: no previous declaration for 'void assert_shape_4d(ggml_tensor*, int64_t, int64_t, int64_t, int64_t)' [-Wmissing-declarations]
llama.cpp>   439 | void assert_shape_4d(struct ggml_tensor * tensor, int64_t ne0, int64_t ne1, int64_t ne2, int64_t ne3) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:493:19: warning: no previous declaration for 'hash_map* new_hash_map()' [-Wmissing-declarations]
llama.cpp>   493 | struct hash_map * new_hash_map() {
llama.cpp>       |                   ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:502:6: warning: no previous declaration for 'void free_hash_map(hash_map*)' [-Wmissing-declarations]
llama.cpp>   502 | void free_hash_map(struct hash_map * map) {
llama.cpp>       |      ^~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:533:22: warning: no previous declaration for 'ggml_tensor* ggml_recompute_graph_node(ggml_context*, ggml_cgraph*, hash_map*, ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>   533 | struct ggml_tensor * ggml_recompute_graph_node(
llama.cpp>       |                      ^~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:599:6: warning: no previous declaration for 'void ggml_build_backward_gradient_checkpointing(ggml_context*, ggml_cgraph*, ggml_cgraph*, ggml_cgraph*, ggml_tensor**, int)' [-Wmissing-declarations]
llama.cpp>   599 | void ggml_build_backward_gradient_checkpointing(
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:645:22: warning: no previous declaration for 'ggml_tensor* llama_build_train_graphs(my_llama_model*, ggml_allocr*, ggml_context*, ggml_cgraph*, ggml_cgraph*, ggml_cgraph*, ggml_tensor**, ggml_tensor*, ggml_tensor*, int, int, bool, bool)' [-Wmissing-declarations]
llama.cpp>   645 | struct ggml_tensor * llama_build_train_graphs(
llama.cpp>       |                      ^~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:829:6: warning: no previous declaration for 'void set_f32_3d(ggml_tensor*, int64_t, int64_t, int64_t, float)' [-Wmissing-declarations]
llama.cpp>   829 | void set_f32_3d(struct ggml_tensor * tensor, int64_t i0, int64_t i1, int64_t i2, float value) {
llama.cpp>       |      ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:834:6: warning: no previous declaration for 'void set_f32_2d(ggml_tensor*, int64_t, int64_t, float)' [-Wmissing-declarations]
llama.cpp>   834 | void set_f32_2d(struct ggml_tensor * tensor, int64_t i0, int64_t i1, float value) {
llama.cpp>       |      ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:839:6: warning: no previous declaration for 'void set_i32_2d(ggml_tensor*, int64_t, int64_t, int32_t)' [-Wmissing-declarations]
llama.cpp>   839 | void set_i32_2d(struct ggml_tensor * tensor, int64_t i0, int64_t i1, int32_t value) {
llama.cpp>       |      ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:844:7: warning: no previous declaration for 'float get_f32_2d(ggml_tensor*, int64_t, int64_t)' [-Wmissing-declarations]
llama.cpp>   844 | float get_f32_2d(struct ggml_tensor * tensor, int64_t i0, int64_t i1) {
llama.cpp>       |       ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:849:9: warning: no previous declaration for 'int32_t get_i32_2d(ggml_tensor*, int64_t, int64_t)' [-Wmissing-declarations]
llama.cpp>   849 | int32_t get_i32_2d(struct ggml_tensor * tensor, int64_t i0, int64_t i1) {
llama.cpp>       |         ^~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:854:6: warning: no previous declaration for 'void print_row(ggml_tensor*, int)' [-Wmissing-declarations]
llama.cpp>   854 | void print_row(struct ggml_tensor * probs, int i) {
llama.cpp>       |      ^~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:862:6: warning: no previous declaration for 'void print_matrix(ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>   862 | void print_matrix(struct ggml_tensor * probs) {
llama.cpp>       |      ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:873:6: warning: no previous declaration for 'void get_example_targets(llama_context*, const int*, size_t, const llama_token*, size_t, int, ggml_tensor*, ggml_tensor*, ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>   873 | void get_example_targets(struct llama_context * lctx, const int * train_samples, size_t n_train_samples, const llama_token * train_data, size_t n_train_data, int example_id, struct ggml_tensor * tokens_input, struct ggml_tensor * target_logits, struct ggml_tensor * target_probs) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:893:6: warning: no previous declaration for 'void get_example_targets_batch(llama_context*, const int*, size_t, const llama_token*, size_t, int, ggml_tensor*, ggml_tensor*, ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>   893 | void get_example_targets_batch(struct llama_context * lctx, const int * train_samples, size_t n_train_samples, const llama_token * train_data, size_t n_train_data, int example_id, struct ggml_tensor * tokens_input, struct ggml_tensor * target_logits, struct ggml_tensor * target_probs) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:928:5: warning: no previous declaration for 'int tokenize_file(llama_context*, const char*, std::vector<int>&)' [-Wmissing-declarations]
llama.cpp>   928 | int tokenize_file(struct llama_context * lctx, const char * filename, std::vector<llama_token>& out) {
llama.cpp>       |     ^~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:999:6: warning: no previous declaration for 'void shuffle_ints(int*, int*)' [-Wmissing-declarations]
llama.cpp>   999 | void shuffle_ints(int * begin, int * end) {
llama.cpp>       |      ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1033:6: warning: no previous declaration for 'bool are_same_layout(ggml_tensor*, ggml_tensor*)' [-Wmissing-declarations]
llama.cpp>  1033 | bool are_same_layout(struct ggml_tensor * a, struct ggml_tensor * b) {
llama.cpp>       |      ^~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1043:6: warning: no previous declaration for 'void read_tensor_by_name(ggml_tensor*, ggml_context*, const char*)' [-Wmissing-declarations]
llama.cpp>  1043 | void read_tensor_by_name(struct ggml_tensor * dst, struct ggml_context * ctx, const char * name) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1056:6: warning: no previous declaration for 'void load_opt_context_gguf(gguf_context*, ggml_context*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1056 | void load_opt_context_gguf(struct gguf_context * fctx, struct ggml_context * f_ggml_ctx, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1117:6: warning: no previous declaration for 'void save_opt_context_gguf(gguf_context*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1117 | void save_opt_context_gguf(struct gguf_context * fctx, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1184:6: warning: no previous declaration for 'void load_llama_model_gguf(gguf_context*, ggml_context*, my_llama_model*)' [-Wmissing-declarations]
llama.cpp>  1184 | void load_llama_model_gguf(struct gguf_context * fctx, struct ggml_context * f_ggml_ctx, struct my_llama_model * model) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1255:6: warning: no previous declaration for 'void save_llama_model_gguf(gguf_context*, const char*, my_llama_model*)' [-Wmissing-declarations]
llama.cpp>  1255 | void save_llama_model_gguf(struct gguf_context * fctx, const char * fn_vocab_model, struct my_llama_model * model) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1398:6: warning: no previous declaration for 'void save_llama_model_file(const char*, const char*, my_llama_model*)' [-Wmissing-declarations]
llama.cpp>  1398 | void save_llama_model_file(const char * filename, const char * fn_vocab_model, struct my_llama_model * model) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1409:6: warning: no previous declaration for 'void load_checkpoint_gguf(gguf_context*, ggml_context*, my_llama_model*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1409 | void load_checkpoint_gguf(struct gguf_context * fctx, struct ggml_context * f_ggml_ctx, struct my_llama_model * model, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1423:6: warning: no previous declaration for 'void save_checkpoint_gguf(gguf_context*, const char*, my_llama_model*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1423 | void save_checkpoint_gguf(struct gguf_context * fctx, const char * fn_vocab_model, struct my_llama_model * model, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1434:6: warning: no previous declaration for 'bool load_checkpoint_file(const char*, my_llama_model*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1434 | bool load_checkpoint_file(const char * filename, struct my_llama_model * model, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1449:6: warning: no previous declaration for 'void save_checkpoint_file(const char*, const char*, my_llama_model*, ggml_opt_context*)' [-Wmissing-declarations]
llama.cpp>  1449 | void save_checkpoint_file(const char * filename, const char * fn_vocab_model, struct my_llama_model * model, struct ggml_opt_context * opt) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1460:7: warning: no previous declaration for 'float cosine_decay(int, float, int)' [-Wmissing-declarations]
llama.cpp>  1460 | float cosine_decay(const int decay_steps, const float minimum, int step) {
llama.cpp>       |       ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1469:7: warning: no previous declaration for 'float cosine_decay_restart(int, float, int, float, bool)' [-Wmissing-declarations]
llama.cpp>  1469 | float cosine_decay_restart(int decay_steps, const float minimum, int step, float restart_step_mult, bool enable_restart) {
llama.cpp>       |       ^~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1537:21: warning: no previous declaration for 'train_params get_default_train_params()' [-Wmissing-declarations]
llama.cpp>  1537 | struct train_params get_default_train_params() {
llama.cpp>       |                     ^~~~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1597:6: warning: no previous declaration for 'void train_print_usage(int, char**, const train_params*)' [-Wmissing-declarations]
llama.cpp>  1597 | void train_print_usage(int /*argc*/, char ** argv, const struct train_params * params) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1654:6: warning: no previous declaration for 'bool train_params_parse(int, char**, train_params*)' [-Wmissing-declarations]
llama.cpp>  1654 | bool train_params_parse(int argc, char ** argv, struct train_params * params) {
llama.cpp>       |      ^~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:1948:6: warning: no previous declaration for 'void opt_callback(void*, float*)' [-Wmissing-declarations]
llama.cpp>  1948 | void opt_callback(void * vdata, float * sched) {
llama.cpp>       |      ^~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp: In function 'ggml_tensor* llama_build_train_graphs(my_llama_model*, ggml_allocr*, ggml_context*, ggml_cgraph*, ggml_cgraph*, ggml_cgraph*, ggml_tensor**, ggml_tensor*, ggml_tensor*, int, int, bool, bool)':
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:735:68: warning: 'kv_scale' may be used uninitialized [-Wmaybe-uninitialized]
llama.cpp>   735 |             struct ggml_tensor * t16_1 = ggml_scale_inplace        (ctx, t16_0, kv_scale);          set_name(t16_1, "t16_1"); assert_shape_4d(t16_1, N, N, n_head, n_batch);
llama.cpp>       |                                          ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
llama.cpp> /build/c0bpkra2hwrzkj4rbv8wrdk1sxrg8kwn-source/examples/train-text-from-scratch/train-text-from-scratch.cpp:709:26: note: 'kv_scale' was declared here
llama.cpp>   709 |     struct ggml_tensor * kv_scale;
llama.cpp>       |                          ^~~~~~~~
llama.cpp> [30/70] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
llama.cpp> [31/70] Building CXX object examples/quantize-stats/CMakeFiles/quantize-stats.dir/quantize-stats.cpp.o
llama.cpp> [32/70] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
llama.cpp> [33/70] Building CXX object examples/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
llama.cpp> [34/70] Building C object CMakeFiles/ggml.dir/ggml.c.o
llama.cpp> [35/70] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
llama.cpp> [36/70] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
llama.cpp> [37/70] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
llama.cpp> [38/70] Building CUDA object CMakeFiles/ggml.dir/ggml-cuda.cu.o
llama.cpp> nvcc warning : incompatible redefinition for option 'compiler-bindir', the last value of this option was used
llama.cpp> [39/70] Linking CUDA static library libggml_static.a
llama.cpp> [40/70] Linking CUDA shared library libggml_shared.so
llama.cpp> [41/70] Linking CXX shared library libllama.so
llama.cpp> [42/70] Linking CXX shared library examples/embd-input/libembdinput.so
llama.cpp> [43/70] Linking CXX executable bin/quantize
llama.cpp> [44/70] Linking CXX executable bin/benchmark
llama.cpp> [45/70] Linking CXX executable bin/test-quantize-perf
llama.cpp> [46/70] Linking CXX executable bin/test-llama-grammar
llama.cpp> [47/70] Linking CXX executable bin/main
llama.cpp> [48/70] Linking CXX executable bin/test-tokenizer-1-llama
llama.cpp> [49/70] Linking CXX executable bin/quantize-stats
llama.cpp> [50/70] Linking CXX executable bin/test-tokenizer-0-llama
llama.cpp> [51/70] Linking CXX executable bin/test-grammar-parser
llama.cpp> [52/70] Linking CXX executable bin/perplexity
llama.cpp> [53/70] Linking CXX executable bin/test-quantize-fns
llama.cpp> [54/70] Linking CXX executable bin/test-sampling
llama.cpp> [55/70] Linking CXX executable bin/test-tokenizer-0-falcon
llama.cpp> [56/70] Linking CXX executable bin/baby-llama
llama.cpp> [57/70] Linking CXX executable bin/simple
llama.cpp> [58/70] Linking CXX executable bin/embedding
llama.cpp> [59/70] Linking CXX executable bin/save-load-state
llama.cpp> [60/70] Linking CXX executable bin/llama-bench
llama.cpp> [61/70] Linking CXX executable bin/convert-llama2c-to-ggml
llama.cpp> [62/70] Linking C executable bin/test-c
llama.cpp> [63/70] Linking CXX executable bin/test-grad0
llama.cpp> [64/70] Linking CXX executable bin/train-text-from-scratch
llama.cpp> [65/70] Linking CXX executable bin/speculative
llama.cpp> [66/70] Linking CXX executable bin/embd-input-test
llama.cpp> [67/70] Linking CXX executable bin/beam-search
llama.cpp> [68/70] Linking CXX executable bin/q8dot
llama.cpp> [69/70] Linking CXX executable bin/vdot
llama.cpp> [70/70] Linking CXX executable bin/server
llama.cpp> buildPhase completed in 40 seconds
llama.cpp> installing
llama.cpp> install flags: -j24 install
llama.cpp> [0/1] Install the project...
llama.cpp> -- Install configuration: "Release"
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libggml_shared.so
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/cmake/Llama/LlamaConfig.cmake
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/cmake/Llama/LlamaConfigVersion.cmake
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include/ggml.h
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include/ggml-cuda.h
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include/ggml-mpi.h
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include/k_quants.h
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libllama.so
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/include/llama.h
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/convert.py
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/convert-lora-to-ggml.py
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-quantize-fns
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-quantize-perf
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-sampling
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-0-llama
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-0-falcon
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-1-llama
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-grammar-parser
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-llama-grammar
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-grad0
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/main
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/quantize
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/quantize-stats
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/perplexity
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/embedding
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/save-load-state
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/benchmark
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/baby-llama
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/train-text-from-scratch
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/convert-llama2c-to-ggml
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/simple
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/speculative
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libembdinput.so
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/embd-input-test
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/llama-bench
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/beam-search
llama.cpp> -- Installing: /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/server
llama.cpp> post-installation fixup
llama.cpp> shrinking RPATHs of ELF executables and libraries in /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-0-falcon
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/baby-llama
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-quantize-perf
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/save-load-state
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/perplexity
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-quantize-fns
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/llama-server
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/benchmark
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/llama-bench
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-grammar-parser
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/embedding
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-0-llama
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/convert-llama2c-to-ggml
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/embd-input-test
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-sampling
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/beam-search
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/train-text-from-scratch
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/quantize-stats
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-grad0
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/simple
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/speculative
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/quantize
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-tokenizer-1-llama
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/test-llama-grammar
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin/llama
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libllama.so
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libggml_shared.so
llama.cpp> shrinking /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib/libembdinput.so
llama.cpp> checking for references to /build/ in /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp...
llama.cpp> patching script interpreter paths in /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp
llama.cpp> stripping (with command strip and flags -S -p) in  /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/lib /nix/store/pgq3jdxa0qyzc833r2phnaci8k2b7kkx-llama.cpp/bin

Example run:

$ export NIXPKGS_ALLOW_UNFREE=1
$ nix run --impure 'github:Green-Sky/llama.cpp/nix_flake_add_cuda#cuda' -- -m workspace/llama.cpp/models/llama-2-7b.Q4_K_M.gguf -p "The meaning of life" -ngl 99
........
llm_load_tensors: ggml ctx size =    0,09 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =   70,41 MB (+  256,00 MB per state)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloading v cache to GPU
llm_load_tensors: offloading k cache to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 4077 MB
..................................................................................................
llama_new_context_with_model: kv self size  =  256,00 MB
llama_new_context_with_model: compute buffer total size =   71,97 MB
llama_new_context_with_model: VRAM scratch buffer: 70,50 MB

system_info: n_threads = 12 / 24 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1,100000, presence_penalty = 0,000000, frequency_penalty = 0,000000, top_k = 40, tfs_z = 1,000000, top_p = 0,950000, typical_p = 1,000000, temp = 0,800000, mirostat = 0, mirostat_lr = 0,100000, mirostat_ent = 5,000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 The meaning of life is to find your gift. You have a natural talent for something, and the world needs that from you.
I am sure you can relate to this statement: there are times when we feel like we have no purpose in life or that nothing matters. As a result, it can lead us to despair and depression. This is why finding your gift is so important; it gives you an identity and allows you to make meaningful contributions to society while also giving yourself something valuable for the future.
However, the biggest problem with this statement is that everyone seems to have their own opinion about what they should do with their time on earth! Some people believe we should work hard toward making money while others think we should focus more on spending quality time with family and friends. Either way, it's important to know your strengths so you can make an informed decision about how best to use them – whether that means starting a business or simply learning new skills through self-education programs such as online courses offered by universities across America!
The meaning of life is to find your gift. It can be anything from painting, music, dancing and even writing. You don

@Green-Sky (Collaborator Author)

@Tungsten842 pls review :)

Green-Sky marked this pull request as draft on September 16, 2023 23:19
@Green-Sky (Collaborator Author)

Ok, so for some reason this is giving me wrong results. Only -ngl 0 -nommq works, i.e. only cuBLAS with no offloaded layers.
@JohannesGaessler any idea what might be wrong here?
The scary part is that it fails silently...
The good part is that Nix should be reproducible, so if anyone wants to help, just have Nix installed, enter the environment with nix shell --impure '.#cuda', and run llama -m ...

@JohannesGaessler (Collaborator)

My intuition is that you're building the package for the wrong CUDA arch. The __dp4a instruction (per-byte dot product) is used by all implementations of mul_mat_q and mul_mat_vec_q on compute capability 6.1 or higher. On lower compute capabilities the instruction does not exist, so either cuBLAS GEMM or dequantize_mul_mat_vec is used instead; to satisfy the compiler, a dummy implementation that just does assert(false) stands in for the __dp4a code path there. CUDA allows you to compile for multiple compute capabilities at once (in the llama.cpp CMakeLists.txt it's 5.2, 6.1, and 7.0 by default), and at runtime the implementation with the highest compatible compute capability is chosen. However, if you only compile for a compute capability < 6.1 (cmake, for example, builds for 5.2 by default) and at the same time strip out asserts, then GPUs with a compute capability >= 6.1 will silently use the dummy implementation and produce garbage results.

@Green-Sky (Collaborator Author) commented Sep 17, 2023

You were right. Since Nix aims to be highly reproducible, flags such as -arch=native don't actually look at the hardware they are running on and probably just fall back to the lowest possible value.

I might just use the hacky CMake way they use elsewhere, which would work around it. Or update the Makefile to allow setting the arch...

@JohannesGaessler (Collaborator)

You were right. Since Nix aims to be highly reproducible, flags such as -arch=native don't actually look at the hardware they are running on and probably just fall back to the lowest possible value.

I don't know how Nix packages work, but if they distribute precompiled packages, the problem is rather that the machine on which the package is compiled has no GPU, so it compiles only for the lowest possible compute capability. If you compile with -arch=all, the binaries should end up containing PTX code for all CUDA architectures regardless of the machine on which they are compiled (at the cost of higher compile time/binary size). Although compiling for the same architectures as in cmake should be enough.

@Green-Sky (Collaborator Author)

I don't know how Nix packages work, but if they distribute precompiled packages, the problem is rather that the machine on which the package is compiled has no GPU, so it compiles only for the lowest possible compute capability.

AFAIK, the build input (Nix closure?) is hashed, and that hash can then be looked up on a cache server; if there is a prebuilt binary, it will pull that.

Disclaimer: I just started with Nix(OS) a week ago, as a result of my Ubuntu install dying, so I might be misinformed :)

Part of the “same input, same output” / reproducibility workflow is making compilers platform independent: -march=native does not enable AVX etc. for gcc, and I don't know what -arch=native actually does for nvcc.

If you compile with -arch=all, the binaries should end up containing PTX code for all CUDA architectures regardless of the machine on which they are compiled (at the cost of higher compile time/binary size). Although compiling for the same architectures as in cmake should be enough.

Yea, I hacked around it by replacing -arch=native with -arch=sm_75. I will come up with a better solution before I un-draft the PR.
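
For the record, a hedged sketch of what a cleaner fix could look like on the Nix side (illustrative only, not the final change): pin the CUDA architectures through the derivation's cmakeFlags instead of patching -arch=native, matching the 52;61;70 set the CMake configure step already reports above. Here llama-cpp-cuda is an assumed attribute name for the flake's cuda package.

# Sketch: pin CUDA architectures explicitly instead of relying on -arch=native,
# so a GPU-less builder still targets known architectures.
llama-cpp-cuda.overrideAttrs (old: {
  cmakeFlags = (old.cmakeFlags or [ ]) ++ [
    # same set the configure step prints: "Using CUDA architectures: 52;61;70"
    "-DCMAKE_CUDA_ARCHITECTURES=52;61;70"
  ];
})

Newer CMake also accepts all (or all-major) for CMAKE_CUDA_ARCHITECTURES, which would mirror the -arch=all suggestion above at the cost of compile time and binary size.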

@cebtenzzre (Collaborator)

The one time I ever tried NixOS, I ran into NixOS/nixpkgs#38635 and gave up, so I wouldn't easily be able to review this.

@Green-Sky (Collaborator Author) commented Sep 24, 2023

Oh damn. Well, you don't need to use nix-env for this. Also, that must have been a light system.
In any case, I just copied the CUDA segments needed from similar nixpkgs packages, and now I just need someone to approve this so I don't have to dirty my checkout every time I want to run it on the graphics card.

Also, it looks like I will be the resident Nix reviewer in the future...

@Green-Sky merged commit a98b163 into ggerganov:master on Sep 25, 2023
10 checks passed
pkrmf pushed a commit to morlockstudios-com/llama.cpp that referenced this pull request Sep 26, 2023
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Sep 27, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp:
  convert : remove bug in convert.py permute function (ggerganov#3364)
  make-ggml.py : compatibility with more models and GGUF (ggerganov#3290)
  gguf : fix a few general keys (ggerganov#3341)
  metal : reusing llama.cpp logging (ggerganov#3152)
  build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (ggerganov#3342)
  readme : add some recent perplexity and bpw measurements to READMES, link for k-quants (ggerganov#3340)
  cmake : fix build-info.h on MSVC (ggerganov#3309)
  docs: Fix typo CLBlast_DIR var. (ggerganov#3330)
  nix : add cuda, use a symlinked toolkit for cmake (ggerganov#3202)