
Support Official llama.cpp docker/images. #7506

Closed · DirtyKnightForVi opened this issue on May 24, 2024 · 9 comments
Labels: enhancement (New feature or request)

@DirtyKnightForVi

Is there an official version of llama.cpp available as a Docker image now? I need to deploy it in a completely offline environment, and a non-containerized deployment makes installing all the build dependencies quite troublesome.

Plz

DirtyKnightForVi added the enhancement (New feature or request) label on May 24, 2024
@ngxson (Collaborator) commented May 24, 2024

We don't deploy it to Docker Hub, but we do have the GitHub Container Registry: https://github.com/ggerganov/llama.cpp/pkgs/container/llama.cpp
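For example, a minimal sketch of pulling and running the CPU image from that registry (the full tag, model path, and prompt are illustrative; the --run flag is handled by the image's entrypoint script, the same one that prints the command list shown further down):

# Pull the CPU "full" image from the GitHub Container Registry
docker pull ghcr.io/ggerganov/llama.cpp:full

# Mount a host directory with models and run inference through the image's entrypoint
docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full \
    --run -m /models/your-model.gguf -p "hello" -n 128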

@DirtyKnightForVi (Author)

Thank you very much. However, I would like to enter the container and compile a different branch of the project. Based on the current documentation and my attempts, this does not seem to be supported: the container consistently fails to start. Have I missed something?

(base) jiyin@jiyin:/media/jiyin/ResearchSpace$ sudo docker run -v /media/jiyin/ResearchSpace:/models 92ddd0cc4ed1 bash
Unknown command: bash
Available commands: 
  --run (-r): Run a model previously converted into ggml
              ex: -m /models/7B/ggml-model-q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 512
  --convert (-c): Convert a llama model into ggml
              ex: --outtype f16 "/models/7B/" 
  --quantize (-q): Optimize with quantization process ggml
              ex: "/models/7B/ggml-model-f16.bin" "/models/7B/ggml-model-q4_0.bin" 2
  --finetune (-f): Run finetune command to create a lora finetune of the model
              See documentation for finetune for command-line parameters
  --all-in-one (-a): Execute --convert & --quantize
              ex: "/models/" 7B
  --server (-s): Run a model on the server
              ex: -m /models/7B/ggml-model-q4_0.bin -c 2048 -ngl 43 -mg 1 --port 8080

@Galunid (Collaborator) commented May 26, 2024

You can try adding --entrypoint /bin/bash to the docker run command (instead of appending bash at the end) if you want to get inside the container.

Just note that, in general, recompiling things inside a container is not something that should be done: the changes won't persist once the container is removed. You can build multiple images and use Docker tags to differentiate between them.
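For instance, reusing the image ID and mount from your earlier command, something along these lines should drop you into a shell inside the container (a sketch, not an officially documented workflow):

# -it gives an interactive terminal; --entrypoint overrides the image's default entrypoint script
docker run -it --entrypoint /bin/bash -v /media/jiyin/ResearchSpace:/models 92ddd0cc4ed1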

@DirtyKnightForVi (Author)

What I actually want to do is try the changes on a branch that hasn't been merged into the main branch yet. Deploying the project from source on a completely offline machine is just too troublesome.

@Galunid (Collaborator) commented May 27, 2024

Then you need to build the image yourself before you deploy it. See README.md for more information.
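As a rough sketch (the branch name and local image tag are placeholders, and the Dockerfile path assumes the .devops/ layout used by the repository), building the full image from an unmerged branch and moving it to an offline machine could look like this:

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout my-feature-branch        # the unmerged branch you want to try

# Build the CPU "full" image locally from that checkout
docker build -t local/llama.cpp:full -f .devops/full.Dockerfile .

# Export the image to a tarball, copy it over, then load it on the offline machine
docker save local/llama.cpp:full -o llama-cpp-full.tar
docker load -i llama-cpp-full.tar     # run this on the offline machine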

Galunid closed this as completed on May 27, 2024
@DirtyKnightForVi (Author)

> We don't deploy it to Docker Hub, but we do have the GitHub Container Registry: https://github.com/ggerganov/llama.cpp/pkgs/container/llama.cpp

May I follow up with a question: do the latest images in this list all include the content of the main branch? And does each tag suffix correspond to a specific version of the source code?

@ngxson (Collaborator) commented Jun 5, 2024

@DirtyKnightForVi The image tag corresponds to the Dockerfile it was built from and the commit it was built at:

For example, docker pull ghcr.io/ggerganov/llama.cpp:full-cuda--b1-2b33896 is built from full-cuda.Dockerfile and commit 2b33896

@DirtyKnightForVi (Author)

> @DirtyKnightForVi The image tag corresponds to the Dockerfile it was built from and the commit it was built at:
>
> For example, docker pull ghcr.io/ggerganov/llama.cpp:full-cuda--b1-2b33896 is built from full-cuda.Dockerfile and commit 2b33896

Thank you very much for your patient response.

@DirtyKnightForVi (Author)

Although I successfully ran the model using the CUDA image, it seems that the model was loaded onto the GPU but inference is being performed on the CPU. Did I miss something?

docker run --gpus all -v /path/llama_test:/models ghcr.io/ggerganov/llama.cpp:full-cuda--b1-5442939 -m /models/xx.gguf -p "hello" --n-gpu-layers 18
