
Ollama: only sometimes running on GPU #319303

Closed

MatthiasvB opened this issue Jun 12, 2024 · 10 comments

Comments

@MatthiasvB commented Jun 12, 2024

Describe the bug

I have installed ollama with the option services.ollama.acceleration = "cuda";. This caused the package to be built on my system, as opposed to being downloaded from a binary cache. After this, it ran very fast, as expected.
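For reference, a minimal sketch of the configuration in question, using the standard NixOS module options:

services.ollama = {
  enable = true;
  acceleration = "cuda"; # causes ollama to be built/fetched with CUDA support
};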

After the next reboot, it regressed to running slowly, utilizing the CPU.

After the next reboot, it ran fast again.

After putting the PC to sleep over the lunch break, it ran slowly on the CPU again.

Steps To Reproduce

Steps to reproduce the behavior:
It's flaky: sometimes it uses the GPU, sometimes not.

Expected behavior

It should always run on the GPU.

Screenshots

N/A

Additional context

NVIDIA RTX 4070, configured for offload mode
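("Offload mode" presumably refers to NVIDIA PRIME render offload; a sketch of that configuration for context, where the bus IDs are placeholders rather than values from this machine:

hardware.nvidia.prime = {
  offload.enable = true;
  intelBusId = "PCI:0:2:0";  # placeholder, machine-specific
  nvidiaBusId = "PCI:1:0:0"; # placeholder, machine-specific
};
)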

Notify maintainers

@abysssol @onny @marcusramberg

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.32, NixOS, 24.05 (Uakari), 24.05.675.805a384895c6`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"home-manager-24.05.tar.gz, nixos-24.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`


@onny (Contributor) commented Jun 12, 2024

What Ollama version are you currently using?

@MatthiasvB (Author)

> What Ollama version are you currently using?

The one packaged in nixos-24.05; that's 0.1.38.

@abysssol (Contributor)

The nature of this problem makes me think it could be a bug in upstream ollama; I'm not sure if this has anything to do with nix or not. I think it could be a good idea to create an issue on ollama's repo.

If you want, you could try building the unstable version of ollama, which is currently 0.1.42. Maybe that won't have this issue? If so, it might be time for me to backport the more recent version of ollama to stable nixos.

@MatthiasvB (Author)

> The nature of this problem makes me think it could be a bug in upstream ollama; I'm not sure if this has anything to do with nix or not. I think it could be a good idea to create an issue on ollama's repo.
>
> If you want, you could try building the unstable version of ollama, which is currently 0.1.42. Maybe that won't have this issue? If so, it might be time for me to backport the more recent version of ollama to stable nixos.

I'll give it a try.

@MatthiasvB (Author)

I've installed 0.1.42 and it has the same issue; it also often starts to produce garbled output or even crashes, requiring a restart of the ollama service. So I'm back to 0.1.38.

Some research yielded the command sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm from here as a possible fix for GPU discovery. At first sight, it seems to work.
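For convenience, that reload could be wrapped in a small script so it's a single command whenever the GPU disappears (a sketch; fix-ollama-gpu is a hypothetical name, and rmmod will fail while the module is still in use, so the sketch stops ollama first):

environment.systemPackages = [
  (pkgs.writeShellScriptBin "fix-ollama-gpu" ''
    # run as root; rmmod fails while nvidia_uvm is in use,
    # so stop the ollama service unit first
    systemctl stop ollama
    rmmod nvidia_uvm && modprobe nvidia_uvm
    systemctl start ollama
  '')
];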

@abysssol (Contributor)

The link seems to indicate that this is actually due to an nvidia driver bug, so this is a little outside of what I can meaningfully fix.

However, I could add something like this to ollama's service module as a workaround:

powerManagement.powerUpCommands = lib.mkIf (cfg.acceleration == "cuda")
  "rmmod nvidia_uvm && modprobe nvidia_uvm";

I would appreciate it if you tested it first to confirm that it actually solves your problem, before I add it to nixpkgs.
Just add the following to your configuration.nix or equivalent file and rebuild nixos:

powerManagement.powerUpCommands = "rmmod nvidia_uvm && modprobe nvidia_uvm";

@MatthiasvB (Author) commented Jun 14, 2024

I'm not sure this is the right place to run this command, as ollama "moves" from GPU to CPU at random times, not once per (boot) session.

@abysssol (Contributor)

Really? In the issue description you mention the behavior changing after rebooting the machine and after putting it to sleep, which seems to agree with what ollama's documentation mentions:

> On linux, after a suspend/resume cycle, sometimes Ollama will fail to discover your NVIDIA GPU, and fallback to running on the CPU.

According to its documentation, powerManagement.powerUpCommands also runs "when it resumes from suspend or hibernation". It seems to be the best place I can find to reload the kernel module.

@MatthiasvB (Author)

Yes, agreed; if that's how powerManagement.powerUpCommands works, that would seem to be the right place. Unfortunately, ollama moved itself to the CPU just after a longer pause, without reboot or suspend. It seems to happen randomly all the time.

@abysssol (Contributor)

Ah, that's unfortunate. Then it seems you'll simply have to run that command whenever cuda stops working. Ollama's recent release 0.1.44 mentions something seemingly relevant, though:

> Fixed certain cases where Nvidia GPUs would not be detected and reported as compute capability 1.0 devices

I wonder if that might fix your problem? I opened PR #319783; I expect it'll be available in nixpkgs-unstable within the next week. It might be worth testing 0.1.44 to see if it helps. In case you don't know, you can use ollama's nixos service with the unstable package:

services.ollama = {
  enable = true;
  acceleration = "cuda";
  # `unstable` will have to be created from the `nixpkgs-unstable` channel or flake input
  package = unstable.ollama;
};
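(As a sketch, one way to define `unstable` via a channel, assuming you've added it with nix-channel --add https://nixos.org/channels/nixpkgs-unstable unstable:

let
  # resolves against the channel named `unstable` on NIX_PATH
  unstable = import <unstable> { };
in
{
  services.ollama.package = unstable.ollama;
}

With a flake-based setup, `unstable` would instead come from a `nixpkgs-unstable` flake input.)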

If you do end up testing 0.1.44 and it doesn't help, you probably should open an issue on ollama's repo, unless there's already a relevant tracking issue.

But since this seems to be a known bug in ollama (or the nvidia drivers), I feel that this issue should be closed, as there's not much I can do to help; I just maintain the nixos package of ollama, not ollama itself. Feel free to continue commenting and I'll help if I can, but I don't think there's anything more I can do to fix this problem.

@abysssol closed this as not planned on Jun 14, 2024.