
Ollama: only sometimes running on GPU #319303

Closed

MatthiasvB opened this issue Jun 12, 2024 · 10 comments

Comments

@MatthiasvB commented Jun 12, 2024

Describe the bug

I have installed ollama with the option services.ollama.acceleration = "cuda";. This caused the package to be built on my system, as opposed to being downloaded from a binary cache. After this, it ran very fast, as expected.
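For reference, a minimal sketch of the configuration in question, using the standard NixOS module options:

services.ollama = {
  enable = true;
  acceleration = "cuda"; # causes ollama to be built/fetched with CUDA support
};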

After the next reboot, it regressed to running slowly, utilizing the CPU.

After the next reboot, it ran fast again.

After putting the PC to sleep over the lunch break, it ran slowly on the CPU again.

Steps To Reproduce

Steps to reproduce the behavior:
It's flaky: sometimes it uses the GPU, sometimes not.

Expected behavior

It should always run on the GPU.

Screenshots

N/A

Additional context

NVIDIA RTX 4070, configured for offload mode
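("Offload mode" presumably refers to NVIDIA PRIME render offload; a sketch of that configuration for context, where the bus IDs are placeholders rather than values from this machine:

hardware.nvidia.prime = {
  offload.enable = true;
  intelBusId = "PCI:0:2:0";  # placeholder, machine-specific
  nvidiaBusId = "PCI:1:0:0"; # placeholder, machine-specific
};
)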

Notify maintainers

@abysssol @onny @marcusramberg

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.32, NixOS, 24.05 (Uakari), 24.05.675.805a384895c6`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"home-manager-24.05.tar.gz, nixos-24.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`


@onny (Contributor) commented Jun 12, 2024

What Ollama version are you currently using?

@MatthiasvB (Author)

> What Ollama version are you currently using?

The one packaged in nixos-24.05; that's 0.1.38.

@abysssol (Contributor)

The nature of this problem makes me think it could be a bug in upstream ollama; I'm not sure if this has anything to do with nix or not. I think it could be a good idea to create an issue on ollama's repo.

If you want, you could try building the unstable version of ollama, which is currently 0.1.42. Maybe that won't have this issue? If so, it might be time for me to backport the more recent version of ollama to stable nixos.

@MatthiasvB (Author)

> The nature of this problem makes me think it could be a bug in upstream ollama; I'm not sure if this has anything to do with nix or not. I think it could be a good idea to create an issue on ollama's repo.
>
> If you want, you could try building the unstable version of ollama, which is currently 0.1.42. Maybe that won't have this issue? If so, it might be time for me to backport the more recent version of ollama to stable nixos.

I'll give it a try.

@MatthiasvB (Author)

I've installed 0.1.42 and it has the same issue; it also often starts to produce garbled output or even crashes, requiring a restart of the ollama service. So I'm back to 0.1.38.

Some research yielded the command sudo rmmod nvidia_uvm && sudo modprobe nvidia_uvm from here as a possible fix for GPU discovery. At first sight, it seems to work.
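For convenience, that reload could be wrapped in a small script so it's a single command whenever the GPU disappears (a sketch; fix-ollama-gpu is a hypothetical name, and rmmod will fail while the module is still in use, so the sketch stops ollama first):

environment.systemPackages = [
  (pkgs.writeShellScriptBin "fix-ollama-gpu" ''
    # run as root; rmmod fails while nvidia_uvm is in use,
    # so stop the ollama service unit first
    systemctl stop ollama
    rmmod nvidia_uvm && modprobe nvidia_uvm
    systemctl start ollama
  '')
];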

@abysssol (Contributor)

The link seems to indicate that this is actually due to an nvidia driver bug, so this is a little outside of what I can meaningfully fix.

However, I could add something like this to ollama's service module as a workaround:

powerManagement.powerUpCommands = lib.mkIf (cfg.acceleration == "cuda")
  "rmmod nvidia_uvm && modprobe nvidia_uvm";

I would appreciate it if you tested it first to confirm that it actually solves your problem, before I add it to nixpkgs.
Just add the following to your configuration.nix or equivalent file and rebuild nixos:

powerManagement.powerUpCommands = "rmmod nvidia_uvm && modprobe nvidia_uvm";

@MatthiasvB (Author) commented Jun 14, 2024

I'm not sure this is the right place to run this command, as ollama "moves" from GPU to CPU at random times, not once per (boot) session.

@abysssol (Contributor)

Really? In the issue description you mention the behavior changing after rebooting the machine and after putting it to sleep, which seems to agree with what ollama's documentation mentions:

> On linux, after a suspend/resume cycle, sometimes Ollama will fail to discover your NVIDIA GPU, and fallback to running on the CPU.

According to its documentation, powerManagement.powerUpCommands also runs "when it resumes from suspend or hibernation". It seems to be the best place I can find to reload the kernel module.

@MatthiasvB (Author)

Yes, agreed; if that's how powerManagement.powerUpCommands works, that would seem to be the right place. Unfortunately, ollama moved itself to the CPU just after a longer pause, without reboot or suspend. It seems to happen randomly all the time.

@abysssol (Contributor)

Ah, that's unfortunate. Then it seems you'll simply have to run that command whenever cuda stops working. Ollama's recent release 0.1.44 mentions something seemingly relevant, though:

> Fixed certain cases where Nvidia GPUs would not be detected and reported as compute capability 1.0 devices

I wonder if that might fix your problem? I opened PR #319783; I expect it'll be available in nixpkgs-unstable within the next week. It might be worth testing 0.1.44 to see if it helps. In case you don't know, you can use ollama's nixos service with the unstable package:

services.ollama = {
  enable = true;
  acceleration = "cuda";
  # `unstable` will have to be created from the `nixpkgs-unstable` channel or flake input
  package = unstable.ollama;
};
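(As a sketch, one way to define `unstable` via a channel, assuming you've added it with nix-channel --add https://nixos.org/channels/nixpkgs-unstable unstable:

let
  # resolves against the channel named `unstable` on NIX_PATH
  unstable = import <unstable> { };
in
{
  services.ollama.package = unstable.ollama;
}

With a flake-based setup, `unstable` would instead come from a `nixpkgs-unstable` flake input.)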

If you do end up testing 0.1.44 and it doesn't help, you probably should open an issue on ollama's repo, unless there's already a relevant tracking issue.

But since this seems to be a known bug in ollama (or the nvidia drivers), I feel that this issue should be closed, as there's not much I can do to help; I just maintain the nixos package of ollama, not ollama itself. Feel free to continue commenting and I'll help if I can, but I don't think there's anything more I can do to fix this problem.

@abysssol closed this as not planned on Jun 14, 2024.