
Shared GPU memory working for one GPU, but not two? #2917

Answered by imtuyethan
chlimouj asked this question in Get Help

You may want to adjust the "n_gl" ("n_gpu_layers") setting on the right panel:

What does this mean?

"n_gl" or "n_gpu_layers" is a setting that controls how many layers of the AI model are loaded into the GPU memory. It's basically a way to balance between speed and memory usage:

  • Higher n_gpu_layers = more GPU memory used, faster processing
  • Lower n_gpu_layers = less GPU memory used, potentially slower processing

Suggestions

  • For 7B models: Try starting with 32 layers (n_gpu_layers=32)
  • For 13B models: Start with around 20-24 layers
  • For larger models: You may need to go even lower, perhaps 16 or fewer layers
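The starting points above can be sketched as a small helper. This is only a hypothetical illustration of the heuristic (the function name and thresholds are not part of Jan or llama.cpp); the right value still depends on how much free VRAM your GPU actually has:

```python
def suggested_n_gpu_layers(model_size_billion: float) -> int:
    """Rough starting point for n_gpu_layers, following the suggestions above.

    These are heuristics, not guarantees: raise the value if you have
    spare VRAM, lower it if the model fails to load or spills to shared
    GPU memory.
    """
    if model_size_billion <= 7:
        return 32          # 7B models: try 32 layers first
    if model_size_billion <= 13:
        return 24          # 13B models: start in the 20-24 range
    return 16              # larger models: 16 or fewer layers


print(suggested_n_gpu_layers(7))    # 32
print(suggested_n_gpu_layers(13))   # 24
print(suggested_n_gpu_layers(70))   # 16
```

From there, adjust in small steps while watching GPU memory usage: the goal is the highest layer count that still fits entirely in dedicated VRAM.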

Answer selected by imtuyethan