Replies: 4 comments
-
Where is the answer? It may be -ngl, which it says offloads some of the layers.
-
I have the same question.
-
@HuangLED, were you able to build and run it in hybrid mode?
-
It is the -ngl N option. -ngl 0 means everything runs on the CPU. I'm not sure where or how it starts using the GPU, or at what values of N.
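To make the "some layers on CPU, some on GPU" idea concrete, here is a minimal sketch of how a layer count N could map to devices. It assumes, without confirmation from this thread, that the model is a stack of n_layer identical blocks and that the last N blocks are the ones offloaded; the function name assign_layers is hypothetical and is not llama.cpp code.

```python
def assign_layers(n_layer: int, n_gpu_layers: int) -> list[str]:
    """Return a device label ("cpu" or "gpu") for each of the n_layer blocks.

    Hypothetical sketch: assumes the last n_gpu_layers blocks go to the GPU
    and the remaining blocks stay on the CPU. An N larger than n_layer is
    effectively clamped, so every block ends up on the GPU.
    """
    if n_layer < 0 or n_gpu_layers < 0:
        raise ValueError("layer counts must be non-negative")
    # Index of the first block that is offloaded (n_layer means "none").
    first_gpu = max(n_layer - n_gpu_layers, 0)
    return ["gpu" if i >= first_gpu else "cpu" for i in range(n_layer)]


if __name__ == "__main__":
    # A hypothetical 8-block model with N = 3 offloaded layers.
    print(assign_layers(8, 3))
```

Under this assumption, N = 0 keeps every block on the CPU (matching "-ngl 0 means everything on cpu"), and each increment of N moves one more block's weights and compute onto the GPU.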
-
The home page says "CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity".
I'd like to learn more about it. Are there any design docs or code pointers showing how it is actually done?
Much appreciated.
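One way to read "partially accelerate models larger than the total VRAM capacity" is that only as many layers as fit in VRAM are offloaded, and the rest run on the CPU. The sketch below estimates that count under simplifying assumptions that are not taken from this thread: every layer has the same size, and a fixed reserve_bytes of VRAM is kept free for other buffers. The function max_gpu_layers and all the sizes used are hypothetical.

```python
def max_gpu_layers(n_layer: int, layer_bytes: int, vram_bytes: int,
                   reserve_bytes: int = 0) -> int:
    """Estimate how many equal-sized layers fit in a VRAM budget.

    Hypothetical sketch: assumes uniform layer size and keeps reserve_bytes
    of VRAM free. The result is clamped to the range [0, n_layer].
    """
    if layer_bytes <= 0:
        raise ValueError("layer_bytes must be positive")
    usable = max(vram_bytes - reserve_bytes, 0)
    return min(usable // layer_bytes, n_layer)


if __name__ == "__main__":
    MiB, GiB = 1024**2, 1024**3
    # Hypothetical model: 32 layers of 400 MiB each, on an 8 GiB card with 1 GiB reserved.
    print(max_gpu_layers(32, 400 * MiB, 8 * GiB, 1 * GiB))  # 17
```

With these made-up numbers, 7 GiB of usable VRAM holds 17 of the 32 layers, so the remaining 15 would run on the CPU; the resulting count is the kind of value one might then pass as -ngl N.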