Enable NUMA feature for llama_cpp_python #4040

Merged (9 commits) on Sep 27, 2023

Conversation

@StoyanStAtanasov (Contributor) commented Sep 23, 2023

This is a POC implementation of this feature. Hopefully it is correct; if not, it is at least here to get things started. I'm also not sure whether the feature works, because I don't know what to expect when it does.

Trying to fix #3444

@StoyanStAtanasov changed the title from "Enable NUMA feature fro llama_cpp_python" to "Enable NUMA feature for llama_cpp_python" on Sep 23, 2023
@oobabooga (Owner)

What is NUMA and what does it do?

@StoyanStAtanasov (Contributor, Author)

@oobabooga NUMA (non-uniform memory access) should speed up inference on systems with multiple CPUs, or more complex CPUs, where memory speed differs depending on which core is accessing it. Since memory speed is very important for LLMs, I hope this can bring up to a 10-20% speedup on servers.

Here is the PR that implemented NUMA support in llama.cpp: ggerganov/llama.cpp#1556

Here is an extract from the llama.cpp help:

NUMA support

  • --numa: Attempt optimizations that help on some systems with non-uniform memory access. This currently consists of pinning an equal proportion of the threads to the cores on each NUMA node, and disabling prefetch and readahead for mmap. The latter causes mapped pages to be faulted in on first access instead of all at once, and in combination with pinning threads to NUMA nodes, more of the pages end up on the NUMA node where they are used. Note that if the model is already in the system page cache, for example because of a previous run without this option, this will have little effect unless you drop the page cache first. This can be done by rebooting the system or on Linux by writing '3' to '/proc/sys/vm/drop_caches' as root.
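
As an illustration of the cache-dropping step mentioned above, here is a minimal sketch of my own (not from llama.cpp), assuming Linux and root privileges:

    # Sketch only: drop the Linux page cache so a fresh run with --numa is not
    # skewed by pages already cached from a previous run. Requires root; Linux-only.
    import subprocess

    def drop_page_cache() -> None:
        subprocess.run(["sync"], check=True)  # flush dirty pages first
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")  # 3 = drop page cache, dentries and inodes

    if __name__ == "__main__":
        drop_page_cache()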

@Ph0rk0z (Contributor) commented Sep 23, 2023

For me, in the simplest terms, it meant that the whole model didn't end up loaded into one CPU's RAM.

@oobabooga (Owner)

Could you add a checkbox to the UI for this parameter? Just look for all occurrences of "mlock" under modules/*.py and add similar entries for numa.
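
(For illustration only, a rough, self-contained sketch of the kind of plumbing this implies: a numa flag exposed alongside mlock and forwarded to the loader parameters. The names below are assumptions for the example, not the repository's actual modules/*.py code.)

    # Hypothetical sketch: expose a "numa" flag next to "mlock" and pass both
    # through to the llama.cpp loader parameters. Not the real webui code.
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument('--mlock', action='store_true',
                        help='Force the system to keep the model in RAM.')
    parser.add_argument('--numa', action='store_true',
                        help='Attempt NUMA optimizations in llama.cpp.')
    args = parser.parse_args([])  # empty argv just for the example

    # The model loader would then forward the flags much like mlock is today.
    loader_params = {'use_mlock': args.mlock, 'numa': args.numa}
    print(loader_params)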

@StoyanStAtanasov (Contributor, Author) commented Sep 24, 2023 via email

using mlock as an example

@StoyanStAtanasov (Contributor, Author) commented Sep 24, 2023

@oobabooga Added the UI elements, using mlock as an example. I have not tested it, as I have always run the .zip packages on Windows and Linux; I have yet to try running it from source. I guess I'd better check out the Docker way.

@Ph0rk0z (Contributor) commented Sep 24, 2023

I'm not sure why you added it as a second init. I think the llama.cpp Python package can init itself; numa is just another param since the update. It will warn you if you haven't disabled kernel NUMA balancing (echo 0 > /proc/sys/kernel/numa_balancing), and that's how you know it's working.
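
(As a minimal sketch of that check, assuming Linux, one could verify the kernel setting before loading the model:)

    # Sketch: warn if automatic NUMA balancing is still enabled (Linux-only).
    # Disabling it requires root: echo 0 > /proc/sys/kernel/numa_balancing
    from pathlib import Path

    balancing = Path("/proc/sys/kernel/numa_balancing")
    if balancing.exists() and balancing.read_text().strip() != "0":
        print("Warning: kernel NUMA balancing is enabled; "
              "llama.cpp NUMA pinning may be less effective.")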

According to the commit, only the low-level API has to enable it the way you did: abetlen/llama-cpp-python@f4090a0

I did it like this: Ph0rk0z@bab1491

@StoyanStAtanasov (Contributor, Author)

@Ph0rk0z Hey, I don't know what I am doing :) (I'm a TypeScript dev.)

I thought it had to be set in the initializer. I guess I misunderstood the parameter description:
numa: Enable NUMA support. (NOTE: The initial value of this parameter is used for the remainder of the program as this value is set in llama_backend_init)

So what do we do? Did you manage to test it? Was everything OK? If so, then maybe it is better to merge your code 👍 Or should I change my code?

@Ph0rk0z (Contributor) commented Sep 24, 2023

I tested mine a while ago, but I thought nobody here would want NUMA or GPU selection. It works fine as another loading parameter. If you re-initialize the backend, I don't know what happens; maybe it works, maybe it does something weird.

Most llama loader code uses the high-level Python API, which already performs the init you called, inside llama.py:

    if not Llama.__backend_initialized:
        # llama_backend_init() runs only once per process; the numa value passed
        # here is what the backend keeps for the rest of the program.
        if self.verbose:
            llama_cpp.llama_backend_init(numa)
        else:
            # Suppress llama.cpp's stdout/stderr chatter when not verbose.
            with suppress_stdout_stderr():
                llama_cpp.llama_backend_init(numa)
        Llama.__backend_initialized = True

This numa param is the same as n_batch or anything else we already set here when loading. When you run it without NUMA balancing disabled, it will print a warning message, so you know it's enabled.
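
(For reference, a minimal usage sketch of passing numa through the high-level API; the model path is a placeholder, and the keyword reflects llama-cpp-python as of this discussion:)

    # Minimal sketch: numa is passed like any other loading parameter; the
    # library calls llama_backend_init(numa) itself on the first Llama().
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/your-model.gguf",  # placeholder path
        n_batch=512,
        use_mlock=False,
        numa=True,
    )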

@StoyanStAtanasov (Contributor, Author)

I have tested it on Windows; the UI works, and hopefully the NUMA functionality does as well.

@oobabooga merged commit 7e6ff8d into oobabooga:main on Sep 27, 2023
@oobabooga (Owner)

Thanks for the PR @stoianchoo

Merging this pull request may close: Add numa support when using llama.cpp