terminate called after throwing an instance of 'ReadSocketException' #48
What's your setup like?
What logs do you see in the root node? It looks like the root node has disconnected for some reason.
Yes.
Note: I am using Llama 3 for this project, and it says llama2 as the architecture.
Llama 3 works just fine, so that shouldn't matter.
This is possible; however, I didn't see anything triggered on Suricata or pfBlocker/pfSense. I'm going to look into it, but when my router blocks something, the device usually can't connect at all afterwards. Here I can connect; it just fails after the connection. Thanks for the feedback, I'll look further into my configuration.
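A quick way to separate "port is blocked" from "drops after connecting" is a plain TCP probe from the root machine to each worker. This is a generic sketch, not part of this project; port 9998 matches the worker command used later in this thread, while the 10.0.0.2 address is a placeholder for a worker's static IP:

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder worker address -- substitute your worker's static IP.
print(port_open("10.0.0.2", 9998))
```

If this prints True but the session still dies a few seconds later, the cause is more likely a timeout or a crash than a firewall block.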
Could you paste logs from the root node?
Sure, where are they? Is there a verbose mode?
No, just what you see in the console.
That was it, the host node just stops.
Could you prove that by posting a screenshot of the terminal?
Yes. I see this now, and I will be working on it tonight, within the next hour or so.
I know the socket client has a 3-second timeout, which might explain why it happens after 3 seconds.
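That timeout behavior is easy to reproduce in isolation. The sketch below is not distributed-llama's actual code; it only shows how a fixed receive timeout on a client socket turns a silent peer into a read error once the window elapses (shortened here to 0.3 s):

```python
import socket
import threading

# A local server that accepts the connection and then sends nothing,
# mimicking a root node that goes quiet after the handshake.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def accept_and_stall():
    conn, _ = server.accept()    # accept, then stay silent
    threading.Event().wait(1.0)  # hold the connection open
    conn.close()

threading.Thread(target=accept_and_stall, daemon=True).start()

client = socket.socket()
client.settimeout(0.3)  # the project reportedly uses ~3 s
client.connect(("127.0.0.1", port))
try:
    client.recv(1024)
    timed_out = False
except socket.timeout:
    # Analogous to a ReadSocketException thrown after the timeout.
    timed_out = True
client.close()
print(timed_out)  # prints True
```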
Can this run on 1 GB RAM devices? The RPi 3B has only 1 GB.
Did you run it with sudo?
First time yes, second time no.
From what I know, that happens when you don't run it with sudo.
If there is something I can log, I will.
In my case I ran it between two desktop machines, and I run it within WSL on Windows. I want to get some of these to add as workers; they have 16 GB DDR5 RAM each. But you must run it with sudo, hence why the README has the command:
Is nice required too?
No, just the sudo. nice is for lowering the scheduling priority of the process so it runs at a lower priority.
You can run without the nice command.
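For reference, nice changes scheduling priority (not CPU affinity), and only negative increments, which raise priority, need root. A small Unix-only sketch using Python's os.nice rather than the shell command:

```python
import os

# os.nice(increment) adds to the process's niceness and returns the
# new value. Positive increments (lower priority) are allowed for any
# user; negative ones (higher priority) require root privileges.
before = os.nice(0)  # increment of 0 just reads the current niceness
after = os.nice(5)   # lower our own priority by 5
print(before, after)  # typically "0 5" when starting from niceness 0
```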
This may be the root cause. This is how RAM usage looks with 8 nodes on my Mac (Llama 3 8B Q40). Currently the root node (0.5.0) keeps the first layer and the last layer in memory as extra.
The first layer may not need to be loaded into RAM (as I did here, check the "path" link). The last layer could probably be split as well, so there is room for improvement.
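The back-of-the-envelope math for whether a 1 GB device fits is simple. A hedged sketch, where the 6.3 GB figure for a Q40 8B model and the 0.5 GB of extra root-node layers are illustrative assumptions, not measured values from this project:

```python
def per_node_gb(model_gb, n_nodes, root_extra_gb=0.0):
    """Even split of the weights across nodes, plus any layers the
    root node keeps in memory on top of its share."""
    return model_gb / n_nodes + root_extra_gb

# Assumed sizes, for illustration only:
print(round(per_node_gb(6.3, 8), 2))       # worker share: 0.79 GB
print(round(per_node_gb(6.3, 8, 0.5), 2))  # root with extra layers: 1.29 GB
```

On these assumptions, an 8-node split of a ~6 GB model fits 1 GB workers, but the root node only fits once the extra first/last layers are trimmed or offloaded.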
Excellent. I don't mind helping out with this. My current drive in life is buying every 7900 XTX and/or RPi that people are reselling because they are useless. I love that you're looking into the GPU side of things as well. Perhaps I can work on converting some models? I understand this isn't a priority, but if there is some way I could help, I'd like to look into some smaller models that are digestible by the RPi.
Hello, I'm facing the same error, but it happens immediately. There is no message about connecting: the three worker nodes crash with the socket error, and the main node throws the std::exception. I'm running these on Linux Mint 21.3 XFCE across 4 computers. All of them are connected over a D-Link network switch with static IPs for each device.
Hello @LaeMat! What model are you trying to run? How much RAM do you have?
I'm trying to run Llama 3. The root node has 12 GB RAM, while the worker nodes have 8 or 4 GB of RAM.
Could you paste logs from all machines? Maybe there will be some hint there. You could also try running a small model (for example, TinyLlama).
I fixed the issue. It must've been that I was not linking to the right files for the LLM. Currently it works for inference, but chat with the Llama 3 8B Instruct model seems to be far slower than inference. I'm not sure if that's normal or not.
However, today the performance has gotten a LOT worse. I'm using the same exact files and prompts as I did for my last tests yesterday, but the performance has gone from over 2 tokens a second to 0.1. Has anyone else faced this? I'll add images soon.
|
@LaeMat have you correctly chosen the nthreads argument?
All have 4 threads, so I set nthreads to 4 on all, just like I did yesterday. Again, both runs are on the same computers.
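A simple way to double-check that value on each box is to ask the OS for its core count, which is the usual heuristic for nthreads (a generic sketch, not a distributed-llama utility):

```python
import os

# The usual heuristic: one worker thread per core. On the quad-core
# machines in this thread that yields 4.
nthreads = os.cpu_count() or 1
print(nthreads)
```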
Are you using the same version? I suppose something has changed.
The nodes connect, but crash after roughly 3 seconds.
Server:
For Each Worker:
sudo main worker --port 9998 --nthreads 4