What is the relationship between the number of CPU cores and the speedup? #232

kohlerm · 2023-06-06T12:23:21Z

I was running some tests to see how much faster inference gets with more cores.
I have a machine with >100 cores.

I was testing with the starchat model:
starcoder -m ./starchat-alpha-ggml-q4_0.bin -p "implement the ackermann function in python" -t 16 --top_k 0 --top_p 0.95 --temp 0.2

I tried different numbers of threads (-t), and the speed seems to peak at around 32 threads, at around 310 ms per token.

I can see that it always saturates as many cores as I specify with -t.

Is that expected behavior?

The CPU is an Intel(R) Xeon(R) CPU E7-8880 v3.


LoganDark · 2023-06-07T03:53:49Z

With my RWKV sequence mode implementation, it's actually fastest with only one thread, because ggml's atomic polling or work stealing or whatever is so damn expensive that it becomes a massive bottleneck (with tens of thousands of nodes in the graph). So there's another interesting data point for you.

ggerganov · 2023-06-18T07:47:11Z

There is no simple answer as to what the optimal number of threads is. It depends on your CPU, memory bandwidth and model architecture. At the moment, the best strategy is to find the optimal -t value for your machine by trial and error.
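The trial-and-error search could be automated with a small script along these lines. This is only a rough sketch, not part of the project: the helper names (`time_command`, `sweep_threads`, `best_thread_count`) are made up for illustration, and it assumes the binary accepts `-t` appended at the end of the command line, as in the command from the original post. It measures total wall-clock time per run, which includes model loading, so it is a coarse proxy for ms per token rather than the same number.

```python
import shlex
import subprocess
import time
from typing import Callable, Dict, Iterable


def time_command(base_cmd: str, threads: int) -> float:
    """Run base_cmd with `-t <threads>` appended; return wall-clock seconds.

    Assumption: base_cmd does not already contain a -t option.
    """
    cmd = shlex.split(base_cmd) + ["-t", str(threads)]
    start = time.perf_counter()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def sweep_threads(
    measure: Callable[[int], float],
    thread_counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Measure each thread count `repeats` times and keep the fastest run.

    Taking the minimum reduces noise from other load on the machine.
    """
    results: Dict[int, float] = {}
    for threads in thread_counts:
        results[threads] = min(measure(threads) for _ in range(repeats))
    return results


def best_thread_count(results: Dict[int, float]) -> int:
    """Return the thread count with the lowest measured time."""
    return min(results, key=results.get)


# Example usage (commented out so that importing this file runs nothing):
#
# base = ('starcoder -m ./starchat-alpha-ggml-q4_0.bin '
#         '-p "implement the ackermann function in python" '
#         '--top_k 0 --top_p 0.95 --temp 0.2')
# results = sweep_threads(lambda t: time_command(base, t), [1, 2, 4, 8, 16, 32, 64])
# for threads, seconds in sorted(results.items()):
#     print(f"-t {threads}: {seconds:.2f} s")
# print("fastest:", best_thread_count(results))
```

`sweep_threads` takes the measurement as a callable so the sweep logic can be tried without the model, and so the wall-clock timing can be swapped for something that parses the per-token timing the program prints, if that is the number you care about.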
