
Commit

Update README.md
CoderLSF committed Nov 16, 2023
1 parent d867976 commit be3ccb3
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion README.md
@@ -1,5 +1,6 @@
# Fast-LLaMA: A High-Performance Inference Engine
<p align="center"><img width="1022" alt="image" src="https://github.com/CoderLSF/fast-llama/assets/65639063/8c3eefc8-0db0-4cb1-8e78-58acc7cf77e3"></p>
<p align="center"><img width="600" alt="image" src="https://github.com/CoderLSF/fast-llama/assets/65639063/d3d66d72-bf91-4bef-b4e8-468227cfea05"></p>


## Descriptions
fast-llama is a `HIGH`-performance inference engine for LLMs like LLaMA (**3x** faster than `llama.cpp`), written in `pure C++`. It can run an **`8-bit`**-quantized **`LLaMA2-7B`** model on a 56-core CPU at **`~30 tokens/s`**. It outperforms all current open-source inference engines, delivering 2~3 times the CPU inference speed of the renowned llama.cpp.
