- EC2 instance (z1d.2xlarge recommended)
- AWS FPGA Developer AMI (install)
- S3 Bucket
- Pretrained model parameters & tokenizer can be found here
- Follow the setup for AWS FPGA Developer AMI
- Open the project in Vitis IDE
- Verify the install by building the Software Emulation
- If no run configurations exist, add a new System Project Debug configuration with the following user-provided arguments:

  ```
  ${project_loc:llama_xrt}/src/weights.bin -z ${project_loc:llama_xrt}/src/tokenizer.bin -t 0.8 -n 256 -i "{prompt}" -k
  ```
- Run the Hardware build; it should take around 12 hours.
- Extract the .xclbin file and export it to an AWS AFI (Amazon FPGA Image) following the directions here
- Launch an EC2 F1 instance and copy the generated .awsxclbin file to the same directory as the host code
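One way to copy the kernel binary over is `scp`; a sketch assuming a key pair `my-key.pem`, the FPGA Developer AMI's default `centos` user, and the host code checked out at `~/llama_xrt` (all names and paths are illustrative, not from this repo):

```shell
# Copy the generated kernel binary to the F1 instance (illustrative names/paths).
scp -i my-key.pem llama_xrt.awsxclbin centos@<f1-instance-ip>:~/llama_xrt/
```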
- Start the FPGA runtime (more instructions here)

  ```shell
  cd $AWS_FPGA_REPO_DIR
  source vitis_setup.sh
  source vitis_runtime_setup.sh
  ```
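Before building, it can help to confirm that XRT sees the device. The `xbutil` tool ships with XRT, though the subcommand name depends on the XRT version (`examine` on newer releases, `query` on older ones):

```shell
# List the FPGA devices visible to XRT (subcommand varies by XRT version).
xbutil examine || xbutil query
```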
- Ensure that devtoolset-9 (or g++-9) is installed and enabled
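On the CentOS 7 based FPGA Developer AMI, devtoolset-9 is typically installed through Software Collections; a sketch (package names assume CentOS 7, adjust for your distribution):

```shell
# Install devtoolset-9 via Software Collections (CentOS 7).
sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-9
# Enable g++ 9 for the current shell session.
scl enable devtoolset-9 bash
# Confirm the active compiler version.
g++ --version
```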
- Build the host executable

  ```shell
  g++ -Wall -O3 -std=c++17 src/llama2.cpp -o llama2 -I${XILINX_XRT}/include -L${XILINX_XRT}/lib -lxrt_coreutil -lpthread -lrt -lstdc++
  ```
- Run the host executable

  ```shell
  ./llama2 {path to weights} -z {path to tokenizer} -t {temp} -n {steps} -i {prompt} -k {path to kernel}
  ```
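As a concrete example, assuming `weights.bin` and `tokenizer.bin` sit next to the executable and the kernel file is named `llama_xrt.awsxclbin` (all file names are illustrative), a run might look like:

```shell
./llama2 weights.bin -z tokenizer.bin -t 0.8 -n 256 -i "Once upon a time" -k llama_xrt.awsxclbin
```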