- EC2 instance (z1d.2xlarge recommended)
- AWS FPGA Developer AMI (install)
- S3 Bucket
- Pretrained model parameters & tokenizer can be found here
- Follow the setup for AWS FPGA Developer AMI
- Open the project in Vitis IDE
- Verify the install by building the Software Emulation
- If no run configurations exist, add a new System Project Debug configuration with the following user-provided arguments:

  ```
  ${project_loc:llama_xrt}/src/weights.bin -z ${project_loc:llama_xrt}/src/tokenizer.bin -t 0.8 -n 256 -i "{prompt}" -k
  ```
- Run the Hardware build; it should take around 12 hours.
- Extract the .xclbin file and export it to an AWS AFI (Amazon FPGA Image) following the directions here
- Launch an EC2 F1 instance and copy the generated .awsxclbin file to the same directory as the host code
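One way to copy the kernel binary over is `scp`; a sketch assuming a key pair `my-key.pem`, the FPGA Developer AMI's default `centos` user, and the host code checked out at `~/llama_xrt` (all names and paths are illustrative, not from this repo):

```shell
# Copy the generated kernel binary to the F1 instance (illustrative names/paths).
scp -i my-key.pem llama_xrt.awsxclbin centos@<f1-instance-ip>:~/llama_xrt/
```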
- Start the FPGA runtime (more instructions here)

  ```shell
  cd $AWS_FPGA_REPO_DIR
  source vitis_setup.sh
  source vitis_runtime_setup.sh
  ```
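Before building, it can help to confirm that XRT sees the device. The `xbutil` tool ships with XRT, though the subcommand name depends on the XRT version (`examine` on newer releases, `query` on older ones):

```shell
# List the FPGA devices visible to XRT (subcommand varies by XRT version).
xbutil examine || xbutil query
```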
- Ensure that devtoolset-9 (or g++-9) is installed and enabled
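On the CentOS 7 based FPGA Developer AMI, devtoolset-9 is typically installed through Software Collections; a sketch (package names assume CentOS 7, adjust for your distribution):

```shell
# Install devtoolset-9 via Software Collections (CentOS 7).
sudo yum install -y centos-release-scl
sudo yum install -y devtoolset-9
# Enable g++ 9 for the current shell session.
scl enable devtoolset-9 bash
# Confirm the active compiler version.
g++ --version
```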
- Build the host executable

  ```shell
  g++ -Wall -O3 -std=c++17 src/llama2.cpp -o llama2 -I${XILINX_XRT}/include -L${XILINX_XRT}/lib -lxrt_coreutil -lpthread -lrt -lstdc++
  ```
- Run the host executable

  ```shell
  ./llama2 {path to weights} -z {path to tokenizer} -t {temp} -n {steps} -i {prompt} -k {path to kernel}
  ```
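As a concrete example, assuming `weights.bin` and `tokenizer.bin` sit next to the executable and the kernel file is named `llama_xrt.awsxclbin` (all file names are illustrative), a run might look like:

```shell
./llama2 weights.bin -z tokenizer.bin -t 0.8 -n 256 -i "Once upon a time" -k llama_xrt.awsxclbin
```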