Llama 2 on CPU and Mac M1/M2 GPU

This is a fork of https://github.com/facebookresearch/llama that runs on the CPU and, when available, on the Mac M1/M2 GPU (MPS).
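This README does not describe how the device is chosen. The following is a minimal sketch, assuming the fork relies on PyTorch's `torch.backends.mps.is_available()` check and falls back to the CPU otherwise. `pick_device` is a hypothetical helper, not a function from this repository.

```python
# Minimal sketch, assuming PyTorch is the backend.
# pick_device is a hypothetical helper, not part of this repository.
def pick_device() -> str:
    """Return "mps" on Apple Silicon when PyTorch supports it, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no GPU path to choose.
        return "cpu"
    # torch.backends.mps only exists in newer PyTorch releases, so guard it.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

The returned string can be passed to `torch.device(...)` or to a tensor's or module's `.to(...)` call.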

Installation and usage are exactly the same as in the official repository, so please refer to its instructions.

[Screenshot: MacBook Pro M1 running the 7B model]

During text generation, an extra message is also shown that reports how many tokens have been generated and at what speed.
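The exact wording and computation of that message are not specified here. As a minimal sketch, assuming speed means tokens divided by wall-clock generation time, a counter could look like this; `generate_with_stats` is a hypothetical name, and the token stream stands in for real model output.

```python
import time
from typing import Callable, Iterable, List, Tuple


def generate_with_stats(
    tokens: Iterable[int],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[int], float]:
    """Consume a token stream and report count and tokens/second.

    Hypothetical helper: assumes speed = token count / elapsed seconds.
    """
    start = clock()
    out = list(tokens)  # drain the generator, timing the whole run
    elapsed = clock() - start
    # Guard against a zero-length interval on very fast or coarse clocks.
    rate = len(out) / elapsed if elapsed > 0 else float("inf")
    print(f"{len(out)} tokens generated at {rate:.2f} tokens/s")
    return out, rate


if __name__ == "__main__":
    fake_stream = iter(range(50))  # stand-in for tokens from the model
    generate_with_stats(fake_stream)
```

Passing a custom `clock` makes the timing deterministic, which is useful for testing the report without running a model.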
