Skip to content

pengwei-iie/llama_bugs

 
 

Repository files navigation

For Chinese

在预测的过程当中遇到了一些坑,然后在这边记录下来,整个过程大概还可以吧,就是上传模型的服务器上的比较费时间,然后预测的时候可能会由于GPU的显存不是很大,导致会out of memory。

首先第1个问题就是我们下载的那个模型,后缀要是以pth为结尾的,如果下载的是huggingface的bin结尾的,这个模型可能会没有办法去进行适配。

第2个问题就是在预测的时候,如果遇到out of memory怎么办?我查了一下,主要的原因是因为batch size,所以我们只需要把batch size设置给它缩小一些就好,这个跟你在inference的batch size大小有关系。我们在这个项目里面增加了一个新的文件example_small.py,大家直接运行这个文件就可以在一张16g的显卡上面进行预测。

example_small.py

LLaMA

This repository is intended as a minimal, hackable and readable example to load LLaMA (arXiv) models and run inference. In order to download the checkpoints and tokenizer, fill this google form

Setup

In a conda env with pytorch / cuda available, run:

pip install -r requirements.txt

Then in this repository:

pip install -e .

Download

Once your request is approved, you will receive links to download the tokenizer and model files. Edit the download.sh script with the signed url provided in the email to download the model weights and tokenizer.

Inference

The provided example.py can be run on a single or multi-gpu node with torchrun and will output completions for two pre-defined prompts. Using TARGET_FOLDER as defined in download.sh:

torchrun --nproc_per_node MP example.py --ckpt_dir $TARGET_FOLDER/model_size --tokenizer_path $TARGET_FOLDER/tokenizer.model

Different models require different MP values:

Model MP
7B 1
13B 2
33B 4
65B 8

FAQ

Reference

LLaMA: Open and Efficient Foundation Language Models -- https://arxiv.org/abs/2302.13971

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}

Model Card

See MODEL_CARD.md

License

See the LICENSE file.

About

Inference code for LLaMA models

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 94.0%
  • Shell 6.0%