Skip to content

Latest commit

 

History

History

demo

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Text Generation Task

To run text generation task in the streaming mode:

python text_generation.py \
    --model 01-ai/Yi-6B \
    --tokenizer 01-ai/Yi-6B \
    --max-tokens 512 \
    --eos-token $'\n' \
    --streaming

You can also provide an extra --prompt argument to try some other prompts.

When dealing with extremely long input sequences, you may need multiple GPU devices and to enable tensor parallelism acceleration during inference to avoid insufficient memory error.

To run text generation task using tensor parallelism acceleration with 2 GPU devices:

torchrun --nproc_per_node 2 \
    text_generation_tp.py \
    --model 01-ai/Yi-6B \
    --max-tokens 512 \
    --eos-token $'\n' \
    --streaming