TensorRT-LLM often hangs when using both tp_size 2 and enable_context_fmha #390

Closed · 1 of 4 tasks
lkm2835 opened this issue Apr 4, 2024 · 2 comments
Labels: bug (Something isn't working)

Comments
lkm2835 commented Apr 4, 2024

System Info

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python /app/tensorrt_llm/examples/llama/convert_checkpoint.py \
    --model_dir /app/models \
    --output_dir /app/models/tensorrt \
    --dtype float16 \
    --tp_size 2
trtllm-build --checkpoint_dir /app/models/tensorrt \
             --remove_input_padding enable \
             --gpt_attention_plugin float16 \
             --context_fmha enable \
             --gemm_plugin float16 \
             --output_dir /app/models/tensorrt_llm/context_fmha \
             --paged_kv_cache disable \
             --enable_xqa disable \
             --multi_block_mode disable \
             --tp_size 2 \
             --max_batch_size 1 \
             --max_input_len 4096 \
             --max_output_len 2048
mkdir /app/models/triton_model
cp -r /app/all_models/inflight_batcher_llm/* /app/models/triton_model

python3 /app/tools/fill_template.py -i /app/models/triton_model/preprocessing/config.pbtxt tokenizer_dir:/app/models/,triton_max_batch_size:1,preprocessing_instance_count:1
python3 /app/tools/fill_template.py -i /app/models/triton_model/postprocessing/config.pbtxt tokenizer_dir:/app/models/,triton_max_batch_size:1,postprocessing_instance_count:1
python3 /app/tools/fill_template.py -i /app/models/triton_model/tensorrt_llm_bls/config.pbtxt triton_max_batch_size:1,decoupled_mode:False,bls_instance_count:1,accumulate_tokens:False
python3 /app/tools/fill_template.py -i /app/models/triton_model/ensemble/config.pbtxt triton_max_batch_size:1
python3 /app/tools/fill_template.py -i /app/models/triton_model/tensorrt_llm/config.pbtxt triton_max_batch_size:1,decoupled_mode:False,max_beam_width:1,engine_dir:/app/models/tensorrt_llm/context_fmha,exclude_input_in_output:True,enable_kv_cache_reuse:False,batching_strategy:v1,max_queue_delay_microseconds:0
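
For completeness, launching the server would then look something like this. The script path is an assumption based on the /app layout used above (launch_triton_server.py ships under scripts/ in the tensorrtllm_backend repo); --world_size 2 matches the tp_size 2 engine:

python3 /app/scripts/launch_triton_server.py \
    --world_size 2 \
    --model_repo /app/models/triton_model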

Expected behavior

Inference completes without hanging.

Actual behavior

Both GPUs sit pinned at 100% utilization with no forward progress (nvidia-smi excerpt):

+-----------------------------------------+----------------------+----------------------+
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:00:06.0 Off |                    0 |
| N/A   35C    P0              76W / 400W |  11138MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:00:07.0 Off |                    0 |
| N/A   39C    P0              82W / 400W |  11106MiB / 40960MiB |    100%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

TensorRT-LLM often hangs when using both tp_size 2 and enable_context_fmha.

Additional notes

N/A

lkm2835 added the bug label on Apr 4, 2024
PerkzZheng commented:

@lkm2835 do you see this issue when running the TRT-LLM examples directly, without the Triton backend?
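
For reference, "running the examples directly" means invoking the standalone example runner against the built engine under mpirun (two ranks to match the tp_size 2 engine). A sketch reusing the engine and tokenizer paths from the reproduction above; the prompt text is illustrative:

mpirun -n 2 --allow-run-as-root \
    python3 /app/tensorrt_llm/examples/run.py \
        --engine_dir /app/models/tensorrt_llm/context_fmha \
        --tokenizer_dir /app/models \
        --max_output_len 64 \
        --input_text "Once upon a time"

If the hang reproduces here as well, the Triton backend can be ruled out as the cause.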

lkm2835 (author) commented Apr 10, 2024

@PerkzZheng I found a temporary workaround: disabling use_custom_all_reduce in trtllm-build.
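
Concretely, the workaround is to rebuild the engine with the same flags as in the reproduction, plus the custom all-reduce plugin disabled:

trtllm-build --checkpoint_dir /app/models/tensorrt \
             --remove_input_padding enable \
             --gpt_attention_plugin float16 \
             --context_fmha enable \
             --gemm_plugin float16 \
             --use_custom_all_reduce disable \
             --output_dir /app/models/tensorrt_llm/context_fmha \
             --paged_kv_cache disable \
             --enable_xqa disable \
             --multi_block_mode disable \
             --tp_size 2 \
             --max_batch_size 1 \
             --max_input_len 4096 \
             --max_output_len 2048

With use_custom_all_reduce disabled, TensorRT-LLM falls back to NCCL for the tensor-parallel all-reduce, which sidesteps the hang, typically at a small latency cost.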
