Summary
DiTFastAttn is an attention compression method for Diffusion Transformer (DiT) models. By exploiting the redundancy in DiT inference, it introduces several compression methods for self-attention to accelerate inference on a single GPU.
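For intuition, the paper's compression methods boil down to either replacing full self-attention with window attention plus a cached residual, or reusing the attention output of a neighboring timestep / CFG branch. Below is a minimal conceptual sketch of that idea, not the code in this PR; the function names, cache keys, and method strings are placeholders, and the 1-D band mask is only a stand-in for the paper's window attention.

```python
import torch
import torch.nn.functional as F

def windowed_attention(q, k, v, window):
    # 1-D band mask standing in for local window attention;
    # a real kernel would avoid materialising an n x n mask.
    n = q.shape[-2]
    idx = torch.arange(n, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window
    return F.scaled_dot_product_attention(q, k, v, attn_mask=band)

def fast_attention(q, k, v, method, cache, window=128):
    """Apply the compression method chosen for this layer/timestep.
    `cache` carries outputs/residuals from the last full-attention step."""
    if method == "full":
        out = F.scaled_dot_product_attention(q, k, v)
        # Cache the (full - window) residual so later steps can run cheap
        # window attention and re-add it (window attention + residual sharing).
        cache["residual"] = out - windowed_attention(q, k, v, window)
        cache["out"] = out
        return out
    if method == "window_with_residual":
        return windowed_attention(q, k, v, window) + cache["residual"]
    if method == "share_previous":
        # Neighbouring timesteps (or the other CFG branch) produce nearly
        # identical attention outputs, so reuse the cached one.
        return cache["out"]
    raise ValueError(f"unknown method: {method}")
```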
Implementation
xfuser/config/
xfuser/core/fast_attention
xfuser/model_executor/layers/attention_processor.py
When .prepare_run is called, the compression method is selected and set if DiTFastAttn is enabled.
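Conceptually, the selection pass run from .prepare_run does something like the sketch below: for every attention layer and timestep, pick the cheapest candidate method whose output error on the calibration prompts stays under threshold, then persist the plan so it can be reloaded with use_cache. This is a rough sketch under assumed names and an assumed error criterion, not the actual code in this PR.

```python
import json, os

# Cheapest candidates first; "full" attention is the fallback.
CANDIDATES = ["share_previous", "window_with_residual"]

def select_methods(full_outputs, approx_outputs, threshold):
    """full_outputs[(layer, step)]: full-attention output averaged over the
    n_calib calibration prompts; approx_outputs[(layer, step, method)]: the
    same output under a candidate compression method."""
    plan = {}
    for (layer, step), ref in full_outputs.items():
        chosen = "full"
        for method in CANDIDATES:
            approx = approx_outputs[(layer, step, method)]
            rel_err = (approx - ref).norm() / ref.norm()
            if rel_err < threshold:  # aggressive method is "good enough"
                chosen = method
                break
        plan[f"{layer}:{step}"] = chosen
    return plan

def save_plan(plan, cache_dir="cache", key="model_and_args_hash"):
    # Persisted so a later run with the same model/arguments can skip
    # calibration when use_cache is set.
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, f"{key}.json"), "w") as f:
        json.dump(plan, f)
```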
How to use
use_fast_attn: enables fast attention.
n_calib: number of prompts used for compression method selection.
threshold: threshold for selecting the attention compression method; it indirectly determines the compression ratio.
window_size: size of the window attention. According to the paper, a window size of 1/8 of the token count is recommended.
coco_path: path to an MS COCO annotation JSON file (e.g. captions_val2014.json from the official site). The file contains the image captions, which are sampled as prompts for the compression method selection.
The compression method selection result is saved in the cache folder. This file can be loaded for the same model with the same arguments if use_cache is set.
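A hypothetical launch of one of the PixArt examples with DiTFastAttn enabled could look like the following; the example script path and the --model flag are assumptions, the DiTFastAttn flags are the ones listed above, and the values are illustrative only.

```python
import subprocess

subprocess.run([
    "torchrun", "--nproc_per_node=2",
    "examples/pixart_example.py",                        # assumed example script
    "--model", "PixArt-alpha/PixArt-Sigma-XL-2-1024-MS",
    "--use_fast_attn",                                   # turn DiTFastAttn on
    "--n_calib", "8",                                    # calibration prompts
    "--threshold", "0.5",                                # method-selection threshold
    "--window_size", "512",                              # ~1/8 of the token count (illustrative)
    "--coco_path", "annotations/captions_val2014.json",  # source of calibration prompts
    "--use_cache",                                       # reuse a saved selection result
], check=True)
```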
Test
So far, only the implementation of DiTFastAttn for PixArt models is done. I have tested it with data parallelism on PixArt-alpha/PixArt-Sigma-XL-2-1024-MS and PixArt-alpha/PixArt-XL-2-1024-MS. As the PixArt-alpha/PixArt-Sigma-XL-2-2K-MS model is currently not available on Hugging Face, I have not tested it yet.
WIP
Implementation of DiTFastAttn for other models is still in progress.
The benchmark of DiTFastAttn is not done yet. I will do it after the implementation for other models is finished.