
feat: DiTFastAttn for PixArt #297

Merged (2 commits) · Oct 8, 2024
Conversation

ZDJeffrey (Contributor)

Summary

DiTFastAttn is an attention compression method for Diffusion Transformer (DiT) models. It exploits the redundancy in DiT attention and introduces several compression methods for self-attention to accelerate inference on a single GPU.
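
For intuition, here is a minimal, hypothetical sketch of one such compressed attention variant (a banded "window" attention compared against full attention). It is not the code from this PR, only an illustration of replacing full self-attention with a cheaper variant where the outputs stay close:

```python
# Illustrative sketch only, not the PR's implementation.
import torch
import torch.nn.functional as F

def window_attention(q, k, v, window_size):
    # q, k, v: (batch, heads, seq_len, head_dim); each query attends only to
    # keys within +/- window_size positions instead of the full sequence.
    idx = torch.arange(q.shape[-2], device=q.device)
    mask = (idx[None, :] - idx[:, None]).abs() <= window_size
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 8, 256, 64)
out_full = F.scaled_dot_product_attention(q, k, v)
out_win = window_attention(q, k, v, window_size=32)
# If the relative difference is small, the cheaper variant can stand in for
# full attention on this layer/timestep.
print(((out_full - out_win).norm() / out_full.norm()).item())
```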

Implementation

  • A new config class and a new argument group for DiTFastAttn are added in xfuser/config/.
  • Following the implementation of Long Context Attention, the DiTFastAttn module is implemented in xfuser/core/fast_attention.
  • Since it can only be used with data parallelism, the attention processor is implemented independently instead of heavily modifying the original attention processor in xfuser/model_executor/layers/attention_processor.py.
  • Before DiTFastAttn can work, the compression methods need to be selected. Therefore, when prepare_run is called, the compression methods are set if DiTFastAttn is enabled (a rough sketch follows this list).
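
A rough sketch of the wiring described above, with hypothetical names (FastAttnConfig, install_fast_attn_processor, and select_compression_methods are illustrative placeholders, not the PR's actual API):

```python
from dataclasses import dataclass

@dataclass
class FastAttnConfig:
    """Illustrative stand-in for the new DiTFastAttn config class."""
    use_fast_attn: bool = False
    n_calib: int = 8
    threshold: float = 0.15
    window_size: int = 64
    use_cache: bool = True

def install_fast_attn_processor(pipeline, config):
    # Placeholder: swap each self-attention processor for the standalone
    # DiTFastAttn processor, kept separate from attention_processor.py.
    pass

def select_compression_methods(pipeline, config):
    # Placeholder: run config.n_calib calibration prompts and choose a
    # compression method per layer/timestep under config.threshold.
    pass

def prepare_run(pipeline, config, parallel_is_data_only: bool):
    """DiTFastAttn is wired up only when enabled, and only under pure
    data parallelism; otherwise an error is raised."""
    if not config.use_fast_attn:
        return
    if not parallel_is_data_only:
        raise RuntimeError("DiTFastAttn only supports data parallelism")
    install_fast_attn_processor(pipeline, config)
    select_compression_methods(pipeline, config)
```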

How to use

  • To use DiTFastAttn, the following arguments are required (a selection sketch follows this list):
    • use_fast_attn: enables fast attention.
    • n_calib: number of prompts used for compression method selection.
    • threshold: threshold for selecting the attention compression method; it effectively determines the compression ratio.
    • window_size: size of the attention window. According to the paper, a window size of 1/8 of the token length is recommended.
  • When using DiTFastAttn with a model for the first time, an additional argument must be set for the calibration:
    • coco_path: path to an MS COCO annotation JSON file (e.g. captions_val2014.json from the official site). The captions in this file are sampled as calibration prompts for the compression.
  • After calibration, the selected methods are saved to a JSON file in the cache folder. This file is reloaded for the same model with the same arguments when use_cache is set.
  • Note: DiTFastAttn can only be used with data parallelism. If any other parallelism method is used, the program raises an error.
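
A minimal sketch of how the threshold-driven selection described above could work (the helper names, the candidate list, and the cache file name are assumptions for illustration, not the PR's API): for each layer, the cheapest candidate whose output stays within the threshold of full attention on the calibration prompts is kept, and the resulting plan is written to a JSON cache that use_cache can reload.

```python
import json
import torch
import torch.nn.functional as F

def window_attention(q, k, v, window_size):
    idx = torch.arange(q.shape[-2], device=q.device)
    mask = (idx[None, :] - idx[:, None]).abs() <= window_size
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

def relative_error(approx, ref):
    return ((approx - ref).norm() / ref.norm()).item()

def select_method(q, k, v, candidates, threshold):
    """Return the first (cheapest) candidate within `threshold` relative error
    of full attention; fall back to full attention otherwise."""
    ref = F.scaled_dot_product_attention(q, k, v)
    for name, fn in candidates:  # ordered cheapest-first
        if relative_error(fn(q, k, v), ref) <= threshold:
            return name
    return "full_attn"

# Hypothetical usage on one calibration batch (q, k, v would come from running
# the n_calib prompts through the model).
q = k = v = torch.randn(1, 8, 256, 64)
candidates = [("window_attn",
               lambda q, k, v: window_attention(q, k, v, window_size=32))]
plan = {"layer_0": select_method(q, k, v, candidates, threshold=0.15)}
with open("fast_attn_cache.json", "w") as f:
    json.dump(plan, f)  # what a use_cache-style flag would reload later
```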

Test

So far, only the implementation of DiTFastAttn for the PixArt models is done. I have tested it with data parallelism on PixArt-alpha/PixArt-Sigma-XL-2-1024-MS and PixArt-alpha/PixArt-XL-2-1024-MS. Since PixArt-alpha/PixArt-Sigma-XL-2-2K-MS is currently unavailable on Hugging Face, I have not tested it yet.

WIP

Implementation of DiTFastAttn for other models is still in progress.
Benchmarking of DiTFastAttn is not done yet; I will do it after the implementation for the other models is complete.

ZDJeffrey commented Oct 8, 2024

Image Compare

PixArt-alpha/PixArt-XL-2-1024-MS

  • origin: epoch time: 3.81 sec, memory: 15.512717312 GB (images: origin1, origin2)
  • DiTFastAttn (threshold=0.15): epoch time: 3.17 sec, memory: 16.547694592 GB (images: fastattn1, fastattn2)

@Eigensystem (Collaborator) left a comment:


LGTM. Elegant code!

@xibosun (Collaborator) left a comment:


The PR defines FastAttnState, with an implementation similar to the existing state classes. Most of the DiTFastAttn code lives in xfuser/core/fast_attention, with only minor modifications to existing code. Overall, the code is correct and elegant.

@feifeibear merged commit ae504d6 into xdit-project:main on Oct 8, 2024
3 checks passed