Officially Support AMD GPUs #954

Closed · 4 tasks done
Quentin-Anthony opened this issue May 26, 2023 · 5 comments
Labels: feature request

Quentin-Anthony (Member) commented May 26, 2023

Currently, AMD GPU support lives experimentally in the AMD branch (see the main...AMD comparison).

We should port the kernel guards from microsoft/Megatron-DeepSpeed@b4d4a0e#diff-059209398b62b21e2524b387fc6fd23ded28ec725798322c266ee4246d253670 to gpt-neox and bring this support into main.

  • Test the latest gpt-neox main on AMD GPUs with fused kernels disabled
  • Port the fused kernels to AMD hardware and test them on AMD GPUs
  • Add conditional HIP guards so that the same fused-kernel code can run on AMD and NVIDIA GPUs without modification (a build-level sketch of this idea follows this list)
  • Add flash-attn fallbacks, since flash-attn 2.x is not yet supported on AMD GPUs
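A minimal build-level sketch of the guarding idea, assuming PyTorch's ROCm detection via `torch.version.hip`; the helper names and source list below are illustrative, not the actual gpt-neox build code, and the guards in the Megatron-DeepSpeed diff above are C preprocessor checks inside the kernel sources themselves:

```python
# Hypothetical sketch: guard the fused-kernel extension build so the same
# setup path works on ROCm (AMD) and CUDA (NVIDIA) PyTorch builds.
import torch
from torch.utils.cpp_extension import CUDAExtension


def is_rocm_pytorch() -> bool:
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    return getattr(torch.version, "hip", None) is not None


def fused_softmax_extension() -> CUDAExtension:
    # Illustrative source list; the real kernels live under megatron/fused_kernels.
    sources = ["scaled_masked_softmax.cpp", "scaled_masked_softmax_cuda.cu"]
    if is_rocm_pytorch():
        # On ROCm, torch's extension machinery hipifies the .cu sources;
        # NVIDIA-only nvcc flags (e.g. -gencode arch lists) must be skipped.
        gpu_flags = ["-O3"]
    else:
        gpu_flags = ["-O3", "--use_fast_math"]
    return CUDAExtension(
        name="scaled_masked_softmax_cuda",
        sources=sources,
        extra_compile_args={"cxx": ["-O3"], "nvcc": gpu_flags},
    )
```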
Quentin-Anthony added the feature request label on May 26, 2023
R0n12 (Contributor) commented Oct 18, 2023

Will take a look at this!

R0n12 (Contributor) commented Jan 15, 2024

  • Flash-Attention-2 detection code
  • All fused kernels (except fused_rotary_positional_embedding) build successfully on MI250X + ROCm 5.6.0 without HIP guards
  • fused_rotary_positional_embedding build on AMD GPUs
  • Adding HIP guards to generalize the build process across AMD and NVIDIA GPUs
  • Tests on NVIDIA platforms with no modifications

Do we still need to detect flash attention 1 at this point since AMD has already ported their version 2? (https://github.com/ROCmSoftwarePlatform/flash-attention)

Branch: https://github.com/R0n12/gpt-neox-fork/tree/lang/amd
Based on: https://github.com/EleutherAI/gpt-neox/tree/AMD

Quentin-Anthony (Member, Author) commented Jan 15, 2024

> Do we still need to detect flash attention 1 at this point since AMD has already ported their version 2?

No, I think we can drop that.
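Given that only the 2.x API needs to be detected, a minimal sketch of such a check, assuming the `flash_attn` package exposes `__version__`; the helper name and backend labels are illustrative, not actual gpt-neox code:

```python
# Hypothetical sketch: detect Flash-Attention 2 at import time and fall back
# to the unfused attention path when it is unavailable (e.g. on unsupported
# ROCm setups).
try:
    import flash_attn
    from packaging.version import Version

    HAVE_FLASH_ATTN_2 = Version(flash_attn.__version__) >= Version("2.0.0")
except ImportError:
    HAVE_FLASH_ATTN_2 = False


def attention_backend() -> str:
    # "flash2" / "unfused" are illustrative labels, not gpt-neox config values.
    return "flash2" if HAVE_FLASH_ATTN_2 else "unfused"
```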

R0n12 (Contributor) commented Mar 10, 2024

Status update: I've been busy with life things recently; I'll push out a new, clean branch by next week.

Quentin-Anthony (Member, Author) commented
Done! Great work @R0n12
