
[Inference] Tensor model parallelism #778

Merged
merged 32 commits into from
Jun 25, 2023
Conversation

goliaro
Collaborator

@goliaro goliaro commented Jun 19, 2023

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #

Before merging:

  • Did you update the flexflow-third-party repo if you modified any of the CMake files, build configs, or submodules?

@goliaro goliaro marked this pull request as draft June 19, 2023 09:10
@lockshaw lockshaw added the inference Features and fixes related to the inference project. label Jun 23, 2023
@goliaro goliaro marked this pull request as ready for review June 24, 2023 16:53
@jiazhihao
Collaborator

@gabrieleoliaro The PR looks great to me overall. Do you think we can merge this into the inference branch?

@@ -127,7 +127,7 @@ Tensor
   }
   if (bias) {
     // q, k, v, o
-    int dims[1] = {embed_dim * 4};
+    int dims[1] = {(qProjSize + kProjSize + vProjSize) * num_heads + oProjSize};
Collaborator

Hi Gabriele, just to make sure: is there any situation in which embed_dim * 4 is not equal to (qProjSize + kProjSize + vProjSize) * num_heads + oProjSize?

Collaborator

In Megatron-LM's tensor model parallelism, we partition the attention heads across GPUs, so the num_heads on each GPU can be smaller than the total number of attention heads.

Collaborator Author

In the current model, the two quantities are indeed always identical; I changed this just to make the data layout easier to remember.

Collaborator

Thanks!

@goliaro
Collaborator Author

goliaro commented Jun 25, 2023

@gabrieleoliaro The PR looks great to me overall. Do you think we can merge this into the inference branch?

yes! I'll merge it as soon as CI passes!

@goliaro goliaro merged commit 0f3be1f into inference Jun 25, 2023
44 of 45 checks passed
@goliaro goliaro deleted the tensor_parallelism branch June 25, 2023 17:02
Labels
inference Features and fixes related to the inference project.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants