
Add megablocks dropless MoE #1192

Merged
merged 2 commits into EleutherAI:main from mbmoe on May 4, 2024
Conversation

@yang (Contributor) commented Mar 22, 2024

This initial version focuses on getting megablocks integrated and working with DeepSpeed parallelism. It makes megablocks experts work within the existing parallelism scheme, which supports the full range of configurations: expert, expert-data, and tensor-expert-data parallelism.
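For readers unfamiliar with megablocks, here is a minimal standalone sketch of the kind of dropless-MoE layer being integrated. This is not code from the PR: the `Arguments` field names follow the megablocks API, but exact fields, defaults, and dtype flags vary across megablocks versions, the kernels require a CUDA device, and all sizes are illustrative.

```python
# Minimal sketch (not from this PR) of a megablocks dropless-MoE layer.
# Sizes are illustrative; Arguments fields vary by megablocks version.
import torch
from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,       # model hidden dimension
    ffn_hidden_size=4096,   # per-expert FFN width
    moe_num_experts=8,      # experts across the expert-parallel group
    moe_top_k=1,            # each token routes to its top-1 expert
)
# The sparse kernels expect a GPU; half precision is typical for these kernels.
layer = dMoE(args).cuda().to(torch.bfloat16)

x = torch.randn(4, 128, 1024, device="cuda", dtype=torch.bfloat16)  # (batch, seq, hidden)
out = layer(x)
if isinstance(out, tuple):  # some megablocks versions return (output, bias)
    out = out[0]
```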

Tested on 8xA100 for convergence and expert balancing; testing also uncovered weight initialization issues (to be fixed later).
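The expert-balancing check itself is not shown in the PR. As a hedged sketch, one simple way to measure balance is the fraction of tokens each expert receives from the router's top-1 assignments; the function name below is hypothetical, not from this codebase.

```python
# Hedged sketch: measure how evenly the router spreads tokens across experts.
import torch

def expert_load(top_expert_ids: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Fraction of tokens routed to each expert (uniform is 1/num_experts)."""
    counts = torch.bincount(top_expert_ids.flatten(), minlength=num_experts)
    return counts.float() / counts.sum()

# Example: 8 experts, 4096 routed tokens.
ids = torch.randint(0, 8, (4096,))
print(expert_load(ids, num_experts=8))  # balanced routing is near 0.125 each
```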

Design document and worklog that accompanied this project: https://yaaang.notion.site/gpt-neox-MoE-design-doc-cc8586eb53144a5987b63f510ced021c

In terms of where this fits into the larger arc of work, the next PRs (I don't have permission to submit stacked PRs) will cover:

  • improved expert initialization, as we discussed (see the sketch after this list)
  • adding integration tests that automate the verification I showed earlier around convergence and expert + router gradients
  • making it work with DeepSpeed pipeline parallelism
  • merging with Colin's code and doing the megablocks code fork
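The improved expert initialization in the first bullet is only referenced here, not specified. As one common approach, a minimal sketch that re-initializes each expert's FFN weights with a small normal distribution; the helper name and std value are assumptions, not taken from this PR or the linked design doc.

```python
# Hedged sketch of per-expert re-initialization; std=0.02 is an assumption.
import torch

def init_expert_weights(expert_ffn: torch.nn.Module, std: float = 0.02) -> None:
    """Normal-init all Linear weights in one expert's FFN; zero the biases."""
    for module in expert_ffn.modules():
        if isinstance(module, torch.nn.Linear):
            torch.nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)
```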

@yang force-pushed the mbmoe branch 2 times, most recently from 37f19bd to 1013ddd on April 24, 2024 17:58
@Quentin-Anthony (Member) left a comment


Tested and working for MoE on my end. No comments.

@Quentin-Anthony merged commit 916c883 into EleutherAI:main on May 4, 2024
2 of 6 checks passed