
[Inference]Lazy Init Support #5785

Merged: 14 commits into hpcaitech:main on Jun 27, 2024

Conversation

@LRY89757 (Contributor) commented Jun 6, 2024

This PR adds lazy init support for Llama 2, Llama 3, and (partially) Baichuan.

NOTE: for the Baichuan model, the lm_head weight loaded via lazy init differs slightly from the weight loaded by transformers, which makes the outputs diverge from the transformers outputs.
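A small diagnostic sketch for the mismatch described above (a hypothetical helper, not part of this PR; `lazy_model` stands for a model loaded through the lazy-init path, `ref_model` for a plain transformers load):

```python
import torch

def lm_head_max_abs_diff(lazy_model: torch.nn.Module, ref_model: torch.nn.Module) -> float:
    """Max absolute elementwise difference between the two models' lm_head weights."""
    a = lazy_model.lm_head.weight.detach().float()
    b = ref_model.lm_head.weight.detach().float()
    return (a - b).abs().max().item()
```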

Lazy init supports both local and remote model paths.
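A minimal usage sketch, assuming ColossalAI's `colossalai.lazy.LazyInitContext`; the exact wiring into the inference engine added by this PR may differ, and the model path shown is a placeholder:

```python
from transformers import LlamaConfig, LlamaForCausalLM
from colossalai.lazy import LazyInitContext

# Either a remote Hugging Face Hub id or a local directory should work here.
model_path = "meta-llama/Llama-2-7b-hf"  # or e.g. "/data/checkpoints/llama2-7b"

config = LlamaConfig.from_pretrained(model_path)
# Build the module tree lazily: parameters record how to construct
# themselves instead of allocating real weight storage up front.
with LazyInitContext():
    model = LlamaForCausalLM(config)
# Real (possibly sharded) weights are loaded later, when the model is
# materialized by the checkpoint loader / inference engine.
```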

Lazy init speeds up model loading considerably. For Llama 70B in fp32 with tensor parallelism = 4, loading takes only 3.3 minutes; without lazy init, the program freezes because the full weights exceed available CPU memory.
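For intuition about the memory claim: 70B parameters in fp32 are roughly 280 GB of weights, which would otherwise have to be fully materialized in host RAM before sharding. PyTorch's meta device illustrates the general mechanism lazy init builds on, where tensors carry shape and dtype but no storage (an illustrative sketch, not this PR's implementation):

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Default (7B-sized) config; the same construction works for 70B because
# no weight storage is allocated on the meta device.
config = LlamaConfig()
with torch.device("meta"):
    model = LlamaForCausalLM(config)

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.1f}B params; fp32 would need {n_params * 4 / 2**30:.0f} GiB if materialized")
```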

@LRY89757 requested a review from a team as a code owner June 6, 2024 03:16
@Edenzzzz enabled auto-merge (squash) June 18, 2024 05:47
@Edenzzzz disabled auto-merge June 18, 2024 05:47
@LRY89757 closed this Jun 24, 2024
@LRY89757 reopened this Jun 24, 2024
@LRY89757 closed this Jun 25, 2024
@LRY89757 reopened this Jun 25, 2024
@LRY89757 closed this Jun 27, 2024
@LRY89757 reopened this Jun 27, 2024
@LRY89757 merged commit 3c7cda0 into hpcaitech:main Jun 27, 2024
7 of 8 checks passed