[RFC] Add third-party malloc library to improve pytorch memory performance on Windows #102534
Comments
This PR implements option 2 of [#102534](#102534). Major changes:
1. Add mimalloc as a submodule.
2. Add the build option "USE_MIMALLOC".
3. Enable it only for Windows builds, where it improves PyTorch memory allocation performance.

Additional Test: <img width="953" alt="image" src="https://github.com/pytorch/pytorch/assets/8433590/4b2ec2dc-16f1-4ad9-b457-cfeb37e489d3">

This PR also builds and statically links mimalloc on Linux without problems.

Pull Request resolved: #102595
Approved by: https://github.com/jgong5, https://github.com/malfet
@xuhancn, if you are interested in trying https://github.com/microsoft/snmalloc, I would be happy to help. It has CMake and Windows support.
I will try it soon. Thanks.
@mjp41 Hi, I wrote a simple project to study snmalloc: https://github.com/xuhancn/research_embedded_snmalloc/blob/main/src/main.cpp#L4
🚀 The feature, motivation and pitch
This doc requests comments on adding a third-party malloc library to improve PyTorch memory performance on Windows.
While debugging issue #62387, we found that the major performance gap between Windows and Linux comes from the poor memory allocation performance on Windows.
I also wrote a simple malloc benchmark project, bench_malloc, which shows that a third-party malloc (tc_malloc) can improve memory allocation performance on Windows.
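As a rough illustration of what such a benchmark measures, here is a minimal sketch that times repeated allocate/free rounds through the default `malloc`/`free`. It is not the code from bench_malloc; the function name `time_malloc_free` and its parameters are hypothetical and chosen only for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not taken from bench_malloc): runs `rounds` rounds of
// allocating `count` blocks of `size` bytes and then freeing them, using
// whatever malloc/free the process is linked against.
// Returns the elapsed wall-clock time in seconds.
double time_malloc_free(std::size_t rounds, std::size_t count, std::size_t size) {
    std::vector<void*> blocks(count, nullptr);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < count; ++i) {
            blocks[i] = std::malloc(size);
            if (blocks[i] != nullptr) {
                // Write one byte through a volatile pointer so the compiler
                // cannot optimize the allocation away.
                static_cast<volatile char*>(blocks[i])[0] = 1;
            }
        }
        for (std::size_t i = 0; i < count; ++i) {
            std::free(blocks[i]);  // free(nullptr) is a no-op
        }
    }
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling, for example, `time_malloc_free(100, 10000, 4096)` once in a build linked against the system allocator and once in a build linked against a third-party allocator gives a crude comparison. Absolute numbers depend heavily on the allocation sizes and pattern chosen.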
After that, I evaluated some popular third-party malloc libraries and made a brief summary here:
From the summary, only two libraries are viable candidates:
Option 1: tc_malloc from gperftools.
Option 2: mimalloc
Alternatives
Option 3: Implement a caching memory allocator for CPU in PyTorch.
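To make the idea behind Option 3 concrete, the following is a minimal sketch of a caching allocator: freed blocks are kept in per-size free lists and handed back on the next request of the same size, so the system allocator is called less often. This is an assumption about what such an allocator could look like, not PyTorch code; the class `CachingAllocator` and its methods are hypothetical, and the sketch is not thread-safe.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Hypothetical, single-threaded sketch of a caching allocator.
// Freed blocks are cached by exact size and reused before falling
// back to std::malloc.
class CachingAllocator {
public:
    CachingAllocator() = default;
    // Copying would lead to double frees of the cached blocks.
    CachingAllocator(const CachingAllocator&) = delete;
    CachingAllocator& operator=(const CachingAllocator&) = delete;

    ~CachingAllocator() {
        // Return every cached block to the system allocator.
        for (auto& entry : free_blocks_) {
            for (void* p : entry.second) {
                std::free(p);
            }
        }
    }

    void* allocate(std::size_t size) {
        auto it = free_blocks_.find(size);
        if (it != free_blocks_.end() && !it->second.empty()) {
            void* p = it->second.back();  // reuse a cached block
            it->second.pop_back();
            return p;
        }
        return std::malloc(size);  // cache miss: ask the system allocator
    }

    // The caller passes the same size it used for allocate().
    void deallocate(void* p, std::size_t size) {
        if (p != nullptr) {
            free_blocks_[size].push_back(p);  // keep the block for reuse
        }
    }

    // Number of blocks currently held in the cache.
    std::size_t cached_count() const {
        std::size_t n = 0;
        for (const auto& entry : free_blocks_) {
            n += entry.second.size();
        }
        return n;
    }

private:
    std::unordered_map<std::size_t, std::vector<void*>> free_blocks_;
};
```

A real implementation would also need locking or per-thread caches, size rounding into buckets, and a policy for releasing cached memory. Those are the same problems that tc_malloc and mimalloc already address.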
Additional context
My proposal
cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ngimel