This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Allow clearing gpu cache #14252

Merged: 9 commits merged into apache:master on May 25, 2019

Conversation

vladoovtcharov
Contributor

Allows releasing memory from the pool via the C and Python APIs.
(Should resolve feature request #13482)

@szha
Member
szha previously requested changes Feb 25, 2019


Exposing this API as-is would cause everything to stop while memory is released, which could be problematic in an asynchronous execution environment. Given the performance impact of not caching memory, a much more preferable option is to provide a way to turn off memory caching completely, so that people with special use cases can use it and be sure that MXNet is not holding extra memory.

@szha
Member

szha commented Feb 25, 2019

@vladoovtcharov thanks for the contribution. Instead of exposing ReleaseAll, would you mind adding an environment variable MXNET_USE_GPU_MEM_POOL that defaults to 1, and using its value to decide whether Free should release the memory on the spot or return it to the memory pool?

@vladoovtcharov
Contributor Author

Hi @szha, thanks for the feedback. I'm not sure I completely understood, though.
If the user wanted the memory to be freed right away, couldn't they use a non-pooled storage manager (i.e. storage::NaiveStorageManager<storage::GPUDeviceStorage>), or did you have something else in mind? (Although currently there is no value of MXNET_GPU_MEM_POOL_TYPE that would do this.)

At least for my use case I was hoping to still use the memory pool/caching (for performance) but periodically pause and release the memory (so another process could use the GPU memory). I understand the release_all call could be expensive, but it would not and should not be called regularly, so overall it should affect performance less than using non-pooled memory.
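
For illustration, a minimal sketch of the workflow described above: keep the pooled allocator for speed, but periodically hand cached GPU memory back to the driver so another process can use it. It assumes the Python method name this PR eventually settles on (empty_cache, per the commit list further down), so treat the exact call as an assumption rather than the API that existed at the time of this comment.

```python
import mxnet as mx

ctx = mx.gpu(0)

for epoch in range(10):
    # normal work: allocations go through MXNet's pooled GPU allocator
    data = mx.nd.random.uniform(shape=(1024, 1024), ctx=ctx)
    result = (data * 2).sum()
    result.wait_to_read()      # make sure pending GPU work has finished

    if epoch % 5 == 4:         # e.g. between training phases
        ctx.empty_cache()      # release cached blocks back to the driver
```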

@szha
Member

szha commented Feb 26, 2019

@vladoovtcharov yes, they could use GPUDeviceStorage, and exposing the option in MXNET_GPU_MEM_POOL_TYPE is also fine. My general concern about exposing this kind of knob at runtime is precisely to avoid supporting this type of use case, since it's hard to make any promises in this setting. For example, releasing objects in the memory pool likely won't solve all your problems, as there are other levels of caching. For now I think it's better to keep it as simple as on or off.

@vladoovtcharov
Contributor Author

I added an additional MXNET_GPU_MEM_POOL_TYPE option (Unpooled), but I am worried that it is not the best thing to do.
As expected, performance is pretty bad when using unpooled memory (about 30% slower in my case for a simple classifier, as in https://mxnet.incubator.apache.org/versions/master/tutorials/gluon/mnist.html)

It seems like using a pooled memory type with the ability to call ReleaseAll would be preferable in most use cases.

Also, I'm not sure why a pooled memory manager would offer any different guarantees.
When using a storage manager without pooling, memory is freed when free is called.
When using a storage manager with a pool, memory is recycled into the pool when free is called.
When ReleaseAll is called, the recycled memory in the pool is freed.
So using the pooled manager with ReleaseAll would give exactly the same guarantees as using an unpooled manager, with the added benefit of better performance.
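
For illustration, a hedged sketch of selecting the unpooled GPU storage manager via the MXNET_GPU_MEM_POOL_TYPE environment variable. The value Unpooled is the one this change introduces (the default, Round, keeps the caching pool); the exact spelling is taken from the discussion above and should be treated as an assumption.

```python
import os

# Must be set before MXNet creates the GPU storage manager,
# so set it before importing mxnet / before the first GPU allocation.
os.environ["MXNET_GPU_MEM_POOL_TYPE"] = "Unpooled"

import mxnet as mx

x = mx.nd.ones((1024, 1024), ctx=mx.gpu(0))
# With the unpooled manager, freeing x returns its memory to the driver
# immediately instead of recycling it into MXNet's pool.
del x
```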

@szha
Member

szha commented Feb 26, 2019

The problem is the statement "the memory is freed". Not all memory is freed, and users shouldn't have to be concerned with how caching works in MXNet.

@szha
Member

szha commented Feb 26, 2019

I'm happy to take this into further consideration in 2.0 design (#9686). For now, let's not add the C API yet.

@vladoovtcharov
Contributor Author

@szha ok sounds good

@szha
Member

szha commented Feb 27, 2019

@vladoovtcharov thanks for understanding. Shall we get the Unpooled option merged first? If so, it would be great if you could look into the CI problems.

@vladoovtcharov
Contributor Author

Would it help to split this into a separate pull request for just the unpooled option? (One more formatting error to fix...)

@szha
Member

szha commented Feb 27, 2019

Yes, feel free to do so if it's easier :)

@anirudhacharya
Member

@mxnet-label-bot add [pr-work-in-progress]

@marcoabreu added the pr-work-in-progress (PR is still work in progress) label Feb 27, 2019
@pinaraws

@vladoovtcharov Did you get a chance to work on the changes requested by @szha?

@piyushghai
Contributor

@vladoovtcharov Gentle ping...

@Roshrini
Member

@vladoovtcharov Can you please resolve conflicts so that we can move forward with this PR?

@vladoovtcharov changed the title from "Allow releasing all gpu memory" to "Allow clearing gpu cache" Apr 17, 2019
@vladoovtcharov
Contributor Author

I went ahead and split this into two pull requests, as requested. The second is here: #14716

@szha self-requested a review May 7, 2019 20:10
Review comment on python/mxnet/context.py (outdated, resolved)
@szha dismissed their stale review May 7, 2019 23:24

Proper naming and documentation addressed the concerns.

@vladoovtcharov
Contributor Author

I'll update the C API naming/documentation as well.

@karan6181
Contributor

@vladoovtcharov Thanks for your contribution! Did you get a chance to update the docs? Thanks!

@vladoovtcharov
Contributor Author

@karan6181 yes, everything should be checked in and ready to be merged.

@szha merged commit db2295b into apache:master May 25, 2019
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Allow releasing all gpu memory

* fix white space

* stuck ci checks

* Fix whitespace

* Rename release_all -> empty_cache and provide documentation

* fix indentation

* Rename c_api's MXStorageReleaseAll -> MXStorageEmptyCache and clarify documentation

* nudge ci

* Update context.py
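
For reference, a minimal, hedged usage sketch of the interface as merged: Context.empty_cache() in Python, backed by the renamed MXStorageEmptyCache C call. Exact semantics beyond "release the cached GPU memory held by the pool" are an assumption based on the discussion above.

```python
import mxnet as mx

ctx = mx.gpu(0)
a = mx.nd.zeros((4096, 4096), ctx=ctx)
del a                # the block goes back into MXNet's memory pool
mx.nd.waitall()      # ensure no asynchronous ops still reference it
ctx.empty_cache()    # hand the cached blocks back to the CUDA driver
```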