Learn.py takes all GPU memory #612

Closed
r-lipton opened this issue Apr 11, 2018 · 6 comments
Labels
help-wanted Issue contains request for help or information. needs-info Issue contains insufficient information to be resolved.

Comments

@r-lipton

Hello,

When I run training with learn.py, the process allocates all of the GPU's memory.
Is there a way to avoid this and make it take only what it needs?

Thanks

@Hengoo

Hengoo commented Apr 11, 2018

I don't think the problem is with TensorFlow or ml-agents (when I start training, it uses about 20 MB of VRAM).

You should check whether your game is doing what you think it does. If you have some kind of memory leak, remember that the game is played 100 times as fast as normal, so that might amplify the problem.

@mmattar

mmattar commented Apr 11, 2018

Hi @r-lipton, is this using one of our sample environments or your own? Generally, as @Hengoo and @MarcoMeter pointed out, we haven't noticed this on our environments.

@mmattar mmattar added help-wanted Issue contains request for help or information. needs-info Issue contains insufficient information to be resolved. labels Apr 11, 2018
@Sohojoe
Contributor

Sohojoe commented Apr 11, 2018

I have seen this problem with OpenAI Baselines when invoking a second training run. Setting gpu_options.allow_growth = True fixed it for me.

In trainer_controller.py, replace the tf.Session line (line 212) with:

    # Let TensorFlow grow its GPU allocation on demand instead of reserving it all up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:

Update: I tested this today and was able to run multiple training runs concurrently on a single GPU
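
For anyone who wants this outside of trainer_controller.py, here is a minimal sketch of the same idea wrapped in a helper. The function name make_gpu_session_config and its parameters are made up for illustration and are not part of ml-agents. It takes the TensorFlow module as an argument, on the assumption that you pass `tensorflow` on 1.x or `tensorflow.compat.v1` on 2.x. The optional per_process_gpu_memory_fraction cap is a related GPUOptions setting that was not discussed in this thread.

```python
def make_gpu_session_config(tf_v1, allow_growth=True, memory_fraction=None):
    """Build a TF1-style session config that avoids grabbing all GPU memory.

    tf_v1 is the module exposing ConfigProto: `tensorflow` itself on 1.x,
    or `tensorflow.compat.v1` on 2.x (assumption: one of these is installed).
    """
    config = tf_v1.ConfigProto()
    # Allocate GPU memory on demand instead of reserving the whole device.
    config.gpu_options.allow_growth = allow_growth
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        # Optional hard cap: fraction of each visible GPU's memory.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

With TensorFlow 1.x this would be used roughly as `with tf.Session(config=make_gpu_session_config(tf)) as sess:`. Note that with allow_growth enabled, TensorFlow generally does not hand memory back to the GPU once it has been allocated, so usage can still climb over a long run.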

@r-lipton
Author

I'm using an environment I created myself.
@Sohojoe's solution worked for me, thanks!

@awjuliani
Contributor

Hi all. I've made a PR for this, and it will be added to the v0.5 release. #1192

@lock

lock bot commented Jan 3, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 3, 2020
5 participants