Graph is always saved, resulting in large log files #300

dniku · 2019-05-01T15:17:02Z

Apparently, there is no way to prevent the computational graph from being saved. This results in large Tensorboard log files for each training run (~20 MB for Atari/PPO2/NatureCNN, even for a run that is terminated almost immediately after launching).

araffin · 2019-05-02T11:15:17Z

Hello,

> This results in large Tensorboard log files for each training run

I would say that is an expected feature when using tensorboard logging. And 20 MB is quite small compared to the capacity of current hard drives.

> is terminated almost immediately after launching

Why would you terminate one run immediately after launching?

dniku · 2019-05-02T13:22:26Z

My use case is currently as follows. I am debugging some code in a Colab GPU session. In doing so, I restart runs very often (to see some debugging info or to check if my changes fix the crash I am trying to debug). Each restart is logged automatically to Google Drive. When I am satisfied with the state of my code, I start full-scale training. When training is complete, I download all logging output to my machine to work with it locally, as analyzing output from a Colab session is inconvenient. However, with my current setup, output from debugging runs is mixed with output from normal training runs, so when I download the logs, a large fraction of them are useless.

I could of course tweak my current setup to add an option for disabling logging while I am debugging code, but that seems error-prone, as I may easily forget to re-enable it before starting a training run. In any case, an option like save_graph seems useful, since probably you don't actually need to log graphs that often.

araffin · 2019-05-02T13:40:48Z

> I could of course tweak my current setup to add an option for disabling logging while I am debugging code, but that seems error-prone, as I may easily forget to re-enable it before starting a training run

Defining:

tensorboard_log = None if DEBUG_MODE else 'path/to/log_folder'

with a big warning (printing the current state of your training: DEBUG_MODE or not) before starting training does not seem so error-prone to me.

Something like:

if DEBUG_MODE:
    print("=" * 20)
    print("WARNING: DEBUG_MODE enabled, no tensorboard log will be saved")
    print("=" * 20)
    # with maybe a time.sleep()


model.learn()

Also, you should keep verbose=1 in both cases.
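The two snippets above can be combined into one small helper. This is only a sketch of the idea: `resolve_tensorboard_log` and its `pause_seconds` parameter are hypothetical names introduced here for illustration, and the model constructor call is shown only as a comment.

```python
import time


def resolve_tensorboard_log(debug_mode, log_dir, pause_seconds=0.0):
    """Return None in debug mode (no logging), otherwise return log_dir.

    Hypothetical helper, not part of any library. It prints the current
    state so that a forgotten DEBUG_MODE is hard to miss.
    """
    if debug_mode:
        print("=" * 20)
        print("WARNING: DEBUG_MODE enabled, no tensorboard log will be saved")
        print("=" * 20)
        # Optional pause so the warning stays visible before training starts.
        time.sleep(pause_seconds)
        return None
    print("Tensorboard logs will be saved to: {}".format(log_dir))
    return log_dir


DEBUG_MODE = True
tensorboard_log = resolve_tensorboard_log(DEBUG_MODE, "path/to/log_folder")
# The result would then be passed to the model, for example:
# model = PPO2(policy, env, verbose=1, tensorboard_log=tensorboard_log)
```

Printing a message in both branches means the log destination is visible at the start of every run, not only the debug ones.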

araffin · 2019-05-02T13:42:34Z

Thinking again about the issue, I'm wondering what self.graph contains that needs to be saved. @hill-a ?

EDIT: at first, I thought it was the computation graph, but from what I remember, that one is created at the beginning of the run, not the end.

hill-a · 2019-05-02T15:12:32Z

I believe the self.writer.add_graph(self.graph) call is used to save the structure of the training graph, often for debugging purposes.
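For illustration, here is a minimal sketch of how the save_graph option suggested earlier in the thread could guard that call. It assumes the graph is written through a single add_graph call, as described above. `save_graph`, `maybe_add_graph`, and `RecordingWriter` are hypothetical names, not part of the library, and the writer is a stand-in so that the example runs without TensorFlow.

```python
class RecordingWriter:
    """Stand-in for a summary writer; it only records add_graph calls."""

    def __init__(self):
        self.graphs = []

    def add_graph(self, graph):
        self.graphs.append(graph)


def maybe_add_graph(writer, graph, save_graph=True):
    """Write the graph only when save_graph is True (hypothetical option)."""
    if save_graph:
        writer.add_graph(graph)
    return writer


# With save_graph=False the graph is never handed to the writer,
# so nothing graph-related would end up in the log file.
writer = maybe_add_graph(RecordingWriter(), graph="dummy-graph", save_graph=False)
```

Defaulting save_graph to True would keep the current behaviour for existing users while letting short debugging runs skip the graph.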

araffin added the enhancement New feature or request label May 2, 2019
