tensorboard logging #208

Merged · 6 commits merged into main from bnb/tensorboard_logging · Apr 22, 2024
Conversation

@bnb32 (Collaborator) commented Apr 9, 2024

Adding tensorboard log writing, with some additional timing info.

@castelao (Member) left a comment

At a glance, I don't have anything to add here. But let me know if you would like me to dive in and do a proper review.

One question on the design: could it make sense to split these new resources into a new class instead of continuing to expand AbstractInterface and Sup3rGan? It could be a sibling or an aggregating class (such as ProfiledInterface(AbstractInterface) or ProfiledMixInterface()). The main benefit would be to isolate the related components so they are easier to test, inspect, and maintain later, but it would also allow choosing what to run. For instance, one might prefer to run in production without the profiling-related resources.
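To make the suggestion concrete, here is a minimal sketch of the mixin-style option being proposed. The class names (ProfiledMixIn, BaseModel) are illustrative stand-ins, not the actual sup3r classes; only the `_timing_details` attribute name comes from the PR itself.

```python
import time


class ProfiledMixIn:
    """Hypothetical mixin isolating profiling resources from the model
    classes, so profiling can be tested and dropped independently."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._timing_details = {}

    def record(self, label, fun, *args, **kwargs):
        # Time a single call and stash the elapsed seconds under a
        # 'dt:' key, matching the key style used in the PR.
        t0 = time.time()
        out = fun(*args, **kwargs)
        self._timing_details[f'dt:{label}'] = time.time() - t0
        return out


class BaseModel:
    """Stand-in for the existing model class hierarchy."""

    def forward(self, x):
        return x * 2


class ProfiledModel(ProfiledMixIn, BaseModel):
    """Profiling bolted on sideways; production code could subclass
    BaseModel directly and skip the profiling resources entirely."""


model = ProfiledModel()
result = model.record('forward', model.forward, 21)
```

The cooperative `super().__init__` call is what lets the mixin compose with an arbitrary base class without either side knowing about the other.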

@bnb32 (Collaborator, Author) commented Apr 11, 2024

@castelao Yeah, I considered splitting this into another class as well, so that probably means I should. I've included a flag in the train method to turn off profiling for production, though.

```python
# before
OptimizerClass = getattr(optimizers, class_name)
sig = signature(OptimizerClass)
# after
optimizer_class = getattr(optimizers, class_name)
sig = signature(optimizer_class)
```
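For context, the `getattr` + `signature` pair above is a common pattern for looking up a class by name and filtering kwargs down to what its constructor accepts. A runnable sketch, using a plain stand-in namespace instead of the real optimizers module (which in the PR is presumably tensorflow.keras.optimizers; that detail is an assumption):

```python
from inspect import signature
from types import SimpleNamespace


class Adam:
    """Stand-in optimizer class with a keras-like constructor."""

    def __init__(self, learning_rate=0.001, beta_1=0.9):
        self.learning_rate = learning_rate
        self.beta_1 = beta_1


# Stand-in for the optimizers module the snippet calls getattr on.
optimizers = SimpleNamespace(Adam=Adam)


def init_optimizer(class_name, **kwargs):
    # Look up the class object by name, then keep only the kwargs the
    # constructor actually accepts -- the pattern from the snippet.
    optimizer_class = getattr(optimizers, class_name)
    sig = signature(optimizer_class)
    valid = {k: v for k, v in kwargs.items() if k in sig.parameters}
    return optimizer_class(**valid)


opt = init_optimizer('Adam', learning_rate=0.01, unknown_arg=5)
```

Here `unknown_arg` is silently dropped because it is not in the constructor's signature.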
Member:

I used camel case here because the variable is actually a class object, even though it doesn't have the actual class name. I know it feels/looks kind of weird, but I thought it was appropriate? I could be convinced otherwise.

Collaborator Author:

Yeah, I suppose that makes sense. I figured _class took care of that distinction, and ruff yelled at me about it, so I shrugged.

Member:

I'm also a shrug, but this is why I don't like auto-formatters, haha. No freedom to think for yourself!

Collaborator Author:

Well, it was just a ruff suggestion, to be fair. I had no strong opinion either way.

(Review thread on sup3r/models/abstract.py — outdated, resolved.)
```python
msg = ('Could not run layer #{} "{}" on tensor of shape {}'
       .format(layer_num, layer, hi_res.shape))
logger.error(msg)
raise RuntimeError(msg) from e
```
Member:

Curious, why is this your preference for the nesting of the for loop / try wrapper?

Collaborator Author:

I think it's bad practice to have try/except inside the loop, and there's also a very slight performance cost.

Member:

Why is it bad practice? Makes sense about performance, but I expect it's minimal given that the TensorFlow forward/backward pass operation is expensive.

Collaborator Author:

Bad practice because of the micro-optimization, I guess. If you can do the same thing with a single try/except around the loop rather than one per iteration, that seems like a better practice.

Member:

Gotcha, that makes sense. I liked being able to log which layer broke, but the iter counter works too 👍
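A minimal sketch of the pattern the thread settles on: one try/except wrapping the whole loop, with the loop counter kept in scope so the failing layer can still be reported. The layer classes here are hypothetical stand-ins for the model's real layers.

```python
class Doubler:
    """Toy layer that doubles its input."""

    def __call__(self, x):
        return x * 2


class Failer:
    """Toy layer that always fails, to exercise the error path."""

    def __call__(self, x):
        raise ValueError('boom')


def forward(layers, x):
    layer_num = -1
    layer = None
    try:
        # enumerate keeps the counter and the current layer available
        # to the except block below, so the whole loop needs only one
        # try/except instead of one per iteration.
        for layer_num, layer in enumerate(layers):
            x = layer(x)
    except Exception as e:
        msg = ('Could not run layer #{} "{}" on input {}'
               .format(layer_num, layer, x))
        raise RuntimeError(msg) from e
    return x


out = forward([Doubler(), Doubler()], 1)
```

When a layer raises, the re-raised RuntimeError names the failing layer index, so no diagnostic information is lost by hoisting the try out of the loop.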

```python
t0 = time.time()
loss, loss_details = loss_out
grad = tape.gradient(loss, training_weights)
self._timing_details['dt:tape.gradient'] = time.time() - t0
```
Member:

All of these timer variables are kind of gross and not very readable. I tried to look up a better way to do this and honestly don't see anything great. What about defining a utility function like this:

```python
def timer(fun, *args, **kwargs):
    t0 = time.time()
    out = fun(*args, **kwargs)
    t_elap = time.time() - t0
    return out, t_elap
```

So you could call something like this:

```python
hi_res_exo, t1 = timer(self.get_high_res_exo_input, hi_res_true)
self._timing_details['dt:get_high_res_exo_input'] = t1
```

Not great, but do you think there's a better way to do this besides all of the t0 variables? You could even make the function a method on a timer class that logs the details to self._timing_details instead of returning t_elap.

Collaborator Author:

Yeah, I like the class idea:

```python
out = fun(*args, **kwargs)
t_elap = time.time() - t0
self.log[f'elapsed:{fun.__name__}'] = t_elap
return out
```
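Assembling the fragment above into a complete class, a Timer along these lines might look as follows. The `log` attribute and the `elapsed:` key format come from the snippet; the rest (the class name, making it callable) is an illustrative guess at the final shape, not the merged implementation.

```python
import time


class Timer:
    """Sketch of the Timer class discussed above: wraps any call,
    records the elapsed time keyed by function name."""

    def __init__(self):
        self.log = {}

    def __call__(self, fun, *args, **kwargs):
        # Time the call and record it, so callers don't need to juggle
        # t0 variables around every timed operation.
        t0 = time.time()
        out = fun(*args, **kwargs)
        self.log[f'elapsed:{fun.__name__}'] = time.time() - t0
        return out


def square(x):
    return x * x


timer = Timer()
result = timer(square, 4)
```

One caveat the thread touches on below: consecutive calls to the same function overwrite the same log key, which is fine if the log is flushed every batch.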
Member:

Why don't you want Timer to be part of TensorboardMixIn?

How does the log discern consecutive calls to the function? Just because you write the log to tensorboard every epoch?

I guess I could see how Timer would be useful outside of the tensorboard stuff with additional instance methods — maybe an auto-logger. You don't need to add it now, but it could be useful in the future.

Collaborator Author:

Yeah, I figured it was a pretty general method, and the log is actually written for each batch.

Member:

Makes sense 👍

@bnb32 bnb32 merged commit a171300 into main Apr 22, 2024
8 checks passed
@bnb32 bnb32 deleted the bnb/tensorboard_logging branch April 22, 2024 14:58
github-actions bot pushed a commit that referenced this pull request Apr 22, 2024