tensorboard logging #208
Conversation
At a glance, I don't have anything to add here, but let me know if you would like me to dive in and do a proper review.
One question on the design: could it make sense to split those new resources into a new class instead of continuing to expand AbstractInterface and Sup3rGan? It could be sideways or aggregating (such as ProfiledInterface(AbstractInterface) or ProfiledMixInterface()). The main benefit would be to isolate the related components so they are easier to test, inspect, and maintain later, but also to allow choosing what to run. For instance, one might prefer to run in production without the profiling-related resources.
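A minimal sketch of the mixin idea being floated here (all class names below are hypothetical stand-ins, not the actual sup3r classes):

```python
class ProfilingMixIn:
    """Hypothetical mixin holding the profiling resources, so models can
    opt in by inheriting it, or skip it entirely in production."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._timing_details = {}

    def record(self, key, dt):
        """Store an elapsed time under a dt:-prefixed key."""
        self._timing_details[f'dt:{key}'] = dt


class BaseModel:
    """Stand-in for something like AbstractInterface."""

    def __init__(self):
        self.name = 'base'


class ProfiledModel(ProfilingMixIn, BaseModel):
    """Profiling-enabled variant; a production model could inherit
    BaseModel alone and carry no profiling state."""
```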
@castelao Yeah, I considered splitting into another class as well, so that probably means I should. I've included a flag in the
- OptimizerClass = getattr(optimizers, class_name)
- sig = signature(OptimizerClass)
+ optimizer_class = getattr(optimizers, class_name)
+ sig = signature(optimizer_class)
I used camel case here because the variable is actually a class object, even though it doesn't have the actual class name. I know it feels/looks kind of weird, but I thought it was appropriate? I could be convinced otherwise.
Yeah, I suppose that makes sense. I figured _class took care of that distinction, and ruff yelled at me about it, so I shrugged.
I'm also a shrug on this, but this is why I don't like auto-formatters, haha. No freedom to think for yourself!
Well it was just a ruff suggestion tbf lol. I had no strong opinion either way.
msg = ('Could not run layer #{} "{}" on tensor of shape {}'
       .format(layer_num, layer, hi_res.shape))
logger.error(msg)
raise RuntimeError(msg) from e
Curious, why is this your preference for the nesting of the for loop / try wrapper?
I think it's bad practice to have try/except inside the loop, and also very slightly worse performance.
Why is it bad practice? Makes sense about performance, but yeah, I expect it's minimal given that the tensorflow forward/backward pass operation is expensive.
Bad practice because of the micro-optimization, I guess. If you can do the same thing with a single call rather than a loop, that seems like a better practice.
Gotcha, that makes sense. I liked being able to log which layer broke, but the iter counter works too 👍
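For reference, the pattern being discussed, a single try around the layer loop with an enumerate counter so the failing layer can still be logged, might look roughly like this (a sketch with stand-in names, not the actual sup3r code):

```python
import logging

logger = logging.getLogger(__name__)


def run_layers(layers, hi_res):
    """Apply layers in sequence with one try/except wrapping the loop.

    The enumerate counter keeps track of which layer was running, so the
    error message can still report the failing layer without putting the
    try/except inside the loop body.
    """
    layer_num, layer = 0, None
    try:
        for layer_num, layer in enumerate(layers):
            hi_res = layer(hi_res)
    except Exception as e:
        msg = ('Could not run layer #{} "{}" on tensor of shape {}'
               .format(layer_num, layer, getattr(hi_res, 'shape', None)))
        logger.error(msg)
        raise RuntimeError(msg) from e
    return hi_res
```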
sup3r/models/abstract.py
Outdated
t0 = time.time()
loss, loss_details = loss_out
grad = tape.gradient(loss, training_weights)
self._timing_details['dt:tape.gradient'] = time.time() - t0
All of these timer variables are kind of gross and not very readable. I tried to look up a better way to do this, and I honestly don't see anything great. What about defining a utility function like this:
def timer(fun, *args, **kwargs):
    t0 = time.time()
    out = fun(*args, **kwargs)
    t_elap = time.time() - t0
    return out, t_elap
So you could call something like this:
hi_res_exo, t1 = timer(self.get_high_res_exo_input, hi_res_true)
self._timing_details['dt:get_high_res_exo_input'] = t1
Not great, but do you think there's a better way to do this besides all of the t0 variables? You could even make the function a method on a timer class that logs the details to self._timing_details instead of returning t_elap.
Yeah I like the class idea
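The class version, sketched from the snippets in this thread (an assumed shape; the real Timer in the PR may differ):

```python
import time


class Timer:
    """Call a function and record its elapsed time in a log dict,
    keyed by the wrapped function's name."""

    def __init__(self):
        self.log = {}

    def __call__(self, fun, *args, **kwargs):
        t0 = time.time()
        out = fun(*args, **kwargs)
        self.log[f'elapsed:{fun.__name__}'] = time.time() - t0
        return out
```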
out = fun(*args, **kwargs)
t_elap = time.time() - t0
self.log[f'elapsed:{fun.__name__}'] = t_elap
return out
Why don't you want Timer to be part of TensorboardMixIn?
How does the log discern consecutive calls to the function? Just because you write the log to tensorboard every epoch?
I guess I could see how Timer would be useful outside of the tensorboard stuff with additional instance functions. Maybe an auto-logger? You don't need to add it now, but it could be useful in the future.
Yeah, I figured it was a pretty general method, and the log is actually written for each batch.
makes sense 👍
tensorboard logging
adding tensorboard log writing with some additional timing info