[Feature Proposal] Add NaN and Inf checking to RL models #368

Open
hill-a opened this issue Jun 11, 2019 · 1 comment
Labels
enhancement New feature or request

Comments

@hill-a
Owner

hill-a commented Jun 11, 2019

As described in #358, it would be relatively simple to add invalid value (NaN and Inf) detection to Stable-Baselines.

Adding CheckNaNTensorFlow to the losses and using tf.add_check_numerics_ops in the policies allows invalid values to be detected, while preventing check_numerics from being wrapped around every operation in the graph (which is what tf.add_check_numerics_ops does by default).

Code example

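import tensorflow as tf  # TF 1.x graph-mode API (tf.get_default_graph, tf.check_numerics)


class CheckNaNTensorFlow:
    """Adds a tf.check_numerics op for every float tensor created inside the `with` block."""

    def __init__(self):
        self.check_nan_ops = []
        self.old_ops = set()

    def __enter__(self):
        # snapshot the ops that already exist, so that only ops created in the block get checked
        self.old_ops = set(tf.get_default_graph().get_operations())
        return self  # required so that `with CheckNaNTensorFlow() as tf_nan` binds this object

    def __exit__(self, exc_type, exc_value, traceback):
        for op in tf.get_default_graph().get_operations():
            if op in self.old_ops:
                continue
            # tf.check_numerics takes a Tensor, not an Operation, so check each output of the op
            for tensor in op.outputs:
                try:
                    self.check_nan_ops.append(tf.check_numerics(tensor, "NaN or Inf in " + tensor.name))
                except (TypeError, ValueError):
                    # non-float tensors (int, bool, resource, ...) raise TypeError, and tensors created
                    # inside control-flow bodies such as tf.while_loop can raise ValueError: skip them
                    pass
        return False  # never suppress an exception raised inside the `with` block


with CheckNaNTensorFlow() as tf_nan:

    ...   # define the TF graph you want to check here

sess.run([val_1, val_2] + tf_nan.check_nan_ops, td_map)  # add the checks to the fetches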
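To make concrete what each op collected in `check_nan_ops` does when it runs, here is a minimal NumPy stand-in for `tf.check_numerics`. The function name `check_numerics_np` is hypothetical (it is not part of Stable-Baselines or TensorFlow), and where TensorFlow raises `tf.errors.InvalidArgumentError` at run time, this sketch raises `FloatingPointError` instead. It returns a float array unchanged when every element is finite, raises otherwise, and rejects non-float input, which mirrors why some tensors have to be skipped when the checks are built.

```python
import numpy as np


def check_numerics_np(values, message):
    """Hypothetical NumPy analogue of tf.check_numerics.

    Returns `values` unchanged when every element is finite; otherwise raises.
    Like the TF op, it only accepts floating-point input.
    """
    arr = np.asarray(values)
    if not np.issubdtype(arr.dtype, np.floating):
        # tf.check_numerics likewise refuses non-float tensors (TypeError when the graph is built)
        raise TypeError(f"{message}: expected a float array, got {arr.dtype}")
    if np.isnan(arr).any():
        raise FloatingPointError(f"{message}: array had NaN values")
    if np.isinf(arr).any():
        raise FloatingPointError(f"{message}: array had Inf values")
    return values


# A finite loss passes through untouched; a NaN loss is caught where it first appears.
loss = check_numerics_np(np.array([0.5, 1.25]), "policy_loss")
try:
    check_numerics_np(np.array([0.5, np.nan]), "value_loss")
except FloatingPointError as err:
    print(err)  # value_loss: array had NaN values
```

Note that the check has to look at every element of the tensor, so its cost scales with the tensor size rather than being a single comparison.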

Additional context
This will probably add some overhead; my guess is only as much as an if instruction. On CPU, branch prediction should keep the performance cost small, but it might hurt GPU performance significantly. This needs verifying.
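One way to verify this is to time the same step with and without the checks and compare mean and standard deviation. The sketch below is a framework-agnostic timing harness under stated assumptions: `time_step_ms`, `plain_step` and `checked_step` are hypothetical names, and the NumPy matrix product only stands in for a real PPO2 step. Because it runs on the CPU, it says nothing about the GPU question.

```python
import statistics
import time

import numpy as np


def time_step_ms(step, repeats=200, warmup=10):
    """Return (mean, stdev) of the wall-clock time of `step()` in milliseconds."""
    for _ in range(warmup):
        step()  # warm up caches and any lazy initialisation before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples), statistics.stdev(samples)


# Stand-in workload (assumption): one small dense layer, with and without a NaN/Inf check.
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64))
inputs = rng.standard_normal((32, 64))


def plain_step():
    return inputs @ weights


def checked_step():
    out = inputs @ weights
    if not np.isfinite(out).all():  # full scan of the output, roughly what a check op must do
        raise FloatingPointError("NaN or Inf in layer output")
    return out


for name, step in [("without check", plain_step), ("with check", checked_step)]:
    mean, std = time_step_ms(step)
    print(f"{name}: {mean:.3f} +- {std:.3f} ms")
```

In a Stable-Baselines model the two callables would instead wrap the session call, e.g. `lambda: sess.run(fetches, td_map)` versus `lambda: sess.run(fetches + tf_nan.check_nan_ops, td_map)` (with `fetches` a hypothetical list of the usual outputs), and the same comparison would then need to be repeated on a GPU.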

@hill-a hill-a added the enhancement New feature or request label Jun 11, 2019
@hill-a
Owner Author

hill-a commented Jun 11, 2019

PPO2, CPU, forward and back propagation without invalid value checking:

  • 0.036 +- 0.01 ms

with invalid value checking:

  • 0.036 +- 0.01 ms
