minor bug fix #5058
Conversation
raymond-yuan commented on Aug 10, 2018
- fixed entropy sign
- changed entropy implementation for numeric stability (see the sketch below)
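As a rough sketch of why the new form is more stable, assuming the TF 1.x graph API used in this PR (the logits value is invented for illustration):

```python
import tensorflow as tf  # TF 1.x API

logits = tf.constant([[2.0, 1.0, -1.0]])  # invented example logits
policy = tf.nn.softmax(logits)

# Old form: log(policy) can hit log(0), so it needs an epsilon guard.
# Note this is the *negative* entropy, sum(pi * log(pi)).
neg_entropy_old = tf.reduce_sum(policy * tf.log(policy + 1e-20), axis=1)

# New form: the cross-entropy of the policy with itself is its entropy,
# and the op works from the raw logits with an internal log-sum-exp,
# so no epsilon hack is needed.
neg_entropy_new = -tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=policy, logits=logits)

with tf.Session() as sess:
    print(sess.run([neg_entropy_old, neg_entropy_new]))  # both ~ -0.714
```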
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. "I signed it!").

I signed it!

CLAs look good, thanks!
```diff
  policy = tf.nn.softmax(logits)
- entropy = tf.reduce_sum(policy * tf.log(policy + 1e-20), axis=1)
+ entropy = -tf.nn.softmax_cross_entropy_with_logits_v2(labels=policy, logits=logits)
```
Is there any reason that the entropy is negative here?
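A sign check, assuming only the documented behavior of `tf.nn.softmax_cross_entropy_with_logits_v2` (the distribution below is invented): the op returns `-sum_i(labels_i * log(softmax(logits)_i))`, so with `labels=policy` it returns exactly the entropy `H(policy) >= 0`, and the leading minus in the new line yields `-H(policy) <= 0`, the same sign convention as the old `sum(policy * log(policy))` expression.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # invented toy policy
h = -np.sum(p * np.log(p))     # cross-entropy of p with itself = H(p)
print(h)                       # ~0.802 (positive entropy)
print(-h)                      # what the new `entropy` variable holds
```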
```diff
  policy_loss *= tf.stop_gradient(advantage)
- policy_loss -= 0.01 * entropy
+ policy_loss = 0.01 * entropy
```
I think "policy_loss -= 0.01 * entropy" might be correct, overwriting the policy_loss does not make sense.
You are totally correct. I fixed this before merging.
@qlzh727 I think that negative is there because of the …, but that didn't look right. This "entropy regularization" is supposed to discourage premature convergence, so it should encourage a high entropy, not a low entropy. There seems to be a lot of confusion around this point in other tutorials on the subject, because of the way the paper is written. I think this (subtracting the entropy) is right. I'm merging as is. @raymond-yuan LMK if you disagree.
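For reference, a minimal sketch of the loss in the form the comment above describes, where the entropy term is kept positive (`H(pi) >= 0`) and subtracted. The function name, `actions`, `advantage`, and `beta` are illustrative assumptions; the 0.01 coefficient and `tf.stop_gradient(advantage)` come from the diff.

```python
import tensorflow as tf  # TF 1.x API

def policy_loss_with_entropy_bonus(logits, actions, advantage, beta=0.01):
    """A3C-style policy loss with an entropy bonus (illustrative sketch)."""
    # -log pi(a|s) per batch element.
    neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=actions, logits=logits)
    policy_loss = neg_log_prob * tf.stop_gradient(advantage)

    # Positive entropy H(pi), computed stably from the logits.
    policy = tf.nn.softmax(logits)
    entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
        labels=policy, logits=logits)

    # Subtracting the (positive) entropy means minimizing the loss
    # pushes the entropy up, discouraging premature convergence.
    return tf.reduce_mean(policy_loss - beta * entropy)
```

Whether the in-code `-=` matches this form depends on the sign convention of the `entropy` variable: with the negated variable from the diff above, the equivalent bonus would be `policy_loss += 0.01 * entropy`.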