This repository has been archived by the owner on Apr 25, 2023. It is now read-only.
If the cartpole is already all the way at the right, we can't really select that action. So would it make sense to disallow that from either the random case (by sampling again) or the network case (by choosing the next highest Q value that the network predicts)?
The episode itself terminates if the cartpole deviates by more than 15 degrees to either side, so that experience is recorded and (hopefully) the agent learns from it.
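For reference, the masking proposed in the question could be sketched roughly as follows. This is only an illustration, not code from this repository: `select_action`, the `valid` flags, and the two-action layout (index 0 = left, index 1 = right) are all assumptions made for the example. In the random case it samples only among allowed actions, which has the same effect as sampling again until a valid one comes up; in the network case it takes the highest predicted Q value among the allowed actions.

```python
import random


def select_action(q_values, valid, epsilon, rng=random):
    """Epsilon-greedy action selection restricted to allowed actions.

    q_values: predicted Q value per action (hypothetical network output).
    valid:    one boolean per action, True if the action is allowed.
    epsilon:  probability of taking a random (exploratory) action.
    """
    valid_actions = [a for a, ok in enumerate(valid) if ok]
    if not valid_actions:
        raise ValueError("at least one action must be allowed")

    if rng.random() < epsilon:
        # Random case: draw uniformly from the allowed actions only.
        return rng.choice(valid_actions)

    # Network case: best Q value among the allowed actions, i.e. the
    # next-highest prediction when the top one is disallowed.
    return max(valid_actions, key=lambda a: q_values[a])


# Assumed layout: action 0 = left, action 1 = right.
# The cart is at the right edge, so "right" is marked as not allowed.
action = select_action([0.2, 0.9], [True, False], epsilon=0.0)
```

As the reply above points out, masking may not be needed if the episode already terminates in that situation, since the agent then gets a chance to learn the consequence from the recorded experience.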