Would it make sense to restrict the action to what's possible? #23

shamoons · 2019-04-04T15:39:18Z

If the cart is already all the way to the right, we can't really select the "move right" action. So would it make sense to disallow it, either in the random case (by sampling again) or in the network case (by choosing the action with the next-highest Q-value that the network predicts)?
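To make the proposal concrete, here is a minimal sketch of what such a restriction might look like. It assumes NumPy and a boolean `valid_mask` supplied by the caller; `select_action`, `valid_mask`, and `rng` are hypothetical names and not part of this repo. Sampling uniformly among the valid actions gives the same distribution as resampling until a valid action comes up.

```python
import numpy as np

def select_action(q_values, valid_mask, epsilon, rng):
    """Epsilon-greedy choice restricted to actions flagged True in valid_mask.

    Hypothetical helper for illustration only.
    """
    valid_actions = np.flatnonzero(valid_mask)
    if valid_actions.size == 0:
        raise ValueError("at least one action must be valid")
    if rng.random() < epsilon:
        # Random case: draw only from the valid actions.
        return int(rng.choice(valid_actions))
    # Network case: hide invalid actions behind -inf so argmax
    # falls through to the best remaining Q-value.
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))
```

For example, with `q_values = np.array([0.2, 0.9])` and `valid_mask = np.array([True, False])`, the greedy branch returns action 0 even though action 1 has the higher predicted Q-value.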

daysofthunder98 · 2020-07-28T17:41:07Z

The episode itself terminates if the pole tilts more than 15 degrees to either side, so the experience is recorded and (hopefully) the agent learns from it.
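As a rough illustration of how a recorded terminal experience feeds into learning, here is a sketch that assumes a standard one-step Q-learning target. `td_target` and the default `gamma` are illustrative choices, not taken from this repo.

```python
import numpy as np

def td_target(reward, next_q_values, done, gamma=0.95):
    """One-step Q-learning target for a single stored transition.

    Hypothetical helper for illustration only.
    """
    if done:
        # Terminal transition: the episode ended, so there is no
        # future value to bootstrap from.
        return float(reward)
    # Non-terminal: reward plus the discounted best next-state value.
    return float(reward + gamma * np.max(next_q_values))
```

Because the terminal target carries no bootstrapped future value, actions that end the episode early tend to end up with lower Q-values than actions that keep it going.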

WorksWellWithOthers · 2020-12-05T00:00:40Z

The action shouldn't be restricted, since the agent's goal is to learn which action yields the most reward.
