The total reward is calculated per episode; each episode runs for 30 games.
We report the average total reward over every 100 consecutive episodes. The threshold for considering the environment solved is unknown.
The snake's length and reward increase by 1 each time an apple is eaten.
The reward is reduced by 1 every time the snake collides with itself,
bumps into the border of the board, or goes a long time without eating an apple.
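
The reward rules above can be sketched as a small function. This is only an illustration, not the project's actual code: the names `step_reward` and `episode_total` and the boolean flags are hypothetical, `starved` stands in for the "long time without an apple" condition (its exact limit is not stated here), and charging the penalty once per step even when several failure conditions coincide is an assumption.

```python
def step_reward(ate_apple: bool, hit_self: bool, hit_border: bool, starved: bool) -> int:
    """Return the reward for one step under the rules described above.

    All parameter names are hypothetical. `starved` represents the
    "no apple for a long time" condition, whose threshold is not given.
    """
    reward = 0
    if ate_apple:
        reward += 1  # eating an apple: +1 (the snake also grows by 1)
    if hit_self or hit_border or starved:
        reward -= 1  # assumed: a single -1 penalty per failing step
    return reward


def episode_total(step_rewards):
    """Total reward of an episode is the sum of its per-step rewards."""
    return sum(step_rewards)
```

For example, an episode with two apples and one collision would total `episode_total([1, 1, -1])`, which is 1.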
Snake, 4 discrete actions are available:
0 - turn left, 1 - turn right, 2 - move up, 3 - move down.
Cartpole, 2 discrete actions are available:
0 - push cart to the left, 1 - push cart to the right.
Navigation, 4 discrete actions are available:
0 - move forward, 1 - move backward, 2 - turn left, 3 - turn right.
LunarLander, 4 discrete actions are available:
0 - do nothing, 1 - fire left orientation engine, 2 - fire main engine, 3 - fire right orientation engine.
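
For illustration, the four Snake action codes listed first (0 to 3) could be written as an enumeration so that agent code does not pass bare integers around. The class name `SnakeAction` and its member names are hypothetical and not taken from the project.

```python
from enum import IntEnum


class SnakeAction(IntEnum):
    """Hypothetical names for the four Snake action codes."""
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_UP = 2
    MOVE_DOWN = 3


# An integer chosen by the agent (for example, the index of the largest
# Q-value) converts directly into a named action.
action = SnakeAction(2)
print(action.name)  # MOVE_UP
```

Because `IntEnum` members compare equal to plain integers, such an enumeration can be passed anywhere the raw action codes are expected.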
For every 100 consecutive episodes we record the values 'Avg.LenOfSnake' and 'Max.LenOfSnake'.
- For Learning rate = 1e-4, number of episodes = 50000
Avg.LenOfSnake = 18, Max.LenOfSnake = 46
- For Learning rate = 1e-5, number of episodes = 60000
Avg.LenOfSnake = 18, Max.LenOfSnake = 44
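
Statistics over 100 consecutive episodes, such as 'Avg.LenOfSnake' and 'Max.LenOfSnake', can be tracked with a fixed-size window. The following is a minimal sketch, assuming one snake-length value is recorded per episode; the class name `WindowStats` and its methods are hypothetical, and the project may compute these values differently.

```python
from collections import deque


class WindowStats:
    """Average and maximum over the most recent `window` episodes."""

    def __init__(self, window: int = 100):
        # A deque with maxlen discards the oldest value automatically.
        self.values = deque(maxlen=window)

    def add(self, snake_length: float) -> None:
        """Record the snake length reached in one episode."""
        self.values.append(snake_length)

    def average(self) -> float:
        """Mean of the stored values, or 0.0 if nothing is stored yet."""
        return sum(self.values) / len(self.values) if self.values else 0.0

    def maximum(self) -> float:
        """Largest stored value, or 0.0 if nothing is stored yet."""
        return max(self.values) if self.values else 0.0
```

Calling `add` once per episode and reading `average()` and `maximum()` every 100 episodes would yield one pair of values per window.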
See the video "Wooden Snake" on YouTube.
Several parts of the code are based on https://github.com/stefanlclarke/Snake-AI-DQN-