- Features
- Installation
- How to interact with GNU Backgammon using Python Script?
- Usage
- Backgammon OpenAI Gym Environment
- Bibliography, sources of inspiration, related works
- License
- PyTorch implementation of TD-Gammon [1].
- Test the trained agents against an open source implementation of the Backgammon game, GNU Backgammon.
- Play against a trained agent via a web GUI.
I used Anaconda3 with Python 3.6.8 (this is the only configuration I tested).
Create the conda environment:
$ conda create --name tdgammon python=3.6
$ source activate tdgammon
(tdgammon) $ git clone https://github.com/dellalibera/td-gammon.git
Install the environment gym-backgammon:
(tdgammon) $ git clone https://github.com/dellalibera/gym-backgammon.git
(tdgammon) $ cd gym-backgammon
(tdgammon) $ pip install -e .
Install the dependencies (PyTorch v1.2):
(tdgammon) $ pip install torch torchvision
(tdgammon) $ pip install tb-nightly
or
(tdgammon) $ cd td-gammon/
(tdgammon) $ pip install -r requirements.txt
If you don't use an Anaconda environment, run the following commands:
git clone https://github.com/dellalibera/td-gammon.git
pip3 install -r td-gammon/requirements.txt
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .
If you don't use an Anaconda environment, replace python with python3 in the commands below.
To play against gnubg, you first have to install it.
NOTE: I installed gnubg on Ubuntu 18.04 (running in a virtual machine), with Python 2.7 (see the next section for how to interact with GNU Backgammon).
sudo apt-get install gnubg
I used an HTTP server that runs on the guest machine (Ubuntu) to receive commands and interact with the gnubg program.
This makes it possible to send commands from the host machine (in my case, macOS).
The file bridge.py should be executed on the guest machine (the machine where gnubg is installed).
gnubg -t -p /path/to/bridge.py
This runs gnubg in command-line mode instead of the graphical interface (-t), then evaluates a Python file and exits (-p).
For a list of gnubg parameters, run gnubg --help.
The Python script bridge.py creates an HTTP server running on localhost:8001.
If you want to modify the host and the port, change the following lines in bridge.py:
if __name__ == "__main__":
    HOST = 'localhost'  # <-- YOUR HOST HERE
    PORT = 8001  # <-- YOUR PORT HERE
    run(host=HOST, port=PORT)
The file td_gammon/gnubg/gnubg_backgammon.py sends messages/commands to gnubg and parses the response.
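The client side of this exchange can be sketched as follows. This is a minimal illustration using only the Python 3 standard library, not the code in gnubg_backgammon.py: the command form field, the JSON reply with a result entry, and the helper names encode_command, parse_reply and send_command are all assumptions made for the example.

```python
import http.client
import json
from urllib.parse import urlencode


def encode_command(command):
    # Form-encode the command; the 'command' field name is an assumption.
    return urlencode({"command": command}).encode("utf-8")


def parse_reply(raw):
    # Assumed reply shape: a JSON object with a 'result' entry.
    return json.loads(raw.decode("utf-8"))["result"]


def send_command(host, port, command, timeout=5):
    # POST one command to the bridge and return the decoded result.
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request(
            "POST",
            "/",
            body=encode_command(command),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
        return parse_reply(conn.getresponse().read())
    finally:
        conn.close()
```

With the bridge running, a call such as send_command("localhost", 8001, "new game") would send a single command; the actual message format expected by bridge.py may differ.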
Run python /path/to/main.py --help for a list of parameters.
To train a neural network with a single hidden layer of 40 units for 100000 games/episodes, saving the model every 10000 episodes, run the following command:
(tdgammon) $ python /path/to/main.py train --save_path ./saved_models/exp1 --save_step 10000 --episodes 100000 --name exp1 --type nn --lr 0.1 --hidden_units 40
Run python /path/to/main.py train --help for a list of parameters available for training.
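For readers unfamiliar with what the training loop optimizes, the TD(λ) update at the core of TD-Gammon [1] can be sketched as follows. This is a deliberately simplified illustration, not the code in this repository: it uses a single sigmoid unit instead of a 40-unit hidden layer, plain Python lists instead of PyTorch tensors, no discounting, and a hypothetical td_lambda_update helper. The λ value of 0.7 is a placeholder; the learning rate mirrors --lr 0.1 from the command above.

```python
import math


def value(weights, features):
    # Estimated probability of winning: sigmoid of a weighted feature sum.
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def td_lambda_update(weights, traces, features, target, lr=0.1, lam=0.7):
    """One TD(lambda) step for a single-sigmoid value function.

    `target` is the estimated value of the next position, or the final
    reward (1 for a win, 0 for a loss) when the game is over.
    """
    v = value(weights, features)
    grad_scale = v * (1.0 - v)  # derivative of the sigmoid
    # Decay the eligibility traces and add the gradient of the current value.
    new_traces = [lam * e + grad_scale * x for e, x in zip(traces, features)]
    td_error = target - v
    # Move every weight along its trace, scaled by the TD error.
    new_weights = [w + lr * td_error * e for w, e in zip(weights, new_traces)]
    return new_weights, new_traces
```

The eligibility traces are what let a win or loss at the end of a game adjust the value estimates of earlier positions; they should be reset to zero at the start of each game.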
To evaluate already trained models, you have two options: have two models play against each other, or have one model play against gnubg.
Run python /path/to/main.py evaluate --help for a list of parameters available for evaluation.
To have two models play against each other, you have to specify the path where each model is saved, together with its number of hidden units.
(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --hidden_units_agent1 40 --type nn --model_agent0 path/to/saved_models/agent0.tar --model_agent1 path/to/saved_models/agent1.tar
To evaluate a model against gnubg, first run gnubg with the script bridge.py as input.
On Ubuntu (or wherever gnubg is installed):
gnubg -t -p /path/to/bridge.py
Then run the following (to play against gnubg at beginner level for 50 games):
(tdgammon) $ python /path/to/main.py evaluate --episodes 50 --hidden_units_agent0 40 --type nn --model_agent0 path/to/saved_models/agent0.tar vs_gnubg --difficulty beginner --host GNUBG_HOST --port GNUBG_PORT
The number of hidden units (--hidden_units_agent0) must match that of the loaded model (--model_agent0).
You can play against a trained agent via a web GUI:
(tdgammon) $ python /path/to/main.py gui --host localhost --port 8002 --model path/to/saved_models/agent0.tar --hidden_units 40 --type nn
Then navigate to http://localhost:8002 in your browser.
Run python /path/to/main.py gui --help for a list of parameters available for the web GUI.
Instead of evaluating the agent during training (which can take some time, especially against gnubg at difficulty world_class), you can load all the saved models in a folder and evaluate each model (saved at different times during training) against one or more opponents.
The models in the directory should be of the same type (i.e., the structure of the network should be the same for all the models in the same folder).
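Collecting the saved models in training order can be sketched as follows. This is an illustration only, not how main.py plot discovers models: it assumes checkpoint names end in an underscore, a number and .tar (as in exp_20221230_1048_41_259706_100000.tar), and that the trailing number is the training step. The helpers checkpoint_step and sorted_checkpoints are hypothetical.

```python
import re
from pathlib import Path

# Captures the trailing number of names such as
# exp_20221230_1048_41_259706_100000.tar (assumed to be the training step).
STEP_PATTERN = re.compile(r"_(\d+)\.tar$")


def checkpoint_step(name):
    # Return the trailing step number, or None if the name does not match.
    match = STEP_PATTERN.search(name)
    return int(match.group(1)) if match else None


def sorted_checkpoints(folder):
    # Return (step, path) pairs for every matching .tar file, earliest first.
    found = []
    for path in Path(folder).glob("*.tar"):
        step = checkpoint_step(path.name)
        if step is not None:
            found.append((step, path))
    return sorted(found, key=lambda pair: pair[0])
```

Sorting by the parsed number rather than by file name matters because a plain alphabetical sort would place a 100000-step checkpoint before a 20000-step one.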
To plot the wins against gnubg, run on Ubuntu (or wherever gnubg is installed):
gnubg -t -p /path/to/bridge.py
In the example below, the trained models are evaluated against gnubg at two difficulty levels, beginner and advanced:
(tdgammon) $ python /path/to/main.py plot --save_path /path/to/saved_models/myexp --hidden_units 40 --episodes 10 --opponent random,gnubg --dst /path/to/experiments --type nn --difficulty beginner,advanced --host GNUBG_HOST --port GNUBG_PORT
To visualize the plots:
(tdgammon) $ tensorboard --logdir=runs/path/to/experiment/ --host localhost --port 8001
Run python /path/to/main.py plot --help for a list of parameters available for plotting.
For a detailed description of the environment, see gym-backgammon.
- TD-Gammon and Temporal Difference Learning:
  - [1] Practical Issues in Temporal Difference Learning
  - Temporal Difference Learning and TD-Gammon
  - Programming backgammon using self-teaching neural nets
  - Implementation Details TD-Gammon
  - Chapter 9 Temporal-Difference Learning
  - Implementation Details of the TD(λ) Procedure for the Case of Vector Predictions and Backpropagation
  - Learning to Predict by the Methods of Temporal Differences
- GNU Backgammon: https://www.gnu.org/software/gnubg/
- Rules of Backgammon:
- Install GNU Backgammon on Ubuntu:
- How to use Python to interact with gnubg: [Bug-gnubg] Documentation: Looking for documentation on python scripting
- Other implementations of the Backgammon OpenAI Gym Environment:
- Other implementations of TD-Gammon:
- How to setup your VMWare Fusion images to use static IP addresses on Mac OS X
- PyTorch Tensorboard: https://pytorch.org/docs/stable/tensorboard.html
- exp_20221230_1048_41_259706_100000.tar: a 40-neuron NN trained for 100k steps. It wins about 73% of games against beginner and 50% against intermediate, and fewer against advanced and world class. It took about 10 minutes to train with 16 processes. In hindsight, it actually wins about 55% of games, which is probably close enough. Also measured at 539/1000 against intermediate.
- exp_20221230_1713_28_521264_1000000.tar: a 40-neuron NN trained for 1 million steps. It took about 2.2 hours to train with 12 processes. Win rates: .77 against beginner, .57 against intermediate, .43 against advanced, .33 against world class.
- eval_net: a 40-neuron NN. It is about .6 against intermediate and .4 against advanced.
- exp_20221230_1028_54_571857_10000.tar: .33 against intermediate.
- 2023_01_03_20_10_49 exp_510000.tar: {'intermediate': 70, 'advanced': 54}. Over 1k episodes on gnubg it won 492/1000 (49.2%).
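When comparing win rates like the ones above, it can help to attach a rough uncertainty to each figure. The sketch below is not part of the repository; it computes a normal-approximation (Wald) interval, and the helper name win_rate_interval and the choice of a 95% level are assumptions made for the example.

```python
import math


def win_rate_interval(wins, games, z=1.96):
    # Normal-approximation interval; z=1.96 gives roughly 95% coverage.
    p = wins / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    # Clamp to [0, 1] so the bounds stay valid probabilities.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

For 492 wins in 1000 games this gives an interval of roughly 0.46 to 0.52, so that result is compatible with an even match. Runs of only 10 or 50 episodes produce much wider intervals.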