I wrote some algorithmic players for the game of Squava. I've written two variations on Alpha-beta minimaxing, and two variations on Monte Carlo Tree Search.
This is my second attempt, my first attempt is full of cruft and mistakes. I did not implement Monte Carlo Tree Search correctly, for example.
Comparison of algorithmic players.
Squava is a tic-tac-toe variant. Moves are made like tic-tac-toe, except on a
5x5 grid of cells.
Players alternate marking cells, conventionally with X
or O
.
Four cells of the same mark in a row (vertical, horizontal or diagonal)
wins for the player with that mark. Three cells in a row loses. That is, a
player can win outright, or lose outright.
The rules have an ambiguity, in that it isn't clear what to do if a single marker fills in a row of 3, say, and a diagonal of 4. Does that player win or lose?
I chose "win", mainly because it's computationally easier to check for 4-in-a-row as a win separately from 3-in-a-row as a loss. After all, every 4-in-a-row has 3-in-a-row inside it.
Neither player can win until the 7th move (4 for starting player, 3 for the other). The starting player can win on odd-numbered moves by winning with 4-in-a-row. The starting player can lose on even-numbered moves by losing with 3-in-a-row.
Similarly, the second player wins on even-numbered moves by getting 4-in-a-row, or loses on odd-numbered moves with 3-in-a-row.
A game has a maximum of 25 moves.
I've not seen it written down anywhere, but "squava" is probably "square yavalath", Yavalath being the inspiration for squava.
"Cat" can get the game, as in ordinary tic-tac-toe. Generating boards randomly does show that cat games, and 25-moves games won by both first and second players are possible.
Cat got this game:
O O X X O
X X O O X
O O X X O
X X O O X
O O X X O
The sequence of moves (O
moved first):
2,4 1,0 2,1 2,3 1,3 2,2 0,1 0,2 3,3 3,1 1,2 3,4 4,4 1,1 2,0 4,2 4,1 0,3 3,2 1,4 0,0 3,0 4,0 4,3 0,4
Algorithmic players can produce 25-move games. In my experimentation, all 25-move games are lost by the first player, The first move player has to be an Alpha-beta minimaxing player, and the second player a Monte Carlo Tree Search player.
The first player (O
) won this randomly-generated game:
0 1 2 3 4
0 X X O O X
1 O X X O O
2 X O O O X
3 O X X O O
4 X X O O X
Here's the series of moves. O
moves first, X
second:
2,2 4,4 4,2 0,4 1,4 4,0 2,1 2,4 3,0 3,1 1,3 0,0 3,4 3,2 4,3 1,1 1,0 1,2 0,2 2,0 0,3 0,1 3,3 4,1 2,3
The final move at 2,3 completes a 5-in-a-row, and a 3-in-a-row. That's an extremely unlikely outcome for a real game, where at least one of the players tries to win.
I wrote an interactive player, and a program that matches two algorithmic players against each other.
$ go build sqv.go
$ ./sqv -t M -C
X (MCTS/Plain) <3,2> (9255) [500000] 1.940846297s
0 1 2 3 4
0 _ _ _ _ _
1 _ _ _ _ _
2 _ _ _ _ _
3 _ _ X _ _
4 _ _ _ _ _
Your move: 1 4
0 1 2 3 4
0 _ _ _ _ _
1 _ _ _ _ O
2 _ _ _ _ _
3 _ _ X _ _
4 _ _ _ _ _
In the above game fragment, a freshly-compiled sqv
program ran the plain
Monte Carlo Tree Search player to find the first move, 3,2, the X
.
The human chose the next move, 1,4, signified by an 'O'
$ go build playoff.go
$ ./playoff -1 U -2 M -n 2
0 MCTS/UCB1 MCTS/Plain 20 -1 40.23 2,2 3,1 3,2 0,2 1,1 0,1 4,3 0,4 0,3 3,4 4,4 3,3 1,0 2,1+ 4,1 4,2+ 2,4 1,2+ 4,0 2,3+
1 MCTS/UCB1 MCTS/Plain 9 1 28.08 1,1 2,2 4,1 0,4 1,4 4,3 3+,1 0,1 2+,1
Above, we have 2 games, first move by MCTS with UCB1 move selection versus plain old MCTS. The first game took 20 moves, the plain MCTS 2nd player won in 40.23 seconds The second game took 9 moves, the first player won in 28.08 seconds. The series of moves are listed after the per-game statistics.
You can re-use the series of moves in two ways:
- The
recreate
program accepts either a file name with the string of moves as contents, or the string of moves on the command line:./recreate '1,1 2,2 4,1 0,4 1,4 4,3 3+,1 0,1 2+,1'
- You hit return after
recreate
shows you the board so far.
- The
sqv
program can accept a partial game on the command line, setting up the board for an algorithmic player:./sqv -t U -p '1,1 2,2 4,1 0,4 1,4'
You can investigate which move the algorithmic players make in a given
situation with the -p 'x,y x,y...'
partial game.
I wrote another program that calculates Elo ratings of the algorithms. It starts every algorithm at a rating of 1300, with 14 effective games played. The program picks a pair of algorithms, and has them play a game. At the end of the game, the program calculates K, E and S for the pair of algorithms.
- K = 800/N, N is the number of games the algorithm has played, including this game.
- E = 1/(1 + 10(R1 - R0)/400) Where R1 is the other algorithm's rating before the game, and R0 is this algorithm's rating before the game.
- S = 1, if the algorithm wins, 0.5 if it draws, or 0 if it loses.
Change in rating of the algorithm is K(S - E)
I had to let the program play 400 games before the Elo ratings stabilized.
$ go build elo.go
$ ./elo -n 400
0 70.96 A M A 1300 1327 15 1300 1273 15
1 77.77 A U U 1327 1300 16 1300 1329 15
2 11.14 M A A 1273 1250 16 1300 1321 17
3 200.22 G M G 1300 1323 15 1250 1230 17
...
Each output line is game number, elapsed time in seconds, first player, second player, winner, first player's rating before game, first player's rating after game, first player's effective games, second player's rating before game, second player's rating after game, second player's effective games.
After several 400-game runs, I conclude that Alpha-beta minimaxing (abbreviated 'A' above) rates about 1320, Alpha-beta minimaxing with a better static valuation (abbreviated 'G' above) rates around 1340, MCTS with UCB1 (abbreviated 'U' above) rates around 1400, while plain MCTS ('M' above) is no more than 1110.
The Player
interface describes code that has an internal representation of
a squava game and can choose a move based on that internal representation.
The "board" isn't specified externally, each algorithm can have an internal
representation of the game board customized for itself.
That means that each implementation
of Player
has to have its own way to find a "winner" (or loser),
and to make a human readable representation of its internal board state.
Method MakeMove
has player type so that driver programs
(like sqv.go
)
can set a board to some desired
configuration before letting the algorithm choose a move.
type Player interface {
Name() string
MakeMove(int, int, int) // x,y coords, type of player (MAXIMIZER, MINIMIZER
ChooseMove() (int, int, int, int) // x,y coords of move, value, leaf node count
FindWinner() int
String() string // human readable formatted board
}
Code for each algorithmic player satisfies this Go interface. Using a Go interface allows the two driver programs to work with a single type of "player". That was very convenient.
Code for all algorithmic players lives in the same package. I had them in separate packages in my first attempt at algorithmic players. It seemed like that arrangement required lots of redundant code, so this time around I put them all in the same package. This, too, seems like a mistake, as I had to rename some of the variables holding things like "which sets of slots make a 4-in-a-row". I don't know what the solution for this is.
The Alpha-beta minimaxing algorithms have an internal representation of the board that looks like this:
type board [5][5]int
The Monte Carlo Tree Search variants have an internal board representation that looks like this:
var board [25]int
Go the programming language really doesn't have 2-D arrays,
so I worried about the efficiency of [5][5]int
in the Alpha-beta
algorithms.
There's a mismatch between what a human (me) wants to give the interactive
program (a row and a column number) versus what the Monte Carlo Tree Search
programs use ([25]int
array.
It was also harder to specify the array indexes that make a 4-in-a-row win,
or a 3-in-a-row loss.
Again, I don't know which alternative really is better.
Peiyan Yang has put together a program that has solved the game for the first player.