
MiniZero: An AlphaZero and MuZero Training Framework


b08202011/minizero_policydetection

 
 


MiniZero

MiniZero is a zero-knowledge learning framework that supports AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel MuZero algorithms.

This is the official repository of the paper MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games.

If you use MiniZero for research, please consider citing our paper as follows:

@misc{wu2023minizero,
  title={MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games}, 
  author={Ti-Rong Wu and Hung Guei and Po-Wei Huang and Pei-Chiun Peng and Ting Han Wei and Chung-Chin Shih and Yun-Jui Tsai},
  year={2023},
  eprint={2310.11305},
  archivePrefix={arXiv},
  primaryClass={cs.AI}
}


Overview

MiniZero utilizes zero-knowledge learning algorithms to train game-specific AI models.

It includes a variety of zero-knowledge learning algorithms:

  • AlphaZero
  • MuZero
  • Gumbel AlphaZero
  • Gumbel MuZero

It supports a variety of game environments:

  • Go
  • NoGo
  • Killall-Go
  • Gomoku / Outer-Open Gomoku
  • Othello
  • Hex
  • TicTacToe
  • Atari (57 games)

We are planning to add new algorithms, features, and more games in the future.

Architecture

The MiniZero architecture comprises four components: a server, self-play workers, an optimization worker, and data storage.

Figure: MiniZero architecture

Server

The server is the core component in MiniZero, controlling the training process and managing both the self-play and optimization workers.

In each iteration, the server first instructs all self-play workers to generate self-play games simultaneously using the latest network and collects game records from self-play workers. Once the server accumulates the necessary self-play games, it then stops the self-play workers and instructs the optimization worker to load the latest game records and start network updates. After the network has been updated, the server starts the next iteration until the training reaches a predetermined maximum iteration.
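The iteration described above can be sketched as a small control loop. This is a toy illustration, not MiniZero's actual server code: `TrainingServer`, `generate_game`, `optimize`, and `games_per_iteration` are hypothetical names, and the self-play workers are simulated by a plain function call rather than separate processes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TrainingServer:
    """Toy stand-in for the server's control loop (all names are hypothetical)."""

    generate_game: Callable[[int], str]    # self-play: network version -> game record
    optimize: Callable[[List[str]], None]  # optimization worker: consumes game records
    games_per_iteration: int
    max_iterations: int
    network_version: int = 0
    log: List[str] = field(default_factory=list)

    def run(self) -> None:
        for iteration in range(1, self.max_iterations + 1):
            # 1. Self-play phase: collect records produced with the latest network.
            records: List[str] = []
            while len(records) < self.games_per_iteration:
                records.append(self.generate_game(self.network_version))
            # 2. Self-play stops; the optimization worker updates the network.
            self.optimize(records)
            self.network_version += 1
            self.log.append(
                f"iteration {iteration}: {len(records)} games -> network v{self.network_version}"
            )


# Usage with stub workers: two iterations of three games each.
server = TrainingServer(
    generate_game=lambda version: f"game-from-v{version}",
    optimize=lambda records: None,
    games_per_iteration=3,
    max_iterations=2,
)
server.run()
```

In the real framework the two phases involve separate worker processes and shared storage; the sketch only shows the ordering of the phases and the stopping condition.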

Self-play worker

The self-play worker interacts with the environment to produce self-play games.

There may be multiple self-play workers. Each self-play worker maintains multiple MCTS instances to play multiple games simultaneously with batch GPU inferencing to improve efficiency. Specifically, the self-play worker runs the selection for each MCTS to collect a batch of leaf nodes and then evaluates them through batch GPU inferencing. Finished self-play games are sent to the server and forwarded to the data storage by the server.
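To make the batching idea concrete, the sketch below runs several searches in lockstep and evaluates all of their leaves with a single call per simulation step. It is a simplified assumption of how such a loop might look: `ToySearch`, `run_batched_simulations`, and `fake_network` are hypothetical, and the tree logic is reduced to counters so the batching pattern stays visible.

```python
from typing import Callable, List, Sequence, Tuple

Leaf = Tuple[int, int]  # (game id, simulation index), a placeholder for a real leaf node


class ToySearch:
    """Hypothetical stand-in for one MCTS instance; it only counts what a real tree would receive."""

    def __init__(self, game_id: int) -> None:
        self.game_id = game_id
        self.simulations = 0
        self.value_sum = 0.0

    def select_leaf(self) -> Leaf:
        # A real search would walk the tree down to an unexpanded node here.
        return (self.game_id, self.simulations)

    def backup(self, value: float) -> None:
        # A real search would expand the leaf and propagate the value toward the root.
        self.simulations += 1
        self.value_sum += value


def run_batched_simulations(
    searches: Sequence[ToySearch],
    evaluate_batch: Callable[[List[Leaf]], List[float]],
    num_simulations: int,
) -> int:
    """Run all searches in lockstep; return the number of batched evaluations made."""
    calls = 0
    for _ in range(num_simulations):
        leaves = [search.select_leaf() for search in searches]  # one leaf per game
        values = evaluate_batch(leaves)  # a single batched inference for all games
        calls += 1
        for search, value in zip(searches, values):
            search.backup(value)
    return calls


def fake_network(leaves: List[Leaf]) -> List[float]:
    # Placeholder for GPU inference: returns a constant value for every leaf.
    return [0.5 for _ in leaves]


searches = [ToySearch(game_id) for game_id in range(4)]
calls = run_batched_simulations(searches, fake_network, num_simulations=8)
```

With four games and eight simulations each, the network is called eight times on batches of four leaves, instead of 32 times on single leaves.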

Optimization worker

The optimization worker updates the network using collected self-play games.

Specifically, it loads self-play games from data storage into the replay buffer and then updates the network over a number of steps, using data sampled from the replay buffer. Generally, the number of optimization steps is proportional to the number of collected self-play games to prevent overfitting. Finally, the updated networks are saved to the data storage.
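A minimal sketch of this procedure follows, assuming a fixed-capacity buffer and a constant ratio of steps to new games. `ReplayBuffer`, `run_optimization`, and `steps_per_game` are hypothetical names chosen for illustration, and the network update is a stub that only records the batches it receives.

```python
import random
from collections import deque
from typing import Callable, Deque, List


class ReplayBuffer:
    """Keeps the most recent games; the oldest are dropped once capacity is reached."""

    def __init__(self, capacity: int) -> None:
        self.games: Deque[str] = deque(maxlen=capacity)

    def add_games(self, games: List[str]) -> None:
        self.games.extend(games)

    def sample(self, batch_size: int, rng: random.Random) -> List[str]:
        # Uniform sampling with replacement; a real buffer would sample positions, not whole games.
        return [rng.choice(self.games) for _ in range(batch_size)]


def run_optimization(
    buffer: ReplayBuffer,
    new_games: List[str],
    steps_per_game: int,  # hypothetical ratio: update steps per newly collected game
    batch_size: int,
    update: Callable[[List[str]], None],
    rng: random.Random,
) -> int:
    """Load new games, then run a number of update steps proportional to their count."""
    buffer.add_games(new_games)
    steps = len(new_games) * steps_per_game
    for _ in range(steps):
        update(buffer.sample(batch_size, rng))
    return steps


seen_batches: List[List[str]] = []
buffer = ReplayBuffer(capacity=100)
steps = run_optimization(
    buffer,
    new_games=["game-1", "game-2", "game-3"],
    steps_per_game=2,
    batch_size=4,
    update=seen_batches.append,  # stub for a gradient step
    rng=random.Random(0),
)
```

Tying the step count to the number of new games keeps the amount of training per collected game constant, which is one way to limit how often the same data is revisited.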

Data storage

The data storage stores network files and self-play games.

Specifically, it uses the Network File System (NFS) for sharing data across different machines. This is an implementation choice; a simpler file system can suffice if distributed computing is not employed.

Prerequisites

MiniZero requires a Linux platform with at least one NVIDIA GPU to operate. To facilitate the use of MiniZero, a container image is pre-built to include all required packages. Thus, a container tool such as docker or podman is also required.

Platform recommendations:
  • Modern CPU with at least 64G RAM
  • NVIDIA GPU of GTX 1080 (VRAM 8G) or above
  • Linux operating system, e.g., Ubuntu 22.04 LTS
Tested platforms:

CPU                   RAM   GPU             OS
Xeon Silver 4216 x2   256G  RTX A5000 x4    Ubuntu 20.04.6 LTS
Xeon Silver 4216 x2   128G  RTX 3080 Ti x4  Ubuntu 20.04.5 LTS
Xeon Silver 4216 x2   256G  RTX 3090 x4     Ubuntu 20.04.5 LTS
Xeon Silver 4210 x2   128G  RTX 3080 x4     Ubuntu 22.04 LTS
Xeon E5-2678 v3 x2    192G  GTX 1080 Ti x4  Ubuntu 20.04.5 LTS
Xeon E5-2698 v4 x2    128G  GTX 1080 Ti x1  Arch Linux LTS (5.15.90)
Core i9-7980XE        128G  GTX 1080 Ti x1  Arch Linux (6.5.6)

Quick Start

This section walks you through training AI models using zero-knowledge learning algorithms, evaluating trained AI models, and launching the console to interact with the AI.

First, clone this repository.

Next, supply the training CSV file to csv_to_trainvalsgf.py and the testing CSV file to csv_to_sgf.py.

Then, start the runtime environment using the container.

scripts/start-container.sh # must have either podman or docker installed

Once a container starts successfully, its working folder should be located at /workspace. You must execute all of the following commands inside the container.

Training

To train the policy model:

trainpolicy.sh   # requires: traindir (must contain a directory named "model"), a training config (example.cfg can be used), the training .sgf, and the validation .sgf
trainkyupolicy.sh
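Because the training directory must already contain a directory named model, a quick check before launching can catch a wrong path early. The helper below is a hypothetical convenience, not part of the repository; it checks only the one requirement stated above and makes no assumption about how trainpolicy.sh reads its arguments.

```python
import tempfile
from pathlib import Path


def check_train_dir(train_dir: str) -> Path:
    """Return the "model" subdirectory of train_dir, or raise if it is missing."""
    model_dir = Path(train_dir) / "model"
    if not model_dir.is_dir():
        raise FileNotFoundError(f'expected a "model" directory inside {train_dir}')
    return model_dir


# Demonstration in a throwaway directory so no real training folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    try:
        check_train_dir(tmp)
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
    (Path(tmp) / "model").mkdir()
    found_name = check_train_dir(tmp).name
```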

Evaluation

To evaluate:

eval.sh # requires: a config file (e.g., example.cfg) and the testing .sgf
        # before running, set the model to load in policy_play.py, policy_playkyu.py, policy_playkyuprivate.py, and policy_playpri.py
        # a submission template is also required

