Pure Monte Carlo Counterfactual Regret Minimization

This is the code accompanying the paper "Pure Monte Carlo Counterfactual Regret Minimization" by Ju Qi.

Overview

Abstract

Instructions

Requirements

numpy
joblib
matplotlib
seaborn
Pillow
certifi
cycler

How to run the framework

TrainGFSP_Sampling.py

game_name = 'Leduc'
is_show_policy = False
prior_state_num = 3
y_pot = 3
z_len = 3

The settings above define the experimental configuration. 'game_name' selects the game to train on; the available names can be found in the code below or in the 'GAME_Sampling' folder.

'prior_state_num' sets the scale of the game, and its meaning depends on the game: for 'Kuhn', 'Leduc', 'KuhnNPot', 'Leduc3Pot', and 'Leduc5Pot', it is the number of cards in the game; for 'Goofspiel', it is the number of cards in each player's hand; for 'PAM', it is the step length at which the game terminates.

'y_pot' and 'z_len' are only used when 'game_name' is 'KuhnNPot'.
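As a rough illustration of how these settings relate, the sketch below summarizes a configuration and keeps 'y_pot' and 'z_len' only for 'KuhnNPot'. The helper `describe_config` and the table `PRIOR_STATE_NUM_MEANING` are hypothetical names introduced here for illustration; they are not part of the repository.

```python
# Hypothetical helper, not part of the repository: it only restates the
# meaning of 'prior_state_num' for each game named in this README.
PRIOR_STATE_NUM_MEANING = {
    'Kuhn': 'number of cards in the game',
    'Leduc': 'number of cards in the game',
    'KuhnNPot': 'number of cards in the game',
    'Leduc3Pot': 'number of cards in the game',
    'Leduc5Pot': 'number of cards in the game',
    'Goofspiel': "number of cards in each player's hand",
    'PAM': 'step length at which the game terminates',
}


def describe_config(game_name, prior_state_num, y_pot=None, z_len=None):
    """Return a summary dict for a configuration (illustrative only)."""
    if game_name not in PRIOR_STATE_NUM_MEANING:
        raise ValueError(f"unknown game_name: {game_name!r}")
    summary = {
        'game_name': game_name,
        'prior_state_num': prior_state_num,
        'meaning': PRIOR_STATE_NUM_MEANING[game_name],
    }
    # 'y_pot' and 'z_len' are only relevant for 'KuhnNPot'.
    if game_name == 'KuhnNPot':
        summary['y_pot'] = y_pot
        summary['z_len'] = z_len
    return summary


print(describe_config('Leduc', 3, y_pot=3, z_len=3))
```

With 'Leduc', the 'y_pot' and 'z_len' values are simply dropped from the summary, mirroring the rule above.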

train_mode = 'fix_itr'
log_interval_mode = 'itr'
log_mode = 'exponential'

'train_mode' selects how the training budget is measured: "train_mode = 'fix_itr'" fixes the number of training rounds; "train_mode = 'node_touched'" fixes the number of nodes passed during training; "train_mode = 'tran_time'" fixes the training time.

'log_interval_mode' sets the mode used for recording results.

'log_mode' sets how the recording intervals are spaced: 'exponential' records at exponentially spaced intervals; 'normal' records at arithmetically spaced intervals.

total_train_constraint = 10
log_interval = 5
nun_of_train_repetitions = 5
n_jobs = 1

The meaning of 'total_train_constraint' depends on the selected 'train_mode'. With "train_mode = 'tran_time'", 'total_train_constraint = 100' means that each method is trained for 100 s. With "train_mode = 'node_touched'", each method is trained until it has passed the set number of nodes.
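The relationship between 'train_mode' and 'total_train_constraint' can be sketched as a single stopping check. This is a minimal sketch, not the repository's implementation: the function `training_finished` and its arguments `itr`, `nodes_touched`, and `elapsed_s` are hypothetical, and treating the constraint as a number of rounds under 'fix_itr' is an assumption based on the description of that mode.

```python
def training_finished(train_mode, total_train_constraint,
                      itr, nodes_touched, elapsed_s):
    """Return True once the training budget is used up (illustrative only).

    itr, nodes_touched and elapsed_s are hypothetical counters that a
    training loop would maintain; they are not names from the repository.
    """
    if train_mode == 'fix_itr':
        # Assumption: the constraint counts training rounds in this mode.
        return itr >= total_train_constraint
    if train_mode == 'node_touched':
        return nodes_touched >= total_train_constraint
    if train_mode == 'tran_time':
        # The constraint is a duration in seconds.
        return elapsed_s >= total_train_constraint
    raise ValueError(f"unknown train_mode: {train_mode!r}")


# Ten rounds done against a budget of ten rounds: training stops.
print(training_finished('fix_itr', 10, itr=10, nodes_touched=0, elapsed_s=0.0))
```

A training loop would typically measure `elapsed_s` with `time.perf_counter()` and call a check like this once per round.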

'log_interval' sets the interval at which training results are recorded.
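To make the difference between the two 'log_mode' values concrete, the sketch below generates recording points for each. It is only an illustration under stated assumptions: the function `log_points` and the growth factor `base` are hypothetical, and the repository may compute its exponential schedule differently.

```python
def log_points(log_mode, log_interval, total, base=2):
    """List the points at which results would be recorded (illustrative only).

    'base' is a hypothetical growth factor for the exponential schedule;
    it is an assumption of this sketch, not a setting from the repository.
    """
    if log_interval <= 0 or base <= 1:
        raise ValueError("log_interval must be > 0 and base must be > 1")
    if log_mode == 'normal':
        # Arithmetic spacing: log_interval, 2 * log_interval, ...
        return list(range(log_interval, total + 1, log_interval))
    if log_mode == 'exponential':
        # Geometric spacing: log_interval, log_interval * base, ...
        points = []
        point = log_interval
        while point <= total:
            points.append(point)
            point *= base
        return points
    raise ValueError(f"unknown log_mode: {log_mode!r}")


print(log_points('normal', 5, 20))        # [5, 10, 15, 20]
print(log_points('exponential', 5, 100))  # [5, 10, 20, 40, 80]
```

Exponential spacing gives dense records early in training and sparse records later, which suits plots with a logarithmic x-axis.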

'nun_of_train_repetitions' sets how many times each training run is repeated.

'n_jobs' sets the number of parallel training jobs; 1 is generally appropriate on your own computer.
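Since joblib is listed in the requirements, repeated runs could be dispatched roughly as sketched below. This is an assumption about the general pattern, not the repository's code: `train_once` is a hypothetical stand-in for one training repetition, and `run_repetitions` is a name introduced here.

```python
import random

try:
    from joblib import Parallel, delayed
except ImportError:  # fall back to a plain loop if joblib is missing
    Parallel = None


def train_once(seed):
    """Hypothetical stand-in for one training repetition."""
    rng = random.Random(seed)
    return rng.random()  # placeholder for a real training result


def run_repetitions(num_repetitions, n_jobs=1):
    """Run the repetitions, in parallel when n_jobs > 1 (illustrative only)."""
    seeds = range(num_repetitions)
    if Parallel is None or n_jobs == 1:
        return [train_once(seed) for seed in seeds]
    # joblib returns the results in the same order as the inputs.
    return Parallel(n_jobs=n_jobs)(delayed(train_once)(seed) for seed in seeds)


results = run_repetitions(num_repetitions=5, n_jobs=1)
print(len(results))  # 5
```

Giving each repetition its own seed keeps the runs independent and reproducible regardless of how many jobs are used.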

After setting the above parameters, you can start running the experiment you want.
