This repo simulates several online learning algorithms on games. The algorithms from arXiv:2204.11417 are implemented, and the experimental results in the paper are reproduced.
Gameplay is simulated on general-sum games with <=4 players, each with <=3 actions, under different learning rules (EG, OG, vanilla GD, and the two algorithms from the paper).
The utilities are sampled i.i.d. from a uniform distribution. Both OG and EG use orthogonal projection onto the probability simplex.
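For reference, the orthogonal (Euclidean) projection onto the probability simplex that OG and EG rely on can be computed in closed form after sorting; a minimal sketch (the function name is mine, not necessarily the repo's):

```python
import numpy as np

def project_simplex(v):
    """Euclidean (orthogonal) projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-and-threshold method."""
    n = len(v)
    u = np.sort(v)[::-1]                     # sort coordinates in decreasing order
    css = np.cumsum(u)
    # largest index rho with u[rho] + (1 - css[rho]) / (rho + 1) > 0
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)     # uniform shift making the result sum to 1
    return np.maximum(v + theta, 0.0)
```

A projected-gradient step is then `x = project_simplex(x + eta * u)` for utility vector `u` and learning rate `eta`.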
Negative results on OG and EG:
Both OG and EG incur linear swap regret.
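Swap regret can be measured directly from the play history: because a swap function deviates independently per action, the best swap map decomposes coordinate-wise. A sketch of how such a measurement might look (variable names are mine):

```python
import numpy as np

def swap_regret(X, U):
    """Swap regret of a play history.
    X: (T, n) array of mixed strategies; U: (T, n) array of utility vectors.
    Swap regret = max over maps phi of sum_t <x_t circ phi, u_t> - sum_t <x_t, u_t>,
    where the max decomposes: for each source action a, pick the best target b."""
    M = X.T @ U                      # M[a, b] = sum_t X[t, a] * U[t, b]
    gain = M.max(axis=1).sum()       # best deviation chosen per action a
    actual = np.einsum('ta,ta->', X, U)
    return gain - actual
```

Linear swap regret then shows up as `swap_regret(X[:T], U[:T])` growing proportionally to `T`.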
The paper [arXiv:2204.11417v1] claims "
Positive results on OG and EG:
I have been messing around with different parameters, and never found a case with a
Sometimes the swap regret grows linearly, yet the external regret grows negative-linearly in
Under learning rate
Reproducing the Paper:
The algorithms OFTRL-LogBar and BM-OFTRL-LogBar from the paper [arXiv:2204.11417] are implemented. The former has O(log T) regret, and the latter O(log T) swap regret.
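A rough sketch of one optimistic-FTRL step with a log-barrier regularizer, the building block behind both algorithm names. This is my own reading of the generic OFTRL update, not the paper's exact pseudocode; the function name, `eta`, and the bisection solver are all my choices:

```python
import numpy as np

def oftrl_logbar_step(grad_sum, prediction, eta, tol=1e-12):
    """One OFTRL step with log-barrier regularization (a sketch, assuming the
    standard update  x = argmax_{x in simplex} <x, g> + (1/eta) * sum_i log x_i,
    with g = cumulative utilities + optimistic prediction of the next one).
    Stationarity gives x_i = 1 / (eta * (lam - g_i)) for a Lagrange multiplier
    lam > max_i g_i chosen so that sum_i x_i = 1; we find lam by bisection."""
    g = grad_sum + prediction
    n = len(g)
    lo = g.max() + 1e-12           # just above max g_i: sum of x_i is huge (> 1)
    hi = g.max() + n / eta + 1.0   # far enough out that sum of x_i is < 1
    lam = hi
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        s = np.sum(1.0 / (eta * (lam - g)))
        if s > 1.0:
            lo = lam
        else:
            hi = lam
        if abs(s - 1.0) < tol:
            break
    return 1.0 / (eta * (lam - g))
```

Note the log barrier keeps every iterate strictly inside the simplex, so no projection step is needed, unlike OG/EG above.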
Surprisingly, the former also seems to achieve logarithmic swap regret empirically. Sadly, I can't prove that.