*   ![](): thoroughly tested. In many cases, we verified against known values
    and/or reproduced results from papers.
*   ~: implemented but lightly tested.
*   X: known problems; please see GitHub issues.

| Algorithms | Category | Reference | Status |
|---|---|---|---|
| Information Set Monte Carlo Tree Search (IS-MCTS) | Search | Cowling et al. '12 | ~ |
| Minimax (and Alpha-Beta) Search | Search | Wikipedia1, Wikipedia2, Knuth and Moore '75 | ![]() |
| Monte Carlo Tree Search | Search | Wikipedia, UCT paper, Coulom '06, Cowling et al. survey | ![]() |
| Lemke-Howson (via nashpy) | Opt. | Wikipedia, Shoham & Leyton-Brown '09 | ![]() |
| Sequence-form linear programming | Opt. | Koller, Megiddo, and von Stengel '94, Shoham & Leyton-Brown '09 | ![]() |
| Counterfactual Regret Minimization (CFR) | Tabular | Zinkevich et al. '08, Neller & Lanctot '13 | ![]() |
| CFR against a best responder (CFR-BR) | Tabular | Johanson et al. '12 | ![]() |
| Exploitability / Best response | Tabular | Shoham & Leyton-Brown '09 | ![]() |
| External sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ![]() |
| Outcome sampling Monte Carlo CFR | Tabular | Lanctot et al. '09, Lanctot '13 | ![]() |
| Q-learning | Tabular | Sutton & Barto '18 | ![]() |
| Policy Iteration | Tabular | Sutton & Barto '18 | ![]() |
| Value Iteration | Tabular | Sutton & Barto '18 | ![]() |
| Advantage Actor-Critic (A2C) | RL | Mnih et al. '16 | ![]() |
| Deep Q-networks (DQN) | RL | Mnih et al. '15 | ![]() |
| Ephemeral Value Adjustments (EVA) | RL | Hansen et al. '18 | ~ |
| Deep CFR | MARL | Brown et al. '18 | ~ |
| Exploitability Descent (ED) | MARL | Lockhart et al. '19 | ![]() |
| (Extensive-form) Fictitious Play (XFP) | MARL | Heinrich, Lanctot, & Silver '15 | ![]() |
| Neural Fictitious Self-Play (NFSP) | MARL | Heinrich & Silver '16 | ![]() |
| Neural Replicator Dynamics (NeuRD) | MARL | Omidshafiei, Hennes, Morrill, et al. '19 | X |
| Regret Policy Gradients (RPG, RMPG) | MARL | Srinivasan, Lanctot, et al. '18 | ![]() |
| Policy-Space Response Oracles (PSRO) | MARL | Lanctot et al. '17 | ![]() |
| Q-based ("all-actions") Policy Gradient (QPG) | MARL | Srinivasan, Lanctot, et al. '18 | ![]() |
| Regression CFR (RCFR) | MARL | Waugh et al. '15, Morrill '16 | ![]() |
| Rectified Nash Response (PSRO_rn) | MARL | Balduzzi et al. '19 | ~ |
| α-Rank | Eval. / Viz. | Omidshafiei et al. '19, arXiv | ![]() |
| Replicator / Evolutionary Dynamics | Eval. / Viz. | Hofbauer & Sigmund '98, Sandholm '10 | ![]() |