This project is part of the Final Degree Project for Computer Science at Universidad Complutense de Madrid (UCM).
AlphaZero is a general-purpose reinforcement learning algorithm developed by DeepMind in 2017. It can learn tabula rasa, that is, with no domain knowledge other than the rules of the game, and it achieves superhuman performance in combinatorial games such as Go and chess. It learns through self-play: it starts by playing randomly against itself and gradually improves its understanding of the game.
In this project, we develop our own version of AlphaZero that can run on a personal computer.
Due to our limited computational resources, we have focused on less complex games such as Tic-Tac-Toe and Connect 4. However, our implementation is general and can easily be adapted to other games. To verify that it is learning properly, we have tested it against other algorithms that we also implemented: Minimax and Monte Carlo Tree Search (MCTS).
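As an illustration only, the following Python sketch shows how a game-agnostic interface lets the same search code play any game, using a Minimax opponent on Tic-Tac-Toe as the example. The `Game` interface, `TicTacToe`, `minimax` and `best_move` names are hypothetical and do not reflect the actual modules in `tfg` or `game`. The search is plain minimax written in negamax form, with memoization of already-visited states.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from functools import lru_cache
from typing import List, Optional, Tuple


class Game(ABC):
    """Hypothetical minimal game interface; the project's real API may differ."""

    @abstractmethod
    def to_move(self) -> int:
        """Player whose turn it is: +1 or -1."""

    @abstractmethod
    def legal_moves(self) -> List[int]:
        """Moves available in this state."""

    @abstractmethod
    def play(self, move: int) -> "Game":
        """Return the new state after the current player makes `move`."""

    @abstractmethod
    def winner(self) -> Optional[int]:
        """+1 or -1 if that player has won, 0 for a draw, None if not over."""


# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


@dataclass(frozen=True)  # frozen -> hashable, so states can be cached
class TicTacToe(Game):
    board: Tuple[int, ...] = (0,) * 9  # 0 = empty, +1 / -1 = player marks
    player: int = 1

    def to_move(self) -> int:
        return self.player

    def legal_moves(self) -> List[int]:
        return [i for i, cell in enumerate(self.board) if cell == 0]

    def play(self, move: int) -> "TicTacToe":
        cells = list(self.board)
        cells[move] = self.player
        return TicTacToe(tuple(cells), -self.player)

    def winner(self) -> Optional[int]:
        for a, b, c in LINES:
            if self.board[a] != 0 and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return 0 if 0 not in self.board else None


@lru_cache(maxsize=None)
def minimax(state: Game) -> int:
    """Game-theoretic value of `state` for the player to move (negamax form)."""
    result = state.winner()
    if result is not None:
        # Convert the absolute result into the mover's point of view.
        return result * state.to_move()
    # Our value is the best of the negated values of the child positions.
    return max(-minimax(state.play(m)) for m in state.legal_moves())


def best_move(state: Game) -> int:
    """Pick a move that maximises the minimax value for the player to move."""
    return max(state.legal_moves(), key=lambda m: -minimax(state.play(m)))


print(minimax(TicTacToe()))  # → 0 (perfect play from the empty board is a draw)
print(best_move(TicTacToe((1, 1, 0, -1, -1, 0, 0, 0, 0), 1)))  # → 2 (immediate win)
```

Because `minimax` only calls `to_move`, `legal_moves`, `play` and `winner`, another game exposing the same four methods (with hashable states) could reuse it unchanged. For a game like Connect 4, however, exhaustive minimax is far too slow without a depth limit and a heuristic evaluation.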
A thorough description of the project can be found in the project report.
This repository is divided into multiple folders:
- tfg: the core of the project. The implementation is divided into several modules: game representation, strategies, neural network, and AlphaZero itself.
- game: game implementations are found in this folder.
- models: in this folder we store some neural network models and checkpoints for saving and testing purposes.
- experiments: in this folder we keep the Jupyter Notebooks that were used for testing the implementation.
See Requirements.
Examples of how to run the project can be found in the Jupyter Notebooks in the experiments folder.
- Pablo Sanz Sanz - Project Member - Soy-yo
- Juan Carlos Villanueva Quirós - Project Member - jcturing
- Antonio A. Sánchez Ruiz-Granados - Project Manager - antsanchucm
This project is licensed under the MIT License.