This repository hosts the implementation of PoolFlip, a multi-agent Gymnasium/PettingZoo environment inspired by FlipIt for studying attacker–defender dynamics in cyber defense.
Paper: link coming soon.
Cyber defense requires automating defensive decision-making
under stealthy, deceptive, and continuously evolving adversarial strategies.
The FlipIt game provides a foundational framework for modeling interactions between a defender and an advanced adversary that compromises a system without being immediately detected.
In FlipIt, the attacker and defender compete to control a shared resource by performing a Flip action and paying a cost. However, the existing FlipIt frameworks rely on a small number of heuristics or specialized learning techniques, which can lead to brittleness and the inability to adapt to new attacks.
To address these limitations, we introduce PoolFlip, a multi-agent gym environment that extends the FlipIt game to allow efficient learning for attackers and defenders.
Furthermore, we propose Flip-PSRO, a multi-agent reinforcement learning (MARL) approach that leverages population-based training to train defender agents equipped to generalize against a range of unknown, potentially adaptive opponents. Our empirical results suggest that Flip-PSRO defenders generalize to opponents not encountered during training.
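For intuition only, here is a minimal discrete-time sketch of the FlipIt dynamic described above, in which the most recent player to Flip controls the resource and each flip costs a fixed amount. This is an illustration, not the PoolFlip implementation.

```python
# Illustration only, not the PoolFlip implementation: the most recent player
# to Flip owns the resource; a player's score is time in control minus the
# cost of its flips. Ties within a step are resolved in favor of the attacker.
FLIP_COST = 2.0

def play_flipit(defender_flips, attacker_flips, num_steps):
    """Each *_flips argument is a set of step indices at which that player flips."""
    owner = "defender"  # the defender starts in control
    score = {"defender": 0.0, "attacker": 0.0}
    for t in range(num_steps):
        for player, flips in (("defender", defender_flips), ("attacker", attacker_flips)):
            if t in flips:
                score[player] -= FLIP_COST  # pay the flip cost
                owner = player              # take control of the resource
        score[owner] += 1.0                 # one unit of reward per step in control
    return score

# Example: the defender flips at step 5, the attacker at steps 3 and 7.
print(play_flipit(defender_flips={5}, attacker_flips={3, 7}, num_steps=10))
```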
To install the dependencies we use conda (see env.yaml) and pip. The following creates the poolflip environment; the packages you need depend on your use case (environment only or full stack), as shown below:

```bash
conda env create -f env.yaml
```

Then,

```bash
conda activate poolflip
```

To install only the packages for the PoolFlipEnv (usable as shown in the minimal example below):

```bash
pip install .
```

Otherwise, for the full stack:

```bash
pip install .[full]
```

A minimal example is provided in minimal.py:
```python
from omegaconf import OmegaConf

from poolflip import PoolFlipEnv  # The multi-agent environment
import poolflip.agents as agents

# Configuration of the environment
cfg = OmegaConf.create(
    {
        "num_resources": 1,
        "num_players": 2,
        "max_num_steps": 10,  # Number of steps in an episode
        "num_global_actions": 1,  # Number of global actions (1 for Sleep in the Sleep | Check | Flip setting)
        "actions_to_costs": {
            0: 0.0,  # Sleep: cost 0.0
            1: 2.0,  # Flip: cost 2.0
            2: 1.0,  # Check: cost 1.0
        },
    }
)

possible_agents = {
    "defender": agents.SleepAgent(),
    "attacker": agents.PeriodicCheckAgent(phase=4, delay=1),
}

env = PoolFlipEnv(possible_agents=possible_agents, configuration=cfg)
obs, infos = env.reset(seed=42)

for step in range(cfg.max_num_steps):
    # Sample a random action for each live agent
    actions = {
        agent: env.action_space(agent).sample() for agent in env.agents
    }
    next_obs, rewards, terminations, truncations, infos = env.step(actions)
    if all(terminations.values()) or all(truncations.values()):
        break

env.close()
```
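The loop above samples random actions; the agents passed in possible_agents instead follow their own policies (a sleeping defender and a periodic attacker here). Purely as a hypothetical illustration of what a Periodic(4) schedule with a one-step delay means, and not the poolflip.agents API:

```python
# Hypothetical illustration only (not the poolflip.agents API): a periodic
# strategy that acts every `phase` steps after an initial `delay`, and
# sleeps otherwise.
def periodic_action(step: int, phase: int = 4, delay: int = 1) -> int:
    if step < delay:
        return 0  # Sleep until the delay has elapsed
    return 1 if (step - delay) % phase == 0 else 0  # 1 = Flip, 0 = Sleep

print([periodic_action(t) for t in range(10)])  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
```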
For the following sections, please install the full requirements:

```bash
pip install .[full]
```

We use MLflow and Ray for experiment tracking and parallelization.
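For orientation, MLflow tracking looks roughly like the following; this is a generic sketch with made-up experiment, parameter, and metric names, not code from the training scripts in this repository:

```python
import mlflow

# Generic MLflow sketch (hypothetical names): a run records parameters and
# per-episode metrics, and its run ID identifies the stored artifacts.
mlflow.set_experiment("poolflip-demo")
with mlflow.start_run() as run:
    mlflow.log_params({"defender": "ppo", "attacker": "periodic_4"})
    for episode in range(3):
        mlflow.log_metric("defender_return", 0.0, step=episode)  # placeholder value
    print("run_id:", run.info.run_id)  # the <RUN_ID> referenced below
```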
Repository layout:

```
.
├── config
│   ├── agents   <- Agent configurations
│   └── envs     <- Environment configurations
├── poolflip     <- The environment (usable with the minimal install)
└── flip_psro    <- Flip-PSRO, which requires the full install with MLflow
```

Export your MLflow tracking URI:

```bash
export MLFLOW_TRACKING_URI=<MLFLOW_URI>
```

And start your Ray cluster via:
```bash
ray start --head
```
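For context, once the head node is running, Ray workloads connect to it and fan out independent tasks; the sketch below is generic, not code from this repository:

```python
import ray

# Generic Ray sketch: connect to the running cluster and execute independent
# tasks in parallel. The repository's scripts manage parallelism themselves.
ray.init(address="auto")

@ray.remote
def evaluate_seed(seed: int) -> float:
    return float(seed)  # placeholder for one evaluation rollout

results = ray.get([evaluate_seed.remote(s) for s in range(4)])
ray.shutdown()
```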
Training Agents (train.py)

```bash
python train.py --defender ppo_no_deterministic_eval --attacker periodic_4 --env episodes=1k_steps=100_players=2_resources=1
```

This will train a PPO agent against a Periodic(4) attacker in a single-resource environment with 100 steps per episode, for 1,000 episodes.
The run ID generated by MLflow can then be used in the evaluation step to load the weights of the trained agent.
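For reference, artifacts logged under a run ID can be fetched with the standard MLflow API; the "checkpoints" artifact path below is hypothetical, as the actual layout is determined by train.py:

```python
import mlflow

# Illustrative only: download the artifacts stored under a given MLflow run.
# "checkpoints" is a hypothetical artifact path, not necessarily the one
# used by this repository.
local_dir = mlflow.artifacts.download_artifacts(
    run_id="<RUN_ID>", artifact_path="checkpoints"
)
print(local_dir)
```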
Evaluating Agents (eval.py)
Assuming the previous training run was registered under the run ID <RUN_ID>:
```bash
python eval.py --defender ppo_no_deterministic_eval --attacker periodic_4 --env episodes=100_steps=100_players=2_resources=1 --defender_run_id <RUN_ID>
```

This will evaluate the trained PPO agent against a Periodic(4) attacker in a single-resource environment with 100 steps per episode, over 100 episodes.
The following would evaluate the same PPO agent against a Burst(8,6) opponent.
```bash
python eval.py --defender ppo_no_deterministic_eval --attacker burst_8_6 --env episodes=100_steps=100_players=2_resources=1 --defender_run_id <RUN_ID>
```

Running Flip-PSRO (flip_psro.py)
The flip_psro.py script combines the training and evaluation steps above.
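Conceptually, this is a population-based (PSRO-style) loop: repeatedly pick an opponent from the pool, train the defender against it, and evaluate against the whole pool. The sketch below only shows this shape; train_against and evaluate are trivial stand-ins, not functions from this repository:

```python
import random

# Hypothetical sketch of a PSRO-style loop with uniform opponent selection;
# not the repository's implementation.
def train_against(defender, opponent):
    return defender + [opponent]            # stand-in for a PPO update against `opponent`

def evaluate(defender, opponent):
    return float(defender.count(opponent))  # stand-in evaluation score

def flip_psro(opponent_pool, iterations=5, seed=0):
    random.seed(seed)
    defender = []                           # stand-in for a randomly initialized PPO defender
    for _ in range(iterations):
        opponent = random.choice(opponent_pool)  # uniform opponent selection
        defender = train_against(defender, opponent)
    return {opponent: evaluate(defender, opponent) for opponent in opponent_pool}

pool = ["periodic_4", "awakening_05", "burst_8_3", "periodic_check_4", "periodic_aggressive_check_4"]
print(flip_psro(pool))
```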
For instance, the following will run Flip-PSRO with uniform opponent selection for a randomly initialized PPO agent against an opponent pool consisting of Periodic(4), Awakening(0.05), Burst(8,3), PeriodicCheck(4), and PAC(4).
```bash
python flip_psro.py --defender ppo_no_deterministic_eval --attacker periodic_4,awakening_05,burst_8_3,periodic_check_4,periodic_aggressive_check_4 --env episodes=100_steps=100_players=2_resources=1_expensive_check
```

To run the test suite:

```bash
pytest
```

This repository is released under the MIT License.
It makes use of MLflow and Ray, both of which are licensed under the Apache License, Version 2.0.
See https://www.apache.org/licenses/LICENSE-2.0 for details.
Proceedings link and BibTeX will be posted once available.
We would like to express our gratitude to the authors of the works referenced in our paper who open-sourced their codebases, methodologies, and datasets, which served as a foundation for our work.