This repository hosts the implementation of PoolFlip, a multi-agent Gymnasium/PettingZoo environment inspired by FlipIt for studying attacker–defender dynamics in cyber defense.
Paper: link coming soon.
Cyber defense requires automating defensive decision-making
under stealthy, deceptive, and continuously evolving adversarial strategies.
The FlipIt game provides a foundational framework for modeling interactions between a defender and an advanced adversary that compromises a system without being immediately detected.
In FlipIt, the attacker and defender compete to control a shared resource by performing a Flip action and paying a cost. However, the existing FlipIt frameworks rely on a small number of heuristics or specialized learning techniques, which can lead to brittleness and the inability to adapt to new attacks.
To address these limitations, we introduce PoolFlip, a multi-agent gym environment that extends the FlipIt game to allow efficient learning for attackers and defenders.
Furthermore, we propose Flip-PSRO, a multi-agent reinforcement learning (MARL) approach that leverages population-based training to train defender agents equipped to generalize against a range of unknown, potentially adaptive opponents. Our empirical results suggest that Flip-PSRO defenders generalize to opponents not encountered during training.
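For intuition only, here is a minimal discrete-time sketch of the FlipIt dynamic described above, in which the most recent player to Flip controls the resource and each flip costs a fixed amount. This is an illustration, not the PoolFlip implementation.

```python
# Illustration only, not the PoolFlip implementation: the most recent player
# to Flip owns the resource; a player's score is time in control minus the
# cost of its flips. Ties within a step are resolved in favor of the attacker.
FLIP_COST = 2.0

def play_flipit(defender_flips, attacker_flips, num_steps):
    """Each *_flips argument is a set of step indices at which that player flips."""
    owner = "defender"  # the defender starts in control
    score = {"defender": 0.0, "attacker": 0.0}
    for t in range(num_steps):
        for player, flips in (("defender", defender_flips), ("attacker", attacker_flips)):
            if t in flips:
                score[player] -= FLIP_COST  # pay the flip cost
                owner = player              # take control of the resource
        score[owner] += 1.0                 # one unit of reward per step in control
    return score

# Example: the defender flips at step 5, the attacker at steps 3 and 7.
print(play_flipit(defender_flips={5}, attacker_flips={3, 7}, num_steps=10))
```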
To install the dependencies we use conda (see env.yaml) and pip. The following creates the poolflip environment; the packages you need depend on your use case (environment only or full stack), as shown below:

```bash
conda env create -f env.yaml
```

Then,

```bash
conda activate poolflip
```

To install only the packages for the PoolFlipEnv (usable as shown in the minimal example below):

```bash
pip install .
```

Otherwise, for the full stack:

```bash
pip install .[full]
```

A minimal example is provided in minimal.py:
```python
from omegaconf import OmegaConf

from poolflip import PoolFlipEnv  # The multi-agent environment
import poolflip.agents as agents

# Configuration of the environment
cfg = OmegaConf.create(
    {
        "num_resources": 1,
        "num_players": 2,
        "max_num_steps": 10,  # Number of steps in an episode
        "num_global_actions": 1,  # Number of global actions (1 for Sleep in the Sleep | Check | Flip setting)
        "actions_to_costs": {
            0: 0.0,  # Sleep: cost 0.0
            1: 2.0,  # Flip: cost 2.0
            2: 1.0,  # Check: cost 1.0
        },
    }
)

possible_agents = {
    "defender": agents.SleepAgent(),
    "attacker": agents.PeriodicCheckAgent(phase=4, delay=1),
}

env = PoolFlipEnv(possible_agents=possible_agents, configuration=cfg)
obs, infos = env.reset(seed=42)

for step in range(cfg.max_num_steps):
    # Sample a random action for each live agent
    actions = {
        agent: env.action_space(agent).sample() for agent in env.agents
    }
    next_obs, rewards, terminations, truncations, infos = env.step(actions)
    if all(terminations.values()) or all(truncations.values()):
        break

env.close()
```
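The loop above samples random actions; the agents passed in possible_agents instead follow their own policies (a sleeping defender and a periodic attacker here). Purely as a hypothetical illustration of what a Periodic(4) schedule with a one-step delay means, and not the poolflip.agents API:

```python
# Hypothetical illustration only (not the poolflip.agents API): a periodic
# strategy that acts every `phase` steps after an initial `delay`, and
# sleeps otherwise.
def periodic_action(step: int, phase: int = 4, delay: int = 1) -> int:
    if step < delay:
        return 0  # Sleep until the delay has elapsed
    return 1 if (step - delay) % phase == 0 else 0  # 1 = Flip, 0 = Sleep

print([periodic_action(t) for t in range(10)])  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
```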
For the following sections, please install the full requirements:

```bash
pip install .[full]
```

We use MLflow and Ray for experiment tracking and parallelization.
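For orientation, MLflow tracking looks roughly like the following; this is a generic sketch with made-up experiment, parameter, and metric names, not code from the training scripts in this repository:

```python
import mlflow

# Generic MLflow sketch (hypothetical names): a run records parameters and
# per-episode metrics, and its run ID identifies the stored artifacts.
mlflow.set_experiment("poolflip-demo")
with mlflow.start_run() as run:
    mlflow.log_params({"defender": "ppo", "attacker": "periodic_4"})
    for episode in range(3):
        mlflow.log_metric("defender_return", 0.0, step=episode)  # placeholder value
    print("run_id:", run.info.run_id)  # the <RUN_ID> referenced below
```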
Repository layout:

```
.
├── config
│   ├── agents   <- Agent configurations
│   └── envs     <- Environment configurations
├── poolflip     <- The environment (usable with the minimal install)
└── flip_psro    <- Flip-PSRO, which requires the full install with MLflow
```

Export your MLflow tracking URI:

```bash
export MLFLOW_TRACKING_URI=<MLFLOW_URI>
```

And start your Ray cluster via:
```bash
ray start --head
```
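For context, once the head node is running, Ray workloads connect to it and fan out independent tasks; the sketch below is generic, not code from this repository:

```python
import ray

# Generic Ray sketch: connect to the running cluster and execute independent
# tasks in parallel. The repository's scripts manage parallelism themselves.
ray.init(address="auto")

@ray.remote
def evaluate_seed(seed: int) -> float:
    return float(seed)  # placeholder for one evaluation rollout

results = ray.get([evaluate_seed.remote(s) for s in range(4)])
ray.shutdown()
```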
Training Agents (train.py)

```bash
python train.py --defender ppo_no_deterministic_eval --attacker periodic_4 --env episodes=1k_steps=100_players=2_resources=1
```

This will train a PPO agent against a Periodic(4) attacker in a single-resource environment with 100 steps per episode, for 1,000 episodes.
The run ID generated by MLflow can then be used in the evaluation step to load the weights of the trained agent.
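For reference, artifacts logged under a run ID can be fetched with the standard MLflow API; the "checkpoints" artifact path below is hypothetical, as the actual layout is determined by train.py:

```python
import mlflow

# Illustrative only: download the artifacts stored under a given MLflow run.
# "checkpoints" is a hypothetical artifact path, not necessarily the one
# used by this repository.
local_dir = mlflow.artifacts.download_artifacts(
    run_id="<RUN_ID>", artifact_path="checkpoints"
)
print(local_dir)
```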
Evaluating Agents (eval.py)
Assuming the previous training run was registered under the run ID <RUN_ID>:
```bash
python eval.py --defender ppo_no_deterministic_eval --attacker periodic_4 --env episodes=100_steps=100_players=2_resources=1 --defender_run_id <RUN_ID>
```

This will evaluate the trained PPO agent against a Periodic(4) attacker in a single-resource environment with 100 steps per episode, over 100 episodes.
The following would evaluate the same PPO agent against a Burst(8,6) opponent.
```bash
python eval.py --defender ppo_no_deterministic_eval --attacker burst_8_6 --env episodes=100_steps=100_players=2_resources=1 --defender_run_id <RUN_ID>
```

Running Flip-PSRO (flip_psro.py)
The flip_psro.py script combines the training and evaluation steps above.
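Conceptually, this is a population-based (PSRO-style) loop: repeatedly pick an opponent from the pool, train the defender against it, and evaluate against the whole pool. The sketch below only shows this shape; train_against and evaluate are trivial stand-ins, not functions from this repository:

```python
import random

# Hypothetical sketch of a PSRO-style loop with uniform opponent selection;
# not the repository's implementation.
def train_against(defender, opponent):
    return defender + [opponent]            # stand-in for a PPO update against `opponent`

def evaluate(defender, opponent):
    return float(defender.count(opponent))  # stand-in evaluation score

def flip_psro(opponent_pool, iterations=5, seed=0):
    random.seed(seed)
    defender = []                           # stand-in for a randomly initialized PPO defender
    for _ in range(iterations):
        opponent = random.choice(opponent_pool)  # uniform opponent selection
        defender = train_against(defender, opponent)
    return {opponent: evaluate(defender, opponent) for opponent in opponent_pool}

pool = ["periodic_4", "awakening_05", "burst_8_3", "periodic_check_4", "periodic_aggressive_check_4"]
print(flip_psro(pool))
```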
For instance, the following will run Flip-PSRO with uniform opponent selection for a randomly initialized PPO agent against an opponent pool consisting of Periodic(4), Awakening(0.05), Burst(8,3), PeriodicCheck(4), and PAC(4).
```bash
python flip_psro.py --defender ppo_no_deterministic_eval --attacker periodic_4,awakening_05,burst_8_3,periodic_check_4,periodic_aggressive_check_4 --env episodes=100_steps=100_players=2_resources=1_expensive_check
```

To run the test suite:

```bash
pytest
```

This repository is released under the MIT License.
It makes use of MLflow and Ray, both of which are licensed under the Apache License, Version 2.0.
See https://www.apache.org/licenses/LICENSE-2.0 for details.
Proceedings link and BibTeX will be posted once available.
We would like to express our gratitude to the authors of the works referenced in our paper who open-sourced their codebases, methodologies, and datasets, which served as a foundation for our work.