Implementation for the paper: "Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution".
Three agents operate in each round:
Attacker: performs prompt mutation and crossover.
Defender: generates multimodal responses and is updated via LoRA SFT.
Judge: evaluates jailbreak success.
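The round structure above can be sketched as follows. This is a minimal illustration, not the code in pipeline.py: `run_round`, `mutate`, `crossover`, and the callable defender/judge interfaces are hypothetical names, the mutation and crossover operators are trivial word-level placeholders, and the LoRA SFT update is omitted. The sketch only returns the prompts the judge flagged, which is assumed to be what would feed that update.

```python
import random
from typing import Callable, List, Tuple


def mutate(prompt: str, rng: random.Random) -> str:
    """Placeholder mutation (assumption): shuffle the word order."""
    words = prompt.split()
    rng.shuffle(words)
    return " ".join(words)


def crossover(a: str, b: str) -> str:
    """Placeholder crossover (assumption): first half of a + second half of b."""
    wa, wb = a.split(), b.split()
    return " ".join(wa[: len(wa) // 2] + wb[len(wb) // 2:])


def run_round(
    population: List[str],
    defender: Callable[[str], str],      # prompt -> response (hypothetical interface)
    judge: Callable[[str, str], bool],   # (prompt, response) -> jailbreak success
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Run one attacker -> defender -> judge round on a prompt population."""
    # Attacker: one mutation per prompt, plus one crossover if two parents exist.
    children = [mutate(p, rng) for p in population]
    if len(population) >= 2:
        parent_a, parent_b = rng.sample(population, 2)
        children.append(crossover(parent_a, parent_b))

    # Defender: answer every candidate prompt.
    responses = [defender(p) for p in children]

    # Judge: keep the prompts whose responses were scored as successful attacks.
    flagged = [p for p, r in zip(children, responses) if judge(p, r)]
    return children, flagged
```

With stub callables such as `defender=lambda p: "refused"` and `judge=lambda p, r: False`, the function can be exercised without any model or API access.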
pipeline.py: main loop.
attacker/prompt_init.py, attacker/attacker_v2.py: seed loading + evolution.
defender/defender_v2.py, defender/sft_llamafactory.py: inference + LoRA SFT.
judge.py: scoring.
eval.py: standalone evaluation.
configs/attacker_config.yaml, configs/defender_config.yaml: runtime config.
Expected under datasets/:
safebench/, mm-safebench/, hades/ are obtained from the MML repo.
VLGuard/ is obtained from VLGuard on Hugging Face (HF).
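Before running anything, it can help to confirm the layout above is in place. The helper below is a small sketch, not part of the repository: `missing_datasets` is a hypothetical name, and it only checks that the four directories exist under datasets/, not that their contents are complete.

```python
from pathlib import Path
from typing import List

# Directory names taken from the list above.
EXPECTED_DATASETS = ("safebench", "mm-safebench", "hades", "VLGuard")


def missing_datasets(root: str = "datasets") -> List[str]:
    """Return the expected dataset directories that are absent under root."""
    root_path = Path(root)
    return [name for name in EXPECTED_DATASETS if not (root_path / name).is_dir()]
```

Calling `missing_datasets()` from the repository root returns an empty list when all four directories are present, and otherwise the names still to be downloaded.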
pip install -r requirements.txt
export QWEN_API_KEY=your_key
python pipeline.py
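A missing API key is an easy failure to hit after the steps above. The check below is a hedged sketch of a pre-flight guard: `require_env` is a hypothetical helper, and it assumes the pipeline reads QWEN_API_KEY from the process environment, which the export command suggests but the source does not state.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; run `export {name}=your_key` first")
    return value
```

For example, `require_env("QWEN_API_KEY")` could be called once at startup so the run stops immediately instead of failing at the first API call.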