muteyaki/cemma

CEMMA

Implementation for the paper: "Co-Evolutionary Multi-Modal Alignment via Structured Adversarial Evolution".

Three agents operate in each round:

  • Attacker: performs prompt mutation and crossover.
  • Defender: generates multimodal responses and is updated via LoRA SFT.
  • Judge: evaluates jailbreak success.
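The interaction between the three agents in a single round can be sketched roughly as follows. This is a toy illustration, not the code in pipeline.py: the function names (`mutate`, `crossover`, `run_round`), the word-level operators, and the choice to keep successful attacks as material for the defender update are all assumptions made for the example. The `defend` and `judge` callables stand in for the real model calls.

```python
import random
from typing import Callable, List, Tuple


def mutate(prompt: str, rng: random.Random) -> str:
    """Toy mutation (hypothetical): swap two randomly chosen words."""
    words = prompt.split()
    if len(words) < 2:
        return prompt
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Toy single-point crossover (hypothetical): head of a + tail of b."""
    wa, wb = a.split(), b.split()
    if not wa or not wb:
        return a or b
    cut_a = rng.randint(1, len(wa))   # keep at least one word from a
    cut_b = rng.randint(0, len(wb))
    return " ".join(wa[:cut_a] + wb[cut_b:])


def run_round(
    population: List[str],
    defend: Callable[[str], str],
    judge: Callable[[str, str], bool],
    rng: random.Random,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """One round: the attacker evolves prompts, the defender answers,
    and the judge marks which attacks succeeded.

    Returns (next_population, successes). Treating `successes` as the
    data for the next defender update is an assumption of this sketch.
    """
    # Attacker: one mutated child per parent, plus one crossover child.
    children = [mutate(p, rng) for p in population]
    if len(population) >= 2:
        a, b = rng.sample(population, 2)
        children.append(crossover(a, b, rng))

    # Defender + Judge: answer each prompt and score the exchange.
    successes = []
    for prompt in children:
        response = defend(prompt)
        if judge(prompt, response):
            successes.append((prompt, response))

    # Successful prompts survive; if none succeeded, keep the parents.
    next_population = [p for p, _ in successes] or list(population)
    return next_population, successes
```

With stub callables such as `defend=lambda p: "refused"` and `judge=lambda p, r: False`, the function can be exercised without any model or API access.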

Key Files

  • pipeline.py: main loop.
  • attacker/prompt_init.py, attacker/attacker_v2.py: seed loading + evolution.
  • defender/defender_v2.py, defender/sft_llamafactory.py: inference + LoRA SFT.
  • judge.py: scoring.
  • eval.py: standalone evaluation.
  • configs/attacker_config.yaml, configs/defender_config.yaml: runtime config.

Data

Expected under datasets/:

  • safebench/, mm-safebench/, hades/ are obtained from the MML repo.
  • VLGuard/ is obtained from the VLGuard Hugging Face repository.
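Before a run, it can help to confirm that the four directories listed above are in place. The helper below is a minimal sketch and not part of the repository; the name `missing_datasets` is hypothetical, and it checks only that each directory exists, not its contents.

```python
from pathlib import Path
from typing import List

# Directory names taken from the list above.
EXPECTED_DATASETS = ("safebench", "mm-safebench", "hades", "VLGuard")


def missing_datasets(root: str = "datasets") -> List[str]:
    """Return the expected dataset directories that are absent under root."""
    base = Path(root)
    return [name for name in EXPECTED_DATASETS if not (base / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Missing under datasets/: " + ", ".join(missing))
    else:
        print("All expected dataset directories found.")
```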

Setup and Run

pip install -r requirements.txt
export QWEN_API_KEY=your_key
python pipeline.py
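
A small pre-flight check can catch a missing key before any model is loaded. This is a sketch only, assuming the pipeline reads QWEN_API_KEY from the environment as the export line suggests; the function name `require_api_key` is hypothetical and does not come from the repository.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    name: str = "QWEN_API_KEY",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the environment, or raise if it is unset.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running.")
    return key
```

Calling `require_api_key()` at the top of a launcher script fails fast with a readable message instead of an authentication error partway through a round.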
