feat(eval): WAF evaluation lab + M6-01 evaluation plan by bihius · Pull Request #225 · bihius/guard-proxy

bihius · 2026-06-02T20:09:34Z

Summary

- Closes #133 (M6-01, evaluation plan document)
- Replaces the toy echo-server demo with real target apps behind the WAF, each on its own vhost: OWASP Juice Shop, DVWA, and WordPress
- Adds a fully reproducible evaluation lab in `benchmarks/lab/` with a Docker Compose overlay, setup/teardown scripts, scenario configs, and runner scripts
- Adds `benchmarks/Makefile` plus root Makefile pass-throughs (`make eval-up`, `make eval-all`, …)
- The evaluation plan document (`evaluation-plan.md`) lives in the thesis repo (`dsw-latex-thesis`) and is committed there separately; `thesis/` is gitignored here

What's in `benchmarks/lab/`

| File | Purpose |
| --- | --- |
| `docker-compose.targets.yml` | Overlay adding Juice Shop, DVWA (+MariaDB), WordPress (+wp-db, wp-cli one-shot) |
| `.env.example` | Target DB credentials, vhost domains, policy settings |
| `setup-lab.sh` | Registers all target vhosts via backend API; reuses `ensure_vhost` from `setup-demo.sh` |
| `teardown-lab.sh` | `down` / `down -v` |
| `scenarios/crs-ftw/config.yaml` | go-ftw config pointing at CRS regression corpus |
| `scenarios/zap/` | ZAP baseline conf + alert-filter |
| `scenarios/nuclei/nuclei.yaml` | Nuclei config (sqli/xss/lfi/rfi, rate-limited) |
| `scenarios/load/benign-mix.lua` | wrk Lua script for realistic benign load (WAF vs direct) |
| `runners/lib.sh` | Shared: env helpers, manifest writing, docker stats sampler |
| `runners/run-ftw.sh` | CRS regression → TPR (gold standard) |
| `runners/run-zap.sh` | ZAP baseline scan → FPR on real app traffic |
| `runners/run-nuclei.sh` | Nuclei CVE templates → CVE TPR |
| `runners/run-load.sh` | wrk WAF + direct → latency p50/p95/p99, RPS, overhead delta |
| `runners/collect-metrics.sh` | Aggregates all `summary.json` → `results.csv` + `report.json` |
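To make the aggregation step concrete, here is a minimal Python sketch of what a collector along the lines of `runners/collect-metrics.sh` might do. It is not the script from this PR. It assumes the `run-*/<scenario>/summary.json` layout mentioned in the test plan, and it assumes each `summary.json` is a flat JSON object of metric names to values; the `scenario`/`metric`/`value` column names are hypothetical.

```python
import csv
import json
from pathlib import Path


def collect(run_dir: Path) -> list[dict]:
    """Flatten every <scenario>/summary.json under run_dir into rows.

    Assumption: each summary.json is a flat {"metric": value} object.
    """
    rows = []
    for summary in sorted(run_dir.glob("*/summary.json")):
        scenario = summary.parent.name  # e.g. "ftw", "zap", "load"
        data = json.loads(summary.read_text(encoding="utf-8"))
        for metric, value in sorted(data.items()):
            rows.append({"scenario": scenario, "metric": metric, "value": value})
    return rows


def write_outputs(run_dir: Path) -> list[dict]:
    """Write results.csv and report.json next to the scenario directories."""
    rows = collect(run_dir)
    with (run_dir / "results.csv").open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["scenario", "metric", "value"])
        writer.writeheader()
        writer.writerows(rows)
    (run_dir / "report.json").write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return rows
```

Calling `write_outputs(Path("benchmarks/results/run-<id>"))` would then leave one long-format CSV per run, which keeps scenarios with different metric sets in a single file.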

Test plan

- `cp deploy/demo/.env.example deploy/demo/.env && cp benchmarks/lab/.env.example benchmarks/lab/.env && git submodule update --init --recursive`
- `make eval-up`: all 5 vhosts healthy (`demo-app`, `demo-api`, `juice.local`, `dvwa.local`, `wp.local`)
- `curl -si -H 'Host: juice.local' 'http://127.0.0.1:8080/?q=1+UNION+SELECT+1--'` → `HTTP/1.1 403`
- `make eval-ftw`: go-ftw run completes, `benchmarks/results/run-*/ftw/summary.json` created
- `make eval-all`: all scenarios run, `results.csv` produced
- `make eval-results`: table prints without error
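The `curl` check in the test plan can also be scripted. The following is a rough Python equivalent using only the standard library, assuming the WAF listens on `127.0.0.1:8080` as in the command above. The helper names `build_probe` and `waf_status` are made up for this sketch and do not exist in the repository.

```python
import urllib.error
import urllib.request


def build_probe(base_url: str, vhost: str, path: str) -> urllib.request.Request:
    """Build a GET request that targets one vhost via an explicit Host header."""
    return urllib.request.Request(base_url + path, headers={"Host": vhost})


def waf_status(base_url: str, vhost: str, path: str, timeout: float = 5.0) -> int:
    """Send the probe and return the HTTP status code, including 4xx/5xx."""
    # An empty ProxyHandler keeps a local probe from going through a system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(build_probe(base_url, vhost, path), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a blocked request surfaces here as 403.
        return err.code


if __name__ == "__main__":
    status = waf_status("http://127.0.0.1:8080", "juice.local", "/?q=1+UNION+SELECT+1--")
    print("blocked" if status == 403 else f"not blocked (HTTP {status})")
```

Running the same probe against each lab vhost gives a quick sanity check that the policy is attached before the longer scenarios start.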

Closes #133 (M6-01 evaluation plan document). Adds a reproducible thesis evaluation lab with real target applications (OWASP Juice Shop, DVWA, WordPress) behind guard-proxy, automated test scenarios, a dedicated Makefile, and the methodology document required before experiments run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gitguardian · 2026-06-02T20:09:39Z

⚠️ GitGuardian has uncovered 10 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request

| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 33587452 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587451 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587448 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587453 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587450 | Triggered | Generic Password | `8706635` | benchmarks/lab/.env.example |
| 33587449 | Triggered | Generic Password | `8706635` | benchmarks/lab/.env.example |
| 33587448 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587452 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |
| 33587454 | Triggered | Generic Password | `8706635` | benchmarks/lab/.env.example |
| 33587448 | Triggered | Generic Password | `8706635` | benchmarks/lab/docker-compose.targets.yml |

🛠 Guidelines to remediate hardcoded secrets

1. Understand the implications of revoking each secret by investigating where it is used in your code.
2. Replace the secrets and store them safely, following best practices.
3. Revoke and rotate the secrets.
4. If possible, rewrite git history. Rewriting git history is not a trivial act: you might completely break other contributing developers' workflows, and you risk accidentally deleting legitimate data.

To avoid such incidents in the future, consider:

- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection as a pre-commit hook to catch secrets before they leave your machine and to ease remediation

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Closes #133 (M6-01, evaluation plan document). Adds a reproducible thesis evaluation lab with real target applications (OWASP Juice Shop, DVWA, WordPress) behind guard-proxy, automated test scenarios, a dedicated Makefile, and runner scripts. The evaluation plan document lives in the thesis repo (dsw-latex-thesis).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Fix root Makefile passing empty RUN_ID=/TARGET_VHOST= to benchmarks/Makefile, overriding ?= defaults and creating run-/ directories (use $(if ...) guards)
- Fix setup-lab.sh ensure_policy: heredoc+here-string conflict caused Python to receive JSON as program source; pass response via POLICY_RESPONSE env var instead
- Fix run-zap.sh and run-nuclei.sh: export OUT_DIR (and TARGET_VHOST) before Python heredoc so os.environ lookups find the correct path, not "."
- Fix run-zap.sh: inject Host: header via ZAP replacer -config flags instead of ignored -e env var; replace two-step hook/fallback with single reliable run
- Fix benchmarks/Makefile results target: strip run- prefix when reading latest dir name to avoid constructing run-run-<id>/results.csv path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
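The two environment-related fixes above share one pattern: when Python is fed its program through a heredoc, stdin is already taken, so data has to arrive through the environment instead. Here is a hedged sketch of the Python side of that hand-off. The variable names `POLICY_RESPONSE` and `OUT_DIR` come from the commit message; the helper names are hypothetical, as is the assumption that the response is a JSON object.

```python
import json
import os


def read_env_json(var: str = "POLICY_RESPONSE") -> dict:
    """Parse a JSON document handed over through the environment.

    Assumption: the variable holds a JSON object (for example an API response).
    """
    raw = os.environ.get(var)
    if not raw:
        raise RuntimeError(f"{var} is not set; export it before starting Python")
    return json.loads(raw)


def require_dir(var: str = "OUT_DIR") -> str:
    """Return the directory named by var instead of silently defaulting to '.'."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set; export it before the heredoc")
    return value
```

On the shell side, a one-off assignment such as `POLICY_RESPONSE="$response" python3 - <<'PY'` places the value in the child's environment for that single command, while `export OUT_DIR` makes it visible to every later child process. Failing loudly when the variable is missing avoids the silent fallback to `"."` that the commit describes.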

The policy field behind LAB_POLICY_OUTBOUND_THRESHOLD is not yet implemented in the backend. Drop it from both the baseline and PL2 policy JSON bodies in setup-lab.sh and remove the now-unused LAB_POLICY_OUTBOUND_THRESHOLD variables (already commented out in .env.example).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Closes #133 (M6-01, evaluation plan document). Defines WAF evaluation methodology: scenarios (go-ftw CRS corpus, ZAP, Nuclei, wrk load), metrics (TPR/FPR, p50/p95/p99 latency, RPS degradation, memory), success criteria, hardware spec (Proxmox LXC), and threats to validity. Lab source: benchmarks/lab/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
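For readers unfamiliar with the metrics listed in this commit, the sketch below spells them out using their standard definitions. It is illustrative only: the evaluation plan itself is not part of this page, so the exact formulas it uses are an assumption, and in particular "overhead delta" is taken here to mean WAF latency minus direct latency at the same percentile.

```python
def _rate(hits: int, total: int) -> float:
    """Fraction of hits in total, rejecting empty samples."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def tpr(blocked_attacks: int, total_attacks: int) -> float:
    """True positive rate: share of attack requests the WAF blocked."""
    return _rate(blocked_attacks, total_attacks)


def fpr(blocked_benign: int, total_benign: int) -> float:
    """False positive rate: share of benign requests the WAF blocked."""
    return _rate(blocked_benign, total_benign)


def overhead_ms(waf_ms: float, direct_ms: float) -> float:
    """Latency added by the WAF at one percentile (assumed definition)."""
    return waf_ms - direct_ms


def rps_degradation(rps_direct: float, rps_waf: float) -> float:
    """Relative throughput loss when traffic passes through the WAF."""
    if rps_direct <= 0:
        raise ValueError("rps_direct must be positive")
    return (rps_direct - rps_waf) / rps_direct
```

For example, 90 blocked out of 100 attack requests gives `tpr(90, 100) == 0.9`, and a drop from 1000 to 800 requests per second gives `rps_degradation(1000, 800) == 0.2`.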

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bihius and others added 5 commits June 2, 2026 22:10

docs: update README.testing.md ref to docs/evaluation-plan.md

c7b1b9f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bihius merged commit 9d08d09 into main Jun 6, 2026
3 checks passed

bihius merged 6 commits into main from cranky-bhaskara-64aa72
