feat(eval): WAF evaluation lab + M6-01 evaluation plan #225

Merged
bihius merged 6 commits into main from cranky-bhaskara-64aa72 on Jun 6, 2026

bihius (Owner) commented Jun 2, 2026

Summary

  • Closes #133 (M6-01 evaluation plan document)
  • Replaces toy echo-server demo with real target apps: OWASP Juice Shop, DVWA, and WordPress behind the WAF, each on its own vhost
  • Adds a fully reproducible evaluation lab in benchmarks/lab/ with Docker Compose overlay, setup/teardown scripts, scenario configs, and runner scripts
  • Adds benchmarks/Makefile + root Makefile pass-throughs (make eval-up, make eval-all, …)
  • The evaluation plan document (evaluation-plan.md) lives in the thesis repo (dsw-latex-thesis) — committed there separately; thesis/ is gitignored here

What's in benchmarks/lab/

| File | Purpose |
| --- | --- |
| docker-compose.targets.yml | Overlay adding Juice Shop, DVWA (+MariaDB), WordPress (+wp-db, wp-cli one-shot) |
| .env.example | Target DB credentials, vhost domains, policy settings |
| setup-lab.sh | Registers all target vhosts via backend API; reuses ensure_vhost from setup-demo.sh |
| teardown-lab.sh | down / down -v |
| scenarios/crs-ftw/config.yaml | go-ftw config pointing at CRS regression corpus |
| scenarios/zap/ | ZAP baseline conf + alert-filter |
| scenarios/nuclei/nuclei.yaml | Nuclei config (sqli/xss/lfi/rfi, rate-limited) |
| scenarios/load/benign-mix.lua | wrk Lua script for realistic benign load (WAF vs direct) |
| runners/lib.sh | Shared: env helpers, manifest writing, docker stats sampler |
| runners/run-ftw.sh | CRS regression → TPR (gold standard) |
| runners/run-zap.sh | ZAP baseline scan → FPR on real app traffic |
| runners/run-nuclei.sh | Nuclei CVE templates → CVE TPR |
| runners/run-load.sh | wrk WAF + direct → latency p50/p95/p99, RPS, overhead delta |
| runners/collect-metrics.sh | Aggregates all summary.json → results.csv + report.json |
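
For reviewers who want a feel for the aggregation step, here is a minimal sketch of what turning per-scenario summary.json files into results.csv and report.json could look like. It is not the code in collect-metrics.sh. The `<run_dir>/<scenario>/summary.json` layout follows the path in the test plan, while the function names, the flat-field filter, and the `{"scenarios": [...]}` shape of report.json are assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def collect(run_dir: Path) -> list[dict]:
    """Gather one row per scenario from <run_dir>/<scenario>/summary.json."""
    rows = []
    for summary in sorted(run_dir.glob("*/summary.json")):
        data = json.loads(summary.read_text())
        # Keep only scalar fields so each row fits on one flat CSV line.
        row = {"scenario": summary.parent.name}
        row.update({k: v for k, v in data.items() if not isinstance(v, (dict, list))})
        rows.append(row)
    return rows


def write_outputs(run_dir: Path, rows: list[dict]) -> None:
    """Write results.csv (flat table) and report.json (same rows as JSON)."""
    # Union of keys in first-seen order, so scenarios with different
    # metrics can share a single CSV header.
    fields = list(dict.fromkeys(k for row in rows for k in row))
    with (run_dir / "results.csv").open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, restval="")
        writer.writeheader()
        writer.writerows(rows)
    # Assumed report shape: a single object wrapping the row list.
    (run_dir / "report.json").write_text(json.dumps({"scenarios": rows}, indent=2))
```

Calling `write_outputs(run_dir, collect(run_dir))` on a run directory would produce both files next to the scenario folders. Metrics missing for a scenario are left as empty CSV cells rather than zeros, so an absent measurement is not mistaken for a measured value.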

Test plan

  • cp deploy/demo/.env.example deploy/demo/.env && cp benchmarks/lab/.env.example benchmarks/lab/.env && git submodule update --init --recursive
  • make eval-up — all 5 vhosts healthy (demo-app, demo-api, juice.local, dvwa.local, wp.local)
  • curl -si -H 'Host: juice.local' 'http://127.0.0.1:8080/?q=1+UNION+SELECT+1--' → HTTP/1.1 403
  • make eval-ftw — go-ftw run completes, benchmarks/results/run-*/ftw/summary.json created
  • make eval-all — all scenarios run, results.csv produced
  • make eval-results — table prints without error

Closes #133 (M6-01 evaluation plan document).

Adds a reproducible thesis evaluation lab with real target applications
(OWASP Juice Shop, DVWA, WordPress) behind guard-proxy, automated test
scenarios, a dedicated Makefile, and the methodology document required
before experiments run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gitguardian (Bot) commented Jun 2, 2026

⚠️ GitGuardian has uncovered 10 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 33587452 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587451 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587448 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587453 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587450 | Triggered | Generic Password | 8706635 | benchmarks/lab/.env.example |
| 33587449 | Triggered | Generic Password | 8706635 | benchmarks/lab/.env.example |
| 33587448 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587452 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
| 33587454 | Triggered | Generic Password | 8706635 | benchmarks/lab/.env.example |
| 33587448 | Triggered | Generic Password | 8706635 | benchmarks/lab/docker-compose.targets.yml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

bihius and others added 5 commits June 2, 2026 22:10
Closes #133 (M6-01 — evaluation plan document).

Adds a reproducible thesis evaluation lab with real target applications
(OWASP Juice Shop, DVWA, WordPress) behind guard-proxy, automated test
scenarios, a dedicated Makefile, and runner scripts. The evaluation plan
document lives in the thesis repo (dsw-latex-thesis).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix root Makefile passing empty RUN_ID=/TARGET_VHOST= to benchmarks/Makefile,
  overriding ?= defaults and creating run-/ directories (use $(if ...) guards)
- Fix setup-lab.sh ensure_policy: heredoc+here-string conflict caused Python to
  receive JSON as program source; pass response via POLICY_RESPONSE env var instead
- Fix run-zap.sh and run-nuclei.sh: export OUT_DIR (and TARGET_VHOST) before
  Python heredoc so os.environ lookups find the correct path, not "."
- Fix run-zap.sh: inject Host: header via ZAP replacer -config flags instead of
  ignored -e env var; replace two-step hook/fallback with single reliable run
- Fix benchmarks/Makefile results target: strip run- prefix when reading latest
  dir name to avoid constructing run-run-<id>/results.csv path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
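
The OUT_DIR fix above comes down to one rule: a child process only sees variables that are actually in its environment, so a shell variable that was set but not exported never reaches the embedded Python. The sketch below reproduces that from the Python side. The lookup `os.environ.get("OUT_DIR", ".")` is an assumption about how the runner reads the variable (the commit only says the lookup resolved to "."), and the path `/tmp/run-demo/zap` is a made-up example value.

```python
import os
import subprocess
import sys

# Child program: the kind of lookup an embedded Python snippet might perform.
CHILD = 'import os; print(os.environ.get("OUT_DIR", "."))'


def child_sees(env: dict) -> str:
    """Run the child with the given environment and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the current environment, minus OUT_DIR, to mimic "set but not exported".
base = {k: v for k, v in os.environ.items() if k != "OUT_DIR"}
without_export = child_sees(base)  # falls back to the "." default
with_export = child_sees({**base, "OUT_DIR": "/tmp/run-demo/zap"})  # sees the real path

print(without_export, with_export)
```

The silent fallback is what made the original bug easy to miss: the scripts kept running and simply wrote their output to the current directory. A stricter lookup such as `os.environ["OUT_DIR"]` would fail loudly instead, which may be preferable in runner scripts.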
The field is not yet implemented in the backend. Drop it from both
the baseline and PL2 policy JSON bodies in setup-lab.sh and remove
the now-unused LAB_POLICY_OUTBOUND_THRESHOLD variables (already
commented out in .env.example).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #133 (M6-01 — evaluation plan document).

Defines WAF evaluation methodology: scenarios (go-ftw CRS corpus,
ZAP, Nuclei, wrk load), metrics (TPR/FPR, p50/p95/p99 latency, RPS
degradation, memory), success criteria, hardware spec (Proxmox LXC),
and threats to validity. Lab source: benchmarks/lab/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
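
For reference, the metrics named in the evaluation plan can be written down in a few lines. This is a sketch using the standard textbook definitions, not the formulas from evaluation-plan.md, which lives in the thesis repo. The function names are hypothetical, and the nearest-rank percentile is one common convention that may differ from the percentiles wrk reports itself.

```python
import math


def rate(hits: int, total: int) -> float:
    """Fraction hits/total; 0.0 when the sample is empty."""
    return hits / total if total else 0.0


def tpr(blocked_attacks: int, total_attacks: int) -> float:
    """True positive rate: share of attack requests the WAF blocked."""
    return rate(blocked_attacks, total_attacks)


def fpr(blocked_benign: int, total_benign: int) -> float:
    """False positive rate: share of benign requests the WAF blocked."""
    return rate(blocked_benign, total_benign)


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty sample."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def overhead_delta(waf_ms: float, direct_ms: float) -> float:
    """Extra latency added by the WAF path, in the same unit as the inputs."""
    return waf_ms - direct_ms


def rps_degradation(direct_rps: float, waf_rps: float) -> float:
    """Relative throughput loss through the WAF compared with direct access."""
    return (direct_rps - waf_rps) / direct_rps
```

For example, `tpr(9, 10)` gives 0.9 and `rps_degradation(1000, 800)` gives 0.2. Comparing WAF and direct runs at the same percentile (p95 against p95, say) keeps the overhead delta meaningful, since mixing percentiles would conflate tail behaviour with WAF cost.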
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bihius bihius merged commit 9d08d09 into main Jun 6, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

M6-01 — Evaluation plan document
