feat: Add testing harness for dissertation with evidence collection a… #38
stephenjtyrrell wants to merge 1 commit
…nd analysis
- Introduced a comprehensive testing harness in the `testing` directory to facilitate evidence collection for governance and scalability tests.
- Created scripts for capturing pull request audits, recording policy events, and summarizing run results.
- Implemented a common library (`common.sh`) for shared functions and configurations.
- Added CSV files for storing evidence data, including approval events, policy events, and run summaries.
- Developed Terraform and Kubernetes fixtures to simulate negative control scenarios for governance testing.
- Updated README and configuration files to guide users through the setup and execution of tests.
- Established a structured write-up template for documenting findings from the testing process.
Pull request overview
Adds a dissertation-focused testing harness under testing/ plus supporting scripts to collect governance/scalability evidence (GitHub Actions, Terraform/OPA, Kubernetes/Conftest, ArgoCD), store it as structured CSVs, and generate a summarized markdown report for analysis/write-up.
Changes:
- Introduces reusable capture/record scripts (`scripts/testing/*`) and Makefile targets to initialize, collect, and analyze evidence.
- Adds negative-control fixtures for Terraform and Kubernetes policy enforcement tests.
- Adds evidence CSV schemas (and initial datasets), documentation, and a dissertation write-up template.
Reviewed changes
Copilot reviewed 29 out of 31 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| testing/fixtures/terraform/unrestricted-ingress-plan.json | Terraform negative-control fixture for unrestricted ingress detection |
| testing/fixtures/terraform/unencrypted-s3-plan.json | Terraform negative-control fixture for missing S3 encryption detection |
| testing/fixtures/terraform/public-storage-plan.json | Terraform negative-control fixture for public storage detection |
| testing/fixtures/terraform/missing-owner-plan.json | Terraform negative-control fixture for missing required tags |
| testing/fixtures/terraform/missing-environment-plan.json | Terraform negative-control fixture for missing required tags |
| testing/fixtures/terraform/missing-cost-center-plan.json | Terraform negative-control fixture for missing required tags |
| testing/fixtures/kubernetes/privileged-container.yaml | Kubernetes negative-control fixture for securityContext violations |
| testing/fixtures/kubernetes/missing-resources.yaml | Kubernetes negative-control fixture for missing resource requests/limits |
| testing/fixtures/kubernetes/missing-labels.yaml | Kubernetes negative-control fixture for required label violations |
| testing/evidence/run_summary.csv | Run-level evidence CSV (schema + initial entries) |
| testing/evidence/policy_events.csv | Policy event evidence CSV (schema + initial entries) |
| testing/evidence/ci_job_events.csv | CI job-level evidence CSV header |
| testing/evidence/artifact_events.csv | Artifact evidence CSV header |
| testing/evidence/argocd_events.csv | ArgoCD evidence CSV header |
| testing/evidence/approval_events.csv | PR approval/audit evidence CSV header |
| testing/dissertation_writeup_template.md | Template for Chapters 5–7 write-up structure |
| testing/config.env.example | Example environment configuration for running the harness |
| testing/README.md | Harness documentation and recommended workflow |
| scripts/testing/run_policy_negative_controls.sh | Executes local Terraform/Kubernetes negative controls and records results |
| scripts/testing/record_run.sh | Records manual run_summary rows |
| scripts/testing/record_policy_event.sh | Records manual policy event rows |
| scripts/testing/init_evidence.sh | Initializes evidence directories and CSV headers |
| scripts/testing/common.sh | Shared functions/config for evidence capture + CSV writing |
| scripts/testing/capture_pr_audit.sh | Captures PR audit/approval evidence via GitHub CLI |
| scripts/testing/capture_ci_run.sh | Captures workflow/job timing + artifacts via GitHub CLI |
| scripts/testing/capture_argocd_snapshot.sh | Captures ArgoCD + cluster snapshot evidence |
| scripts/testing/analyze_results.py | Builds a markdown summary report (median/IQR/p95, rates) from CSV evidence |
| `scripts/testing/__pycache__/analyze_results.cpython-313.pyc` | Compiled Python bytecode artifact added (should not be committed) |
| policies/kubernetes/required-labels.rego | Updates first deny rule to match `deny contains msg if` style |
| Makefile | Adds testing harness targets and parameters |
| .gitignore | Ignores testing/config.env and generated evidence bundles/reports |
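
The `analyze_results.py` row above mentions median/IQR/p95 and rates. As a rough illustration of what such a summary involves (not the script from this PR), the sketch below uses only the standard library. The column names `outcome` and `duration_s`, the nearest-rank definition of p95, and the "inclusive" quartile method are all assumptions made for the example.

```python
import csv
import io
import math
import statistics


def summarize_durations(durations):
    """Return median, IQR and nearest-rank p95 for durations in seconds."""
    values = sorted(durations)
    if not values:
        return None
    if len(values) == 1:
        q1 = q3 = values[0]
    else:
        # "inclusive" treats the recorded runs as the whole population.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Nearest-rank p95: smallest value with at least 95% of runs at or below it.
    rank = math.ceil(0.95 * len(values))
    return {
        "median": statistics.median(values),
        "iqr": q3 - q1,
        "p95": values[rank - 1],
    }


def blocked_rate(rows):
    """Fraction of rows whose (assumed) outcome column equals 'blocked'."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(1 for r in rows if r["outcome"] == "blocked") / len(rows)


# Hypothetical two-row sample; the header names are assumed, not the real schema.
sample = io.StringIO(
    '"run_id","outcome","duration_s"\n'
    '"GOV-02-RUN01","blocked","0"\n'
    '"GOV-02-RUN02","passed","4"\n'
)
rows = list(csv.DictReader(sample))
stats = summarize_durations(float(r["duration_s"]) for r in rows)
rate = blocked_rate(rows)
```

With only a handful of runs per test, p95 collapses to the maximum under the nearest-rank definition, which is worth stating in the write-up if this definition is used.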
    load_testing_env
    init_evidence_layout
    require_cmd gh
    require_cmd jq
This script calls `duration_between` (from `common.sh`), which uses `python3`, but `python3` is not required/validated here. Add `require_cmd python3` (or avoid the Python dependency) so the script fails early with a clear error when Python isn’t available.

Suggested change:

    - require_cmd jq
    + require_cmd jq
    + require_cmd python3
    RUN_SEQ="$2"
    GH_RUN_ID="$3"
    APP_TARGET="${4:-github-actions}"
Argument validation only checks the count of parameters, so empty required values (e.g. an empty `TEST_ID` when invoked via `make testing-capture-ci`) will still be accepted and will generate malformed run IDs like `-RUN01-...`. Add explicit non-empty checks for the required args (`TEST_ID`, `RUN_SEQ`, `GH_RUN_ID`) and fail with a clear usage message when they’re blank.

Suggested change:

    + if [[ -z "${TEST_ID}" || -z "${RUN_SEQ}" || -z "${GH_RUN_ID}" ]]; then
    +   fail "Usage: capture_ci_run.sh TEST_ID RUN_SEQ GH_RUN_ID [APP_TARGET]"
    + fi
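
To make the failure mode concrete, here is the same non-empty check expressed in Python, next to a run-ID builder. This is only a sketch: `build_run_id` is a hypothetical helper, and the `TEST_ID-RUN_SEQ-stamp` layout is inferred from IDs such as `GOV-02-RUN01-20260323-1722` elsewhere in this PR, with the sequence label assumed to arrive already formatted as `RUN01`.

```python
def build_run_id(test_id, run_seq, stamp):
    """Join the parts into a run ID, rejecting blank components.

    Hypothetical helper: the ID layout is inferred from the evidence rows,
    not taken from common.sh.
    """
    parts = {"TEST_ID": test_id, "RUN_SEQ": run_seq, "stamp": stamp}
    # An argument count check would accept "", so test each value explicitly.
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError("missing required value(s): " + ", ".join(missing))
    return "-".join(value.strip() for value in parts.values())


run_id = build_run_id("GOV-02", "RUN01", "20260323-1722")
```

Without the check, an empty `TEST_ID` would simply be joined in and produce an ID with a leading hyphen, which is the malformed shape described above.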
    "GOV-02-RUN01-20260323-1722","terraform-policy","required_tags","module.aws.aws_vpc.this","1","true"
    "GOV-02-RUN02-20260323-1722","terraform-policy","required_tags","module.aws.aws_flow_log.this","1","true"
    "GOV-02-RUN03-20260323-1722","terraform-policy","required_tags","module.aws.aws_vpc.this","1","true"
    "GOV-03-RUN01-20260323-1722","terraform-policy","public_access","module.aws.aws_s3_bucket.this","1","true"
    "GOV-03-RUN02-20260323-1722","terraform-policy","server_side_encryption","module.aws.aws_s3_bucket.this","1","true"
    "GOV-03-RUN03-20260323-1722","terraform-policy","unrestricted_ingress","module.gcp.google_compute_firewall.this","1","true"
    "GOV-04-RUN01-20260323-1722","kubernetes-policy","required_labels","Deployment/missing-labels-api","1","true"
    "GOV-04-RUN02-20260323-1722","kubernetes-policy","resource_requests_limits","Deployment/missing-resources-api","2","true"
    "GOV-04-RUN03-20260323-1722","kubernetes-policy","security_context","Deployment/privileged-api","2","true"
`policy_events.csv` is committed with concrete run rows. Because these files are intended to be appended to during evidence collection, keeping real event rows under version control will likely create churn and merge conflicts. Consider committing only the header (or storing sample rows in a separate example fixture) and generating/maintaining the actual evidence data outside of git.
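
One way to read the header-only recommendation: the working CSV is created on demand and never overwritten once it holds data. The sketch below illustrates that idea in Python. It is not `init_evidence.sh`; `ensure_header_only_csv` is a hypothetical helper, and the column names are guesses based on the shape of the rows shown in this PR.

```python
import csv
import os
import tempfile

# Assumed column names, inferred from the six fields in the committed rows.
POLICY_EVENT_HEADER = [
    "run_id", "policy_type", "rule", "resource", "violation_count", "blocked",
]


def ensure_header_only_csv(path, header):
    """Create the CSV with just its header if missing; return True if created."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        # Existing evidence is left untouched so repeated init calls are safe.
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="") as handle:
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerow(header)
    return True


# Demonstration in a throwaway directory rather than testing/evidence.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "evidence", "policy_events.csv")
first = ensure_header_only_csv(target, POLICY_EVENT_HEADER)
second = ensure_header_only_csv(target, POLICY_EVENT_HEADER)
```

With this shape, git would track at most the header (or nothing, if the evidence directory is ignored), and sample rows could live in a separate example file.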
    load_testing_env
    init_evidence_layout
    require_cmd kubectl
    require_cmd jq
This script computes durations via `duration_between`, which shells out to `python3` in `common.sh`, but it doesn’t check that `python3` exists. Add `require_cmd python3` (or replace the helper with a non-Python implementation) to avoid runtime failures on systems without Python.

Suggested change:

    - require_cmd jq
    + require_cmd jq
    + require_cmd python3
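
For context on why the dependency exists, a duration helper of this kind typically parses two timestamps and subtracts them. The sketch below shows one plausible Python form. It is not the code in `common.sh`, and it assumes ISO 8601 UTC timestamps such as `2026-03-23T17:22:45Z`, the format seen in the run rows of this PR.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    # datetime.fromisoformat() only accepts "Z" from Python 3.11 onward,
    # so rewrite it as an explicit offset for older interpreters.
    if stamp.endswith("Z"):
        stamp = stamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(stamp)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def duration_between(start, end):
    """Whole seconds elapsed from start to end (negative if end is earlier)."""
    return int((parse_utc(end) - parse_utc(start)).total_seconds())


elapsed = duration_between("2026-03-23T17:22:45Z", "2026-03-23T17:24:15Z")
```

If the Python dependency is kept, checking for `python3` up front means the capture fails before any partial evidence is written.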
        @TESTING_CONFIG=$(TESTING_CONFIG) RUN_ID=$(RUN_ID) scripts/testing/capture_argocd_snapshot.sh "$(TEST_ID)" "$(RUN_SEQ)" "$(APP_NAME)" "$(APP_NAMESPACE)" "$(OUTCOME)"

    testing-analyze: ## Build markdown summary statistics from collected evidence
        @python3 scripts/testing/analyze_results.py
`TESTING_CONFIG` is plumbed through the other testing targets, but `testing-analyze` doesn’t pass it to `analyze_results.py`. That means `make testing-analyze TESTING_CONFIG=...` won’t be respected. Export/pass `TESTING_CONFIG` (and optionally `EVIDENCE_DIR`/`REPORT_DIR`) for consistency with the rest of the harness.

Suggested change:

    - @python3 scripts/testing/analyze_results.py
    + @TESTING_CONFIG=$(TESTING_CONFIG) EVIDENCE_DIR=$(EVIDENCE_DIR) REPORT_DIR=$(REPORT_DIR) python3 scripts/testing/analyze_results.py
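
Passing the variables only helps if the analyzer reads them. A minimal sketch of the receiving side follows, assuming `analyze_results.py` resolves its paths from the environment. `resolve_paths` is a hypothetical helper and the default locations are assumptions (in particular `testing/reports`).

```python
import os


def resolve_paths(env=None):
    """Resolve harness paths from environment variables, with fallbacks.

    `or` is used instead of a .get() default so that variables make passes
    through as empty strings still fall back to the defaults.
    """
    env = os.environ if env is None else env
    return {
        "config": env.get("TESTING_CONFIG") or "testing/config.env",
        "evidence_dir": env.get("EVIDENCE_DIR") or "testing/evidence",
        "report_dir": env.get("REPORT_DIR") or "testing/reports",  # assumed default
    }


# Simulates `make testing-analyze TESTING_CONFIG=ci.env` with the others unset.
paths = resolve_paths({"TESTING_CONFIG": "ci.env", "EVIDENCE_DIR": "", "REPORT_DIR": ""})
```

The empty-string fallback matters here because an unset make variable expands to nothing, so the suggested recipe would otherwise hand the script blank directories.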
    DURATION_S="$5"
    NOTES="${6:-}"
    COMMIT_SHA="${7:-$(repo_head_sha)}"
Like the other scripts, this only validates the argument count and will happily write rows with empty required fields (e.g. an empty `OUTCOME` or `DURATION_S`) because `make` passes empty strings as arguments. Consider validating that required args are non-empty before appending to `run_summary.csv`, so evidence data stays well-formed.

Suggested change:

    + # Validate that required args are non-empty to avoid malformed CSV rows
    + if [[ -z "${TEST_ID}" || -z "${RUN_SEQ}" || -z "${APP_TARGET}" || -z "${OUTCOME}" || -z "${DURATION_S}" ]]; then
    +   fail "Usage: record_run.sh TEST_ID RUN_SEQ APP_TARGET OUTCOME DURATION_S [NOTES] [COMMIT_SHA]"
    + fi
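
The same guard can be applied at the point where a row is written. The sketch below shows the validate-then-append pattern in Python with fully quoted fields, matching the quoting style of the committed rows. `append_run_row` and its field names are hypothetical, and the numeric check on `duration_s` is an extra assumption beyond what the comment asks for.

```python
import csv
import io

# Assumed field names, mirroring the positional args of record_run.sh.
REQUIRED = ("test_id", "run_seq", "app_target", "outcome", "duration_s")


def append_run_row(handle, row):
    """Write one fully quoted CSV row after checking required fields."""
    empty = [name for name in REQUIRED if not str(row.get(name, "")).strip()]
    if empty:
        raise ValueError("empty required field(s): " + ", ".join(empty))
    float(row["duration_s"])  # raises ValueError when the duration is not numeric
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([row[name] for name in REQUIRED] + [row.get("notes", "")])


buffer = io.StringIO()  # stands in for an evidence file opened in append mode
append_run_row(buffer, {
    "test_id": "GOV-02",
    "run_seq": "RUN01",
    "app_target": "policy-fixture:missing-owner-plan.json",
    "outcome": "blocked",
    "duration_s": "0",
})
```

Rejecting the row before the write keeps the file parseable, so the analysis step does not have to special-case blank outcomes or durations.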
    "GOV-02-RUN01-20260323-1722","GOV-02","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:missing-owner-plan.json","blocked","0","expected_rule=required_tags"
    "GOV-02-RUN02-20260323-1722","GOV-02","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:missing-cost-center-plan.json","blocked","0","expected_rule=required_tags"
    "GOV-02-RUN03-20260323-1722","GOV-02","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:missing-environment-plan.json","blocked","0","expected_rule=required_tags"
    "GOV-03-RUN01-20260323-1722","GOV-03","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:public-storage-plan.json","blocked","0","expected_rule=public_access"
    "GOV-03-RUN02-20260323-1722","GOV-03","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:unencrypted-s3-plan.json","blocked","0","expected_rule=server_side_encryption"
    "GOV-03-RUN03-20260323-1722","GOV-03","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:unrestricted-ingress-plan.json","blocked","0","expected_rule=unrestricted_ingress"
    "GOV-04-RUN01-20260323-1722","GOV-04","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:missing-labels.yaml","blocked","0","expected_rule=required_labels"
    "GOV-04-RUN02-20260323-1722","GOV-04","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:missing-resources.yaml","blocked","0","expected_rule=resource_requests_limits"
    "GOV-04-RUN03-20260323-1722","GOV-04","2026-03-23T17:22:45Z","8e4934c859525e59545c763b8df539e7c5ce3e92","policy-fixture:privileged-container.yaml","blocked","0","expected_rule=security_context"
These CSVs are being added with populated run data (timestamps, commit SHA, outcomes). Since the harness appends to these files on every run, committing real run rows will quickly become stale/noisy and cause merge conflicts. Consider committing only headers (or moving sample data to `*.example.csv`) and leaving the working evidence CSVs to be generated locally.