chore: remove silent-failure workarounds; add forbid-suppressions guard by mvillmow · Pull Request #549 · HomericIntelligence/ProjectKeystone

mvillmow · 2026-05-11T03:26:39Z

Summary

Ports the regression guard from HomericIntelligence/Odysseus#280 and refactors all 27 || true occurrences plus 10 continue-on-error: true suppressions across this repo per Bucket A–E classification.

Two pygrep hooks are added to .pre-commit-config.yaml and a forbid-suppressions job is prepended to .github/workflows/_required.yml so the pattern cannot return.

Refactors

`|| true` (27 occurrences, 6 shell files + 2 workflow files)

| File:Line | Bucket | Refactor |
| --- | --- | --- |
| `examples/multi_process_agents/scripts/launch_system.sh:9-12` | B | `pkill` loop with rc capture; warn on rc>=2 (rc=1 means no match, which is expected) |
| `examples/multi_process_agents/scripts/launch_system.sh:54` | B | `kill` loop guarded by `kill -0`; stderr warn on real failure |
| `examples/multi_process_agents/scripts/launch_system.sh:55` | B | `wait` loop with rc capture; signal-termination accepted |
| `scripts/create_phase_issues.sh:164` | A | `gh api ... \|\| true` → `if ! gh api ...; then warn; return 1; fi` (root cause: must report API failure) |
| `scripts/hmas-server.sh:120-121` | B | `kill` loop guarded by `kill -0`; stderr warn on real failure |
| `scripts/run_static_analysis.sh:85` | B | `clang-tidy` per-file aggregator: rc captured, warn only on rc>1; `grep -c \|\| echo 0` → awk counters (Bucket D) |
| `scripts/test_docker.sh:145` | B | `docker-compose down 2>/dev/null \|\| true` → `if ! docker-compose down 2>/dev/null; then warn; fi` |
| `scripts/test_docker.sh:177-184` (5 occurrences) | B | `run_test ... \|\| true` → `run_test ... && _rc=0 \|\| _rc=$?` so `set -e` does not abort the aggregator |
| `scripts/test_docker.sh:45,49` | C | `((TESTS_PASSED++))` / `((TESTS_FAILED++))` → `TESTS_PASSED=$((TESTS_PASSED + 1))` (post-increment from 0 evaluates to 0, so `(( ))` returns status 1 and trips `set -e`) |
| `scripts/verify_install.sh:265` | B | `cd "$PROJECT_ROOT" \|\| true` → guard with `[[ -n $PROJECT_ROOT && -d $PROJECT_ROOT ]]` before `cd` |
| `scripts/verify_install.sh:280-286` (7 occurrences) | B | `run_check ... \|\| true` → `run_check ... && _rc=0 \|\| _rc=$?` aggregator pattern |
| `scripts/verify_install.sh:43,47` | C | `((TESTS_PASSED++))` / `((TESTS_FAILED++))` → `$((var + 1))` |
| `.github/workflows/_required.yml:65` (CMake graphviz) | A | `cmake ... \|\| true` → `if ! cmake ...; then warn; fi` (next step gates on `[ -f deps.dot ]`) |
| `.github/workflows/_required.yml:177` (clang-tidy grep tail) | D | `grep -E \| grep -v ... \|\| true` → single `awk` filter (always rc=0) |
| `.github/workflows/_required.yml:601` (pip-audit) | A | `pip-audit --strict \|\| true` → `if ! pip-audit --strict; then warn; fi` |
| `.github/workflows/extras.yml:69` (benchmarks) | A | `make benchmark.native \|\| true` → `if ! make benchmark.native; then warn; fi` |
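
The Bucket B aggregator pattern and the Bucket C counter fix can be sketched together as follows. This is a minimal illustration, not the code from `test_docker.sh` or `verify_install.sh`: the body of `run_test` and the test names are assumptions made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

TESTS_PASSED=0
TESTS_FAILED=0

# Hypothetical helper: run a command, record the result, return its status.
run_test() {
    local name="$1"; shift
    if "$@"; then
        # Plain assignment always has exit status 0. By contrast,
        # ((TESTS_PASSED++)) evaluates to the old value, so the first
        # increment from 0 yields status 1 and would trip set -e.
        TESTS_PASSED=$((TESTS_PASSED + 1))
        echo "PASS: $name"
        return 0
    fi
    TESTS_FAILED=$((TESTS_FAILED + 1))
    echo "FAIL: $name" >&2
    return 1
}

# Aggregator: the && / || list keeps set -e from aborting the script, and
# _rc keeps the real status instead of discarding it as `|| true` would.
run_test "always passes" true && _rc=0 || _rc=$?
echo "rc=$_rc"
run_test "always fails" false && _rc=0 || _rc=$?
echo "rc=$_rc"

echo "passed=$TESTS_PASSED failed=$TESTS_FAILED"
```

The script runs to completion under `set -e` and still reports one pass and one failure, which is the point of capturing the status rather than suppressing it.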

`continue-on-error: true` (10 occurrences, all in `.github/workflows/_required.yml`)

| Line | Bucket | Refactor |
| --- | --- | --- |
| 164 (Build with clang-tidy) | E | Replaced step-level suppression with shell-level rc capture into `clang-tidy-build.rc`; next step inspects build_rc and emits a warning if non-zero but no source-file errors |
| 612, 623 (Trivy FS SARIF / JSON) | E | Removed: Trivy already has `exit-code: "0"`; the gate step downstream decides job pass/fail from the JSON |
| 645 (Docker build for scanning) | E | `docker build` wrapped in `if`-then; writes `built=true/false` to `$GITHUB_OUTPUT`; downstream `if: steps.docker_build.outputs.built == 'true'` instead of `outcome == 'success'` |
| 659, 669 (Trivy container SARIF / JSON) | E | Added `exit-code: "0"` and removed `continue-on-error` (was masking, now gated by `built == 'true'`) |
| 746 (Gitleaks) | E | Removed: gitleaks already uses `--exit-code 0`; central gate step decides pass/fail |
| 763 (Semgrep) | E | Replaced action invocation with shell `pip install + semgrep scan` that captures rc into `$GITHUB_OUTPUT`; warning emitted on non-zero |
| 781, 796 (CodeQL c-cpp install deps / build) | E | Both wrapped to capture rc into step outputs; build step gated on `codeql_deps.outputs.rc == '0'`; warnings on non-zero |
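
A rough sketch of the Bucket E approach for the Docker build step: the step body records success or failure as a step output so later steps can gate on it explicitly. The helper name `record_outcome` is hypothetical, and `false` stands in for a failing `docker build` so the snippet runs anywhere; the actual step in `_required.yml` may be written differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# GitHub Actions provides GITHUB_OUTPUT; fall back to a temp file locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# Hypothetical helper: run a command and record the result as a step output
# instead of letting continue-on-error hide it.
record_outcome() {
    local key="$1"; shift
    if "$@"; then
        echo "${key}=true" >> "$GITHUB_OUTPUT"
    else
        local rc=$?
        echo "${key}=false" >> "$GITHUB_OUTPUT"
        # ::warning:: is the workflow command that annotates the run.
        echo "::warning::'$*' failed (rc=${rc}); dependent steps will be skipped"
    fi
}

# `false` stands in for a failing `docker build`.
record_outcome built false
cat "$GITHUB_OUTPUT"
```

A downstream step can then use `if: steps.docker_build.outputs.built == 'true'`, as described in the table, so a failed build skips the scan visibly rather than silently.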

Lint guard

.pre-commit-config.yaml: appended forbid-or-true and forbid-continue-on-error pygrep hooks (preserving all existing hooks).
.github/workflows/_required.yml: added forbid-suppressions job at the top (runs before lint); self-exemption for _required.yml and (future) docs/runbooks/no-silent-failures.md.

Verification

bash -n on every modified .sh: pass.
shellcheck -S error on every modified .sh: pass (only SC2329 info and SC2155 warn, both pre-existing).
python3 -c "import yaml; yaml.safe_load(...)" on every modified YAML: pass.
pre-commit run forbid-or-true --all-files: pass.
pre-commit run forbid-continue-on-error --all-files: pass.
Final grep for \|\|\s*true(\s*$|\s+#) and ^\s*continue-on-error:\s*true\s*$ across .sh/.bash/.yml/.yaml/.hcl/Dockerfile*/justfile: zero hits.
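
For readers who want to reproduce the final grep by hand, the two patterns can be approximated with `grep -E`. This is a sketch under assumptions: the pygrep hooks use Python regular expressions, so `\s` is translated here to the POSIX class `[[:space:]]`, and the helper `scan_file` and the sample file are invented for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# POSIX ERE translations of the two patterns (\s becomes [[:space:]]).
OR_TRUE_RE='\|\|[[:space:]]*true([[:space:]]*$|[[:space:]]+#)'
COE_RE='^[[:space:]]*continue-on-error:[[:space:]]*true[[:space:]]*$'

# Hypothetical helper: print offending lines and return 1 if any are found.
scan_file() {
    local file="$1" hits rc
    # The extra /dev/null argument makes grep prefix each hit with the path.
    hits=$(grep -nE -e "$OR_TRUE_RE" -e "$COE_RE" "$file" /dev/null) && rc=0 || rc=$?
    case "$rc" in
        0) printf '%s\n' "$hits"; return 1 ;;
        1) return 0 ;;   # grep found nothing: the file is clean
        *) echo "grep failed on $file (rc=$rc)" >&2; return "$rc" ;;
    esac
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
make benchmark.native || true
grep -c foo file || echo 0
    continue-on-error: true
if ! make benchmark.native; then echo warn; fi
rm -f tmp || true  # cleanup
EOF

scan_file "$sample" && clean=yes || clean=no
echo "clean=$clean"
```

Lines 1, 3 and 5 of the sample are flagged, while the `|| echo 0` line and the `if !` rewrite are not. The helper also distinguishes grep's "no match" status (1) from a real grep error (2 or higher) rather than treating both as clean.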

Full pre-commit run --all-files could not be executed locally: this repo pins default_language_version: python: python3.12, and only python3.9.2 is available in the local environment. The new pygrep hooks do not need the pinned interpreter (pygrep runs on the interpreter that pre-commit itself uses, with no separate hook environment), so they were exercised individually, with the passing results listed above. The remote forbid-suppressions CI job is the authoritative backstop.

Pre-existing issues observed (out of scope)

.yamllint.yaml warns on several pre-existing long lines (grep -c ... || echo 0 style jq one-liners on lines 795-800 of _required.yml). These are formatting warnings, not silent failures (|| echo 0 is not in scope of the lint guard's regex). Left as-is.
scripts/test_docker.sh and scripts/verify_install.sh both define run_test/run_check helpers that increment counters via ((var++)) — fixed as part of Bucket C since they sit under set -e.
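
As background on the `grep -c ... || echo 0` idiom mentioned above, and on why the Bucket D refactor in `run_static_analysis.sh` moved to awk counters: `grep -c` prints `0` and also exits with status 1 when nothing matches, so the fallback `echo` appends a second `0`. The snippet below is an illustrative sketch; the log contents and the `warning:` pattern are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

log=$(mktemp)
printf 'note: ok\nnote: still ok\n' > "$log"

# grep -c prints "0" AND exits 1 when nothing matches, so the fallback
# echo adds a second "0" to the captured value.
grep_count=$(grep -c 'warning:' "$log" || echo 0)

# awk exits 0 here and prints exactly one number; `n + 0` forces a
# numeric 0 when the pattern never matched.
awk_count=$(awk '/warning:/ { n++ } END { print n + 0 }' "$log")

printf 'grep: [%s]\nawk: [%s]\n' "$grep_count" "$awk_count"
```

The grep variant captures two lines (`0` twice), which breaks later numeric comparisons, while the awk variant yields a single `0`.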

Test plan

CI forbid-suppressions job passes on the PR (proves no regression).
lint job still passes (CMake graphviz cycle check, clang-tidy gate, mypy).
security-dependency-scan job still passes (pip-audit warning, Trivy gating intact).
security/secrets-scan job still passes (gitleaks gate, semgrep SARIF upload, CodeQL c-cpp + python).

🤖 Generated with Claude Code

Ports the regression guard from HomericIntelligence/Odysseus#280 and refactors all 27 || true occurrences plus 10 continue-on-error: true suppressions per Bucket A–E classification.

- Bucket A (masks real failures): pip-audit, benchmarks, gh-api milestone creation, cmake graphviz configure, clang-tidy build, semgrep, codeql c-cpp build deps + build. Surfaced as explicit warnings/gates instead of silent.
- Bucket B (best-effort cleanup): pkill, kill, wait, docker-compose down, clang-tidy per-file aggregator. Wrapped in kill -0 guards or rc capture with stderr warnings for unexpected (rc>=2) failures.
- Bucket C ((counter++) under set -e): converted TESTS_PASSED++ / TESTS_FAILED++ in test_docker.sh and verify_install.sh to $((var + 1)) so the first increment from 0 does not trip set -e.
- Bucket D (pipeline-tail grep): replaced grep -c with awk-based counters in run_static_analysis.sh; replaced grep | grep filter in clang-tidy gate with a single awk script (always exits 0).
- Bucket E (continue-on-error: true): removed all 10 occurrences. Trivy steps now rely on exit-code: "0"; docker build / CodeQL deps / CodeQL build / Semgrep capture rc into step outputs so downstream steps gate explicitly; gitleaks already used --exit-code 0 so the suppression was pure redundancy.

Adds .pre-commit-config.yaml hooks forbid-or-true / forbid-continue-on-error and a forbid-suppressions job at the top of .github/workflows/_required.yml so the pattern cannot return.

Local verification:

- bash -n on every modified .sh: pass
- shellcheck -S error on every modified .sh: pass (only SC2329/SC2155 info/warn)
- python3 -c "import yaml; yaml.safe_load(...)" on every modified YAML: pass
- pre-commit run forbid-or-true --all-files: pass
- pre-commit run forbid-continue-on-error --all-files: pass
- grep for || true and continue-on-error: true post-refactor: zero hits

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-advanced-security · 2026-05-11T03:47:33Z

You are seeing this message because GitHub Code Scanning has recently been set up for this repository, or this pull request contains the workflow file for the Code Scanning tool.

What Enabling Code Scanning Means:

The 'Security' tab will display more code scanning analysis results (e.g., for the default branch).
Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results.
You will be able to see the analysis results for the pull request's branch on this overview once the scans have completed and the checks have passed.

For more information about GitHub Code Scanning, check out the documentation.

mvillmow enabled auto-merge (squash) May 11, 2026 03:26

mvillmow merged commit 4963e58 into main May 11, 2026
22 checks passed

mvillmow deleted the chore/remove-silent-failure-workarounds branch May 11, 2026 03:57

mvillmow mentioned this pull request May 12, 2026

chore: easy-issue sweep 2026-05-11 #552

Merged

mvillmow merged 1 commit into main from chore/remove-silent-failure-workarounds
