chore: remove silent-failure workarounds; add forbid-suppressions guard #549

Merged: mvillmow merged 1 commit into main from chore/remove-silent-failure-workarounds on May 11, 2026
@mvillmow
Collaborator

Summary

Ports the regression guard from HomericIntelligence/Odysseus#280 and refactors all 27 || true occurrences plus 10 continue-on-error: true suppressions across this repo per Bucket A–E classification.

Two pygrep hooks are added to .pre-commit-config.yaml and a forbid-suppressions job is prepended to .github/workflows/_required.yml so the pattern cannot return.

Refactors

|| true (27 occurrences, 6 shell files + 2 workflow files)

| File:Line | Bucket | Refactor |
| --- | --- | --- |
| `examples/multi_process_agents/scripts/launch_system.sh:9-12` | B | `pkill` loop with rc capture; warn on rc>=2 (rc=1 = no match, expected) |
| `examples/multi_process_agents/scripts/launch_system.sh:54` | B | `kill` loop guarded by `kill -0`; stderr warn on real failure |
| `examples/multi_process_agents/scripts/launch_system.sh:55` | B | `wait` loop with rc capture; signal-termination accepted |
| `scripts/create_phase_issues.sh:164` | A | `gh api ... \|\| true` → `if ! gh api ...; then warn; return 1; fi` (root cause: must report API failure) |
| `scripts/hmas-server.sh:120-121` | B | `kill` loop guarded by `kill -0`; stderr warn on real failure |
| `scripts/run_static_analysis.sh:85` | B | clang-tidy per-file aggregator: rc captured, warn only on rc>1; `grep -c \|\| echo 0` → awk counters (Bucket D) |
| `scripts/test_docker.sh:145` | B | `docker-compose down 2>/dev/null \|\| true` → `if ! docker-compose down 2>/dev/null; then warn; fi` |
| `scripts/test_docker.sh:177-184` (5 occurrences) | B | `run_test ... \|\| true` → `run_test ... && _rc=0 \|\| _rc=$?` so `set -e` does not abort the aggregator |
| `scripts/test_docker.sh:45,49` | C | `((TESTS_PASSED++))` / `((TESTS_FAILED++))` → `TESTS_PASSED=$((TESTS_PASSED + 1))` (post-increment from 0 evaluates to 0, so `(( ))` returns status 1 and aborts under `set -e`) |
| `scripts/verify_install.sh:265` | B | `cd "$PROJECT_ROOT" \|\| true` → guard with `[[ -n $PROJECT_ROOT && -d $PROJECT_ROOT ]]` before `cd` |
| `scripts/verify_install.sh:280-286` (7 occurrences) | B | `run_check ... \|\| true` → `run_check ... && _rc=0 \|\| _rc=$?` aggregator pattern |
| `scripts/verify_install.sh:43,47` | C | `((TESTS_PASSED++))` / `((TESTS_FAILED++))` → `$((var + 1))` |
| `.github/workflows/_required.yml:65` (CMake graphviz) | A | `cmake ... \|\| true` → `if ! cmake ...; then warn; fi` (next step gates on `[ -f deps.dot ]`) |
| `.github/workflows/_required.yml:177` (clang-tidy grep tail) | D | `grep -E \| grep -v ... \|\| true` → single awk filter (always rc=0) |
| `.github/workflows/_required.yml:601` (pip-audit) | A | `pip-audit --strict \|\| true` → `if ! pip-audit --strict; then warn; fi` |
| `.github/workflows/extras.yml:69` (benchmarks) | A | `make benchmark.native \|\| true` → `if ! make benchmark.native; then warn; fi` |
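
The aggregator rows above replace `|| true` with an explicit status capture. A minimal sketch of that pattern follows, assuming a `run_check` helper shaped roughly like the ones in `verify_install.sh`; the helper body, the counter names, and the two sample checks are illustrative and not taken from the repo.

```shell
#!/usr/bin/env bash
set -eu

CHECKS_PASSED=0
CHECKS_FAILED=0

# run_check is a hypothetical stand-in for the repo's helper.
# It runs a command, updates the counters, and returns the command's status.
run_check() {
    local name="$1" rc
    shift
    "$@" >/dev/null 2>&1 && rc=0 || rc=$?
    if [ "$rc" -eq 0 ]; then
        CHECKS_PASSED=$((CHECKS_PASSED + 1))
        echo "PASS: $name"
    else
        CHECKS_FAILED=$((CHECKS_FAILED + 1))
        echo "FAIL: $name (rc=$rc)" >&2
    fi
    return "$rc"
}

# Capture the status instead of discarding it with `|| true`.
# The call is part of an && / || list, so set -e does not abort here.
run_check "always passes" true  && _rc=0 || _rc=$?
run_check "always fails"  false && _rc=0 || _rc=$?
LAST_RC=$_rc

echo "passed=$CHECKS_PASSED failed=$CHECKS_FAILED last_rc=$LAST_RC"
```

Unlike `|| true`, the failing status stays available in `_rc`, so the caller can log it or use it for the final exit code.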

continue-on-error: true (10 occurrences, all in .github/workflows/_required.yml)

| Line | Bucket | Refactor |
| --- | --- | --- |
| 164 (Build with clang-tidy) | E | Replaced step-level suppression with shell-level rc capture into `clang-tidy-build.rc`; next step inspects `build_rc` and emits a warning if it is non-zero but there are no source-file errors |
| 612, 623 (Trivy FS SARIF / JSON) | E | Removed; Trivy already has `exit-code: "0"`, and the gate step downstream decides job pass/fail from the JSON |
| 645 (Docker build for scanning) | E | `docker build` wrapped in if-then; writes `built=true`/`built=false` to `$GITHUB_OUTPUT`; downstream steps use `if: steps.docker_build.outputs.built == 'true'` instead of `outcome == 'success'` |
| 659, 669 (Trivy container SARIF / JSON) | E | Added `exit-code: "0"` and removed `continue-on-error` (was masking, now gated by `built == 'true'`) |
| 746 (Gitleaks) | E | Removed; gitleaks already uses `--exit-code 0`, and the central gate step decides pass/fail |
| 763 (Semgrep) | E | Replaced action invocation with shell `pip install` + `semgrep scan` that captures rc into `$GITHUB_OUTPUT`; warning emitted on non-zero |
| 781, 796 (CodeQL c-cpp install deps / build) | E | Both wrapped to capture rc into step outputs; build step gated on `codeql_deps.outputs.rc == '0'`; warnings on non-zero |
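
The Docker build row describes writing an explicit `built` flag to `$GITHUB_OUTPUT` instead of relying on `continue-on-error`. The sketch below shows one way such a step body could look. `build_image` is a hypothetical placeholder for the real `docker build` invocation, the warning text is invented, and the temp-file fallback exists only so the sketch runs outside GitHub Actions.

```shell
#!/usr/bin/env bash
set -eu

# GitHub Actions sets GITHUB_OUTPUT; fall back to a temp file (assumption,
# for local runs only) when it is not set.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"

# build_image is a hypothetical stand-in for the real `docker build ...`
# command. It fails on purpose here to exercise the warning path.
build_image() {
    return 1
}

if build_image; then
    echo "built=true" >> "$GITHUB_OUTPUT"
else
    # The failure is reported, not hidden, and recorded for later steps.
    echo "::warning::image build failed; container scan steps will be skipped"
    echo "built=false" >> "$GITHUB_OUTPUT"
fi

BUILT_LINE=$(tail -n 1 "$GITHUB_OUTPUT")
echo "recorded: $BUILT_LINE"
```

A downstream step can then gate on the recorded value (as in the `if:` expression quoted in the table), so a failed build skips the scan visibly rather than being masked at step level.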

Lint guard

  • .pre-commit-config.yaml: appended forbid-or-true and forbid-continue-on-error pygrep hooks (preserving all existing hooks).
  • .github/workflows/_required.yml: added forbid-suppressions job at the top (runs before lint); self-exemption for _required.yml and (future) docs/runbooks/no-silent-failures.md.

Verification

  • bash -n on every modified .sh: pass.
  • shellcheck -S error on every modified .sh: pass (only SC2329 info and SC2155 warn, both pre-existing).
  • python3 -c "import yaml; yaml.safe_load(...)" on every modified YAML: pass.
  • pre-commit run forbid-or-true --all-files: pass.
  • pre-commit run forbid-continue-on-error --all-files: pass.
  • Final grep for \|\|\s*true(\s*$|\s+#) and ^\s*continue-on-error:\s*true\s*$ across .sh/.bash/.yml/.yaml/.hcl/Dockerfile*/justfile: zero hits.
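
The final grep can be approximated with plain `grep -E`. This sketch uses the two patterns quoted in the list above, with `\s` rewritten as `[[:space:]]` for POSIX portability; the sample files and the `count_hits` helper are made up for illustration, and the actual pygrep hook configuration is not shown in this PR description.

```shell
#!/usr/bin/env bash
set -eu

workdir="$(mktemp -d)"

# Sample inputs (hypothetical): two suppressions and one harmless string.
printf '%s\n' \
    'rm -f stale.lock || true' \
    'make test || true  # flaky' \
    'echo "|| true is fine inside a string"' > "$workdir/sample.sh"

printf '%s\n' \
    '- name: scan' \
    '  continue-on-error: true' \
    '- name: build' \
    '  continue-on-error: false' \
    '  # continue-on-error: true' > "$workdir/sample.yml"

or_true_re='\|\|[[:space:]]*true([[:space:]]*$|[[:space:]]+#)'
coe_re='^[[:space:]]*continue-on-error:[[:space:]]*true[[:space:]]*$'

# count_hits prints the number of matching lines. grep exits 1 on zero
# matches, so the status is captured and only rc>1 is treated as an error.
count_hits() {
    local re="$1" file="$2" n rc
    n=$(grep -cE "$re" "$file") && rc=0 || rc=$?
    if [ "$rc" -gt 1 ]; then
        echo "grep failed on $file (rc=$rc)" >&2
        return "$rc"
    fi
    echo "$n"
}

or_true_hits=$(count_hits "$or_true_re" "$workdir/sample.sh")
coe_hits=$(count_hits "$coe_re" "$workdir/sample.yml")
clean_hits=$(count_hits "$or_true_re" "$workdir/sample.yml")

echo "or_true=$or_true_hits continue_on_error=$coe_hits clean=$clean_hits"
rm -rf "$workdir"
```

The quoted string on the third line of `sample.sh` is not flagged because the pattern requires `true` to be followed by end of line or a comment.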

A full pre-commit run --all-files could not be executed locally because this repo pins default_language_version: python: python3.12 and only python3.9.2 is available in the local environment. The new pygrep hooks do not depend on the pinned interpreter and were exercised individually, with the passing results listed above. The remote forbid-suppressions CI job is the authoritative backstop.

Pre-existing issues observed (out of scope)

  • .yamllint.yaml warns on several pre-existing long lines (grep -c ... || echo 0 style jq one-liners on lines 795-800 of _required.yml). These are formatting warnings, not silent failures: || echo 0 falls outside the lint guard's regex. Left as-is.
  • scripts/test_docker.sh and scripts/verify_install.sh both define run_test/run_check helpers that increment counters via ((var++)) — fixed as part of Bucket C since they sit under set -e.
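
The Bucket C problem can be reproduced in isolation. In this sketch each form runs in a child `bash` so the demonstration script itself survives; the variable names are illustrative.

```shell
#!/usr/bin/env bash
set -eu

# Old form: ((n++)) from 0 evaluates to 0, so (( )) returns status 1 and
# set -e exits the child shell before the echo runs.
bash -c 'set -e; n=0; ((n++)); echo "n=$n"' && old_rc=0 || old_rc=$?

# New form: a plain arithmetic assignment always returns status 0.
new_out=$(bash -c 'set -e; n=0; n=$((n + 1)); echo "n=$n"') && new_rc=0 || new_rc=$?

echo "old_rc=$old_rc new_rc=$new_rc new_out=$new_out"
```

Only the first increment from 0 is affected by the old form; later increments evaluate to a non-zero value and succeed, which is why the bug is easy to miss.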

Test plan

  • CI forbid-suppressions job passes on the PR (proves no regression).
  • lint job still passes (CMake graphviz cycle check, clang-tidy gate, mypy).
  • security-dependency-scan job still passes (pip-audit warning, Trivy gating intact).
  • security/secrets-scan job still passes (gitleaks gate, semgrep SARIF upload, CodeQL c-cpp + python).

🤖 Generated with Claude Code

Ports the regression guard from HomericIntelligence/Odysseus#280
and refactors all 27 || true occurrences plus 10 continue-on-error: true
suppressions per Bucket A–E classification.

- Bucket A (masks real failures): pip-audit, benchmarks, gh-api milestone
  creation, cmake graphviz configure, clang-tidy build, semgrep, codeql c-cpp
  build deps + build — surfaced as explicit warnings/gates instead of silent.
- Bucket B (best-effort cleanup): pkill, kill, wait, docker-compose down,
  clang-tidy per-file aggregator — wrapped in kill -0 guards or rc capture
  with stderr warnings for unexpected (rc>=2) failures.
- Bucket C ((counter++) under set -e): converted TESTS_PASSED++ /
  TESTS_FAILED++ in test_docker.sh and verify_install.sh to
  $((var + 1)) so the first increment from 0 does not trip set -e.
- Bucket D (pipeline-tail grep): replaced grep -c with awk-based counters
  in run_static_analysis.sh; replaced grep | grep filter in clang-tidy
  gate with a single awk script (always exits 0).
- Bucket E (continue-on-error: true): removed all 10 occurrences. Trivy
  steps now rely on exit-code: "0"; docker build / CodeQL deps / CodeQL
  build / Semgrep capture rc into step outputs so downstream steps gate
  explicitly; gitleaks already used --exit-code 0 so the suppression was
  pure redundancy.
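
A rough sketch of the Bucket B cleanup guard described above, assuming a hypothetical `stop_pid` helper and a throwaway `sleep` child; the repo's actual loops are not reproduced here.

```shell
#!/usr/bin/env bash
set -eu

# stop_pid is a hypothetical helper: signal a process only if it still
# exists, and warn on stderr if the signal itself fails.
stop_pid() {
    local pid="$1"
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "pid $pid already gone; nothing to do"
        return 0
    fi
    if ! kill "$pid"; then
        echo "warning: failed to signal pid $pid" >&2
        return 1
    fi
    return 0
}

sleep 30 &
child=$!

stop_pid "$child" && stop_rc=0 || stop_rc=$?

# wait reports 128+signal for a signal-terminated child (143 for SIGTERM).
# That is the expected outcome here, so it is recorded rather than suppressed.
wait "$child" && wait_rc=0 || wait_rc=$?

# A second call finds no process and returns 0 without signalling.
stop_pid "$child" && again_rc=0 || again_rc=$?

echo "stop_rc=$stop_rc wait_rc=$wait_rc again_rc=$again_rc"
```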

Adds .pre-commit-config.yaml hooks forbid-or-true / forbid-continue-on-error
and a forbid-suppressions job at the top of .github/workflows/_required.yml
so the pattern cannot return.

Local verification:
  - bash -n on every modified .sh: pass
  - shellcheck -S error on every modified .sh: pass (only SC2329/SC2155 info/warn)
  - python3 -c "import yaml; yaml.safe_load(...)" on every modified YAML: pass
  - pre-commit run forbid-or-true --all-files: pass
  - pre-commit run forbid-continue-on-error --all-files: pass
  - grep for || true and continue-on-error: true post-refactor: zero hits

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@mvillmow enabled auto-merge (squash) on May 11, 2026 03:26
@mvillmow merged commit 4963e58 into main on May 11, 2026
22 checks passed
@mvillmow deleted the chore/remove-silent-failure-workarounds branch on May 11, 2026 03:57