
Audit remediation: quality gates with teeth, pinned toolchain, content and runtime fixes #6

Open
adewale wants to merge 6 commits into main from claude/quirky-bell-8fqrne

adewale commented Jun 12, 2026

Owner

What

Fixes the 2026-06-11 audit findings and the follow-up review gaps on PR #6. The branch now ships the content/runtime hardening, removes the migration-era golden fixture, moves editorial registries into TOML, and closes the remaining security/UI/routing/plumbing items found after the first pass.

Why

The original green suite still missed several defect classes: tautological generated fixtures, stale editorial data embedded in Python, weak quality gates, cache/body-size edge cases in the Worker path, incomplete security headers, low-contrast Run-button styling, stale PR documentation, and an avoidable double GET-routing layer in src/main.py.

How

  • Replaced the golden migration snapshot/parity scripts with live verifiers, generated-source checks, quality gates, and browser/runtime checks.
  • Moved journeys, captions, figure attachments, curated scores, and see-also labels into docs/quality-registries.toml; generated src/editorial_registry_data.py keeps the Worker bundle self-contained.
  • Hardened quality gates: paired pages require teaching-cell tokens, confusable pairs count teaching cells only, scope-first-pass requires registry-named focused neighbors, and shared script plumbing now comes from scripts/_common.py.
  • Hardened the Worker/runtime path: normalized HTML cache keys, pre-ASGI POST body cap, HTTPS-only smoke bypass, HMAC secret comparison, fail-closed Turnstile modes, and safer template/script escaping.
  • Added full response/static security headers: CSP, HSTS, frame denial, referrer policy, and nosniff via src/security.py and generated public/_headers.
  • Fixed the UI audit items: WCAG-AA Run-button/accent contrast and an accessible editor label.
  • Removed FastAPI GET delegation to app.route(): src/main.py now calls the page renderers directly for home, layout options, journeys, examples, and 404s while src/app.py keeps the pure renderer helpers for tests and non-ASGI use.
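
The secret comparison and fail-closed mode handling listed above can be sketched as follows. This is a minimal illustration, not the code in src/main.py: the function names `secrets_match` and `requires_challenge` and the mode strings `"off"` and `"always"` are hypothetical, since the PR does not list the real ones. Only `hmac.compare_digest` is an actual standard-library call.

```python
import hmac

# Hypothetical mode names; the real Turnstile modes are not listed in this PR.
KNOWN_MODES = {"off": False, "always": True}


def secrets_match(provided: str, expected: str) -> bool:
    """Compare two secrets in constant time to avoid timing leaks."""
    if not expected:
        # An unset secret never matches, so the bypass stays disabled.
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


def requires_challenge(mode: str) -> bool:
    """Unknown modes fail closed: they require a challenge."""
    return KNOWN_MODES.get(mode, True)
```

The design point is the default in `requires_challenge`: a typo in configuration yields a stricter site rather than an open one.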

Testing

Local verification on the pushed branch:

  • make verify with local pywrangler dev on port 9696 — passes end-to-end, including build, generated checks, 124 unittest cases, all quality gates, browser layout, Ruff, and generated-file idempotence.
  • make test — 124 tests pass.
  • make quality-checks — all gates pass.
  • make verify-python-version VERSION=3.13 — 109 examples verified.
  • scripts/format_examples.py --check — formatted.
  • make rubric-audit — no gate-blocking findings.
  • uv run ruff check src tests scripts — clean.
  • git diff --check — clean.
  • Local Worker startup and scripts/check_browser_layout.mjs — Shiki/code-block layout passes.
  • Oversized POST smoke — returns 413 before FastAPI buffers the request body.
  • make build idempotence — no regenerated diff.
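
For reviewers who want to see what "413 before FastAPI buffers the request body" can look like, here is a minimal sketch of an ASGI wrapper that inspects the declared Content-Length before the application is invoked. The names `body_cap`, `ok_app`, and `status_for` are hypothetical, and the 100,000-byte default is an assumed value; the branch's actual implementation may differ.

```python
import asyncio

MAX_BODY_BYTES = 100_000  # assumed value for the cap


def body_cap(app, limit=MAX_BODY_BYTES):
    """Wrap an ASGI app; answer 413 to oversized POSTs before the app runs."""

    async def wrapped(scope, receive, send):
        if scope["type"] == "http" and scope["method"] == "POST":
            headers = dict(scope.get("headers", []))
            declared = headers.get(b"content-length", b"0")
            # Reject a malformed or too-large declared length without reading the body.
            if not declared.isdigit() or int(declared) > limit:
                await send({
                    "type": "http.response.start",
                    "status": 413,
                    "headers": [(b"content-type", b"text/plain; charset=utf-8")],
                })
                await send({"type": "http.response.body", "body": b"Request body too large"})
                return
        await app(scope, receive, send)

    return wrapped


async def ok_app(scope, receive, send):
    """Stand-in for the real application: always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(content_length: int) -> int:
    """Drive the wrapped app once and return the response status."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {
        "type": "http",
        "method": "POST",
        "headers": [(b"content-length", str(content_length).encode("latin-1"))],
    }
    asyncio.run(body_cap(ok_app)(scope, receive, send))
    return sent[0]["status"]
```

This sketch only checks the declared length. A body sent without a Content-Length header would also need a counting wrapper around `receive`.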

Screenshots / Visual Review

No image attachments in this environment. Visual changes are limited to contrast/accessibility and generated review pages:

  • Run-button/accent text contrast now uses AA-compliant colors in public/site.css.
  • Browser layout regression passed locally against a running Worker.
  • Figure/caption review artifacts are regenerated under public/prototyping/, especially public/prototyping/marginalia-gestalt.html and public/prototyping/production-figures-gestalt.html.

Risk

  • Large diff is mostly generated/deleted artifact churn: old golden fixture removal, embedded registry/source data, prototype pages, asset fingerprints, and generated headers.
  • CSP intentionally keeps style-src 'unsafe-inline' for CodeMirror/Shiki dynamic styles; script execution is controlled by nonce/CDN allowlists.
  • HSTS/frame-denial are intentionally stricter; any consumer embedding the site in an iframe would break.
  • The pure app.route() helper remains for compatibility/tests, but the Worker/FastAPI GET path no longer routes through it. A deeper app.py route cleanup can be a follow-up.
  • The good-pr readiness check was run. It flags the intentionally large/UI diff and a false-positive “secret” match from existing variable/context lines; no secret values are present in the diff.

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE

claude added 5 commits June 11, 2026 00:48
Covers code, docs, CI, quality gates, example content, figures, and
tests, with verified findings and improvement suggestions.

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE
- Route every Makefile python target through uv run --python 3.13 so
  make verify works regardless of system python; fix make dev/deploy to
  use the workers dependency group.
- Wire audit_example_graph and three forthcoming content gates into
  quality-checks.
- Git hooks now regenerate embedded example data as well as the asset
  manifest; merge=ours covers src/example_sources_data.py too.
- verify.yml: drop duplicate PR runs, add permissions and concurrency.
- preview.yml: pass workflow inputs via env (no shell interpolation next
  to the Cloudflare token), pin wrangler, pass PBE_SMOKE_BYPASS_SECRET.
- regenerate-generated-files.yml: pull before building and retry with
  rebuild on push races, so stale generated output cannot land on main.

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE
New tools:
- scripts/_common.py shares the root/loader/registry plumbing.
- check_program_covers_cells: an executable cell wholly disjoint from
  the :::program block fails; standalone_cells frontmatter opts out
  visibly.
- check_prose_duplication: verbatim repeated paragraphs, cell prose
  copying the intro, and duplicate note bullets fail.
- check_inline_links: prose links must be real /examples or /journeys
  targets; render_inline (now in src/textfmt.py) renders them as
  anchors, shared with figure captions so backticks render as code.

Gate integrity:
- score_example_criteria fails when a curated score exceeds the
  heuristic by more than --max-delta (default 1.5).
- check_quality_scores: waiver expiry must be a future ISO date, stale
  waivers flagged, score ranges validated, journey_average_min enforced.
- check_confusable_pairs defends against substring shadowing and
  accepts regex patterns.
- check_registry_integrity enforces the paired_pages relationship via
  see_also; check_no_figure_rationales enforces review_after dates;
  check_notes_supported reads every :::note block.
- audit_rubric_snapshot computes its scoreboard, waiver lines, and
  findings from live registries and labels curated judgements instead
  of hardcoding PASS verdicts; --date defaults to today.
- lint_seo_cache covers editor.js; verify_examples reports slug/filename
  mismatches cleanly; audit_example_graph --check help is accurate.
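
The two score checks above reduce to small predicates. The sketch below is illustrative only: `score_delta_ok` and `waiver_is_live` are hypothetical names, not the functions in score_example_criteria or check_quality_scores, and the 1.5 default mirrors the --max-delta default stated above.

```python
from datetime import date


def score_delta_ok(curated: float, heuristic: float, max_delta: float = 1.5) -> bool:
    """Pass unless the curated score exceeds the heuristic by more than max_delta."""
    return curated - heuristic <= max_delta


def waiver_is_live(expires: str, today: date) -> bool:
    """A waiver counts only if its expiry parses as an ISO date in the future."""
    try:
        expiry = date.fromisoformat(expires)
    except ValueError:
        # Anything that is not an ISO date is treated as no waiver at all.
        return False
    return expiry > today
```

Note the one-sided delta: a curated score below the heuristic never fails, since the gate only guards against inflated judgements.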

Figures:
- New while-backedge figure so the while-loops banner actually draws
  the back-edge its caption describes; loop-shape caption rewritten to
  describe the three stopping rules.
- Canvas text is XML-escaped; new well-formedness and escaping
  contracts; anchor contract validates against rendered cells (the
  walkthrough list diverges on 10 examples); reverse section-figure
  containment; rendered-page figure-presence contract; mechanical
  caption contract (5c); args-kwargs dividers computed via mono_divider.
- The marginalia gestalt now renders one card per attachment with the
  production figcaption, so caption/figure disagreements are visible in
  review; prototypes regenerate in make build and check-generated
  covers public/prototyping plus untracked fingerprint copies.
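
The escaping and well-formedness contracts can be illustrated with the standard library, assuming the figures are emitted as SVG-style XML markup (the commit does not say so explicitly). `svg_text` and `is_well_formed` are hypothetical helpers, not names from the repository.

```python
from xml.etree import ElementTree
from xml.sax.saxutils import escape


def svg_text(label: str, x: int = 0, y: int = 0) -> str:
    """Return a <text> element with the label XML-escaped."""
    return f'<text x="{x}" y="{y}">{escape(label)}</text>'


def is_well_formed(markup: str) -> bool:
    """True when the markup parses as XML."""
    try:
        ElementTree.fromstring(markup)
    except ElementTree.ParseError:
        return False
    return True
```

A label such as `while i < n` is exactly the case that breaks unescaped output: `is_well_formed('<text>while i < n</text>')` is false, while the escaped form parses.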

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE
Factual and code corrections:
- dicts: scope the RuntimeError claim to adding/removing keys (value
  reassignment is legal).
- type-hints: teach the PEP 695 type statement; keep TypeAlias as the
  labelled-deprecated legacy spelling.
- regular-expressions: stop promising compile-time speedups the re
  module's internal cache already provides.
- string-formatting: describe 05.1f as zero-padding to width five.
- constants: doc link points at typing.Final, matching the page.
- special-methods: __lt__ alone suffices for sorted(); warn that
  hashing mutable Bags makes them unfindable after mutation; fix the
  three /data-model/* links (plus async-await's /iteration/* one) to
  real /examples/* routes.
- operator-overloading: __add__/__eq__ return NotImplemented for
  foreign types as the note instructs, demonstrated with Vector == 5.
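
Several of these corrections are easy to confirm directly in CPython. The snippet below is a standalone check, not the text of the example pages; the `Vector` class is a reconstruction written for illustration.

```python
# dicts: reassigning a value during iteration is legal...
counts = {"a": 1, "b": 2}
for key in counts:
    counts[key] += 10

# ...but adding a key during iteration raises RuntimeError.
resize_error = None
try:
    for key in counts:
        counts[key + "_copy"] = 0
except RuntimeError as exc:
    resize_error = str(exc)

# string-formatting: 05.1f zero-pads to a total width of five characters.
padded = format(3.14159, "05.1f")  # "003.1"


class Vector:
    """Illustrative stand-in for the example pages' Vector."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented  # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)

    def __lt__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) < (other.x, other.y)


# __lt__ alone is enough for sorted(); no other ordering method is defined.
ordered = sorted([Vector(2, 0), Vector(1, 5)])

# Both operands decline the comparison, so == falls back to identity: False.
foreign_equal = Vector(1, 2) == 5
```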

Demonstration gaps:
- async-await shows a plain sync def (the contrast the registry pair
  demands; caught by the hardened confusable-pairs gate).
- positional-only-parameters: clamp now visibly caps 12 to 10, and a
  new cell shows scale(value=4) raising TypeError; keyword-only gets
  the same enforcement demo.
- exceptions: broken/fixed parsers now diverge visibly on a buggy call.
- operators/numbers/decorators/casts-and-any: programs now contain the
  cell code they were missing (short-circuit demo, ratio use, __doc__,
  isinstance narrowing).
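
The parameter enforcement demos described above can be reproduced with a few lines. The function names `clamp` and `scale` come from the commit message, but their signatures here are assumptions, and `area` and `raises_type_error` are hypothetical helpers added for the keyword-only case.

```python
def clamp(value, /, low=0, high=10):
    """Cap value to the range [low, high]; value is positional-only."""
    return max(low, min(value, high))


def scale(value, /, factor=2):
    """Multiply value by factor; value cannot be passed by keyword."""
    return value * factor


def area(*, width, height):
    """Keyword-only counterpart: both arguments must be named."""
    return width * height


def raises_type_error(call) -> bool:
    """Run a zero-argument callable and report whether it raised TypeError."""
    try:
        call()
    except TypeError:
        return True
    return False


capped = clamp(12)  # 10
keyword_rejected = raises_type_error(lambda: scale(value=4))  # True
positional_rejected = raises_type_error(lambda: area(2, 3))  # True
```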

Honesty and structure:
- subprocesses/threads/networking unsupported-fragments say plainly
  that Run fails in the sandbox and outputs come from real CPython
  execution at build time.
- hello-world/truthiness/unpacking cells get cell-specific prose
  instead of copied intro paragraphs; sets drops a duplicate bullet.
- match-statements notes cover mapping extra-keys; bound-and-unbound
  notes name the Py2 history; generics/paramspec notes show PEP 695
  inline syntax; bytes-and-bytearray moves to the Text section;
  comprehensions prints the set through sorted().
- async-await figure anchor follows its cell to cell-1.

Golden fixture redesigned as an explicit structural snapshot:
refresh_golden_fixture.py regenerates it and prints a reviewable
structural summary (29 changed examples this refresh); the parity gate
now compares full cell structure including kinds through the loader
with no legacy walkthrough dependence.

Worker runtime and front-end hardening:
- worker_asgi_bridge: fix the onclose handler being assigned to onopen;
  encode/decode ASGI headers as latin-1 per spec; run one lifespan per
  app instead of startup/shutdown around every request.
- main.py: constant-time smoke-bypass comparison; unknown Turnstile
  modes fail closed to requiring a challenge; POST bodies over 100 kB
  rejected with a friendly 413 page; security headers (nosniff,
  referrer-policy, frame protections) on all Worker HTML; error
  responses on cacheable paths marked no-store.
- app.py: single-pass template substitution so user code containing
  __TOKEN__ text cannot corrupt the page; escape < in embedded JSON so
  </script> in example code cannot break out of the inline script;
  honest build_dynamic_worker_code docstring.
- example.html textarea gains an accessible label; site.css adds
  --accent-text (#C83800, 4.6:1) for small accent-colored text;
  screenshot script picks the Chrome path per platform.
- New behavioral test suite for clearance signing, challenge modes,
  cookie attributes, and cache keys (15 tests); execution test now
  asserts real expected output; quality-check tests cover the new gates
  plus negative cases proving each gate can fail.
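
The two app.py fixes above follow a common pattern, sketched here under assumptions: `render` and `json_for_script` are hypothetical names, and the `__TOKEN__` pattern is assumed to be uppercase letters between double underscores. The actual template code may be organized differently.

```python
import json
import re

TOKEN = re.compile(r"__([A-Z]+)__")


def render(template: str, values: dict) -> str:
    """Substitute every __TOKEN__ in one pass over the template.

    Replacement text is never rescanned, so user code that happens to
    contain __TITLE__ is inserted literally instead of being expanded.
    """

    def replace(match):
        # Unknown tokens are left as written.
        return values.get(match.group(1), match.group(0))

    return TOKEN.sub(replace, template)


def json_for_script(data) -> str:
    """Serialize data for an inline <script>, escaping < so that a
    </script> inside the data cannot terminate the script element."""
    return json.dumps(data).replace("<", "\\u003c")
```

Chained `str.replace` calls would rescan earlier substitutions, which is the corruption the single-pass approach avoids. The `\u003c` escape is valid in both JSON and JavaScript string literals, so the payload round-trips unchanged.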

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE
- example-source-format-spec: implemented status header, checklists
  ticked as the historical record, :::unsupported and the
  expected_output/standalone_cells frontmatter documented, golden
  fixture policy rewritten for the structural-snapshot design, parity
  behavior matches the rewritten gate.
- example-figure-rubric: banner ceiling corrected to the rendered-width
  formula against 640px, grammar contracts relabelled to match the test
  suite (5/5b), new 5c caption contract and Contract 11 XML
  well-formedness documented.
- visual-explainer-spec and journey-visualisation-rubric: journey
  figures documented as the centered block between heading and list
  that production ships, not the 2-column layout that never did.
- example-quality-rubric: gate wording names the enforced 9.0 target /
  8.5 hard-minimum machinery; lessons-learned notes the fixture's
  post-migration role.
- turnstile spec: unknown challenge modes documented as fail-closed.
- example-graph doc: audit script moved from future work to its
  shipped, gated reality.
- README: features cover journeys, figures, Turnstile, and the quality
  gates; architecture lists the marginalia and textfmt modules; example
  workflow includes quality-checks and the fixture refresh.
- CONTRIBUTING: full 13-script gate table, an end-to-end new-example
  workflow, and a secrets/deploy-configuration section.
- CHANGELOG: Unreleased entry covering the toolchain, gates, runtime
  hardening, and content corrections.

https://claude.ai/code/session_018EpcfENTg7herPR712vZeE
adewale changed the title from "Audit fixes: content gates, specs, examples, and security headers" to "Audit remediation: quality gates with teeth, pinned toolchain, content and runtime fixes" on Jun 12, 2026

2 participants