martinfrancois · Jun 8, 2026
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -46,21 +46,24 @@ Checks most contributors can run:
 
 - [ ] `python3 scripts/validate_skill.py skills/java-optionals`
 - [ ] `python3 scripts/validate_eval_criteria.py evals evals-reference evals-regression`
-- [ ] `python3 -m py_compile scripts/validate_skill.py scripts/validate_eval_criteria.py`
-- [ ] `bash -n scripts/check_publish_dry_run.sh`
+- [ ] `python3 -m py_compile scripts/*.py`
+- [ ] `bash -n scripts/*.sh`
 - [ ] `tessl plugin lint .`
-- [ ] `markdownlint`, if Markdown changed
 - [ ] Manual rendered-doc or example review, if docs or examples changed
 
 Tessl-authenticated checks:
 
 - [ ] `bash scripts/check_publish_dry_run.sh .`
 - [ ] `tessl plugin publish --dry-run --bump patch .`
 - [ ] `tessl skill review --threshold 100 skills/java-optionals/SKILL.md`, if skill text or references changed
-- [ ] Targeted `tessl eval run --variant with-context --variant without-context <scenario-dir>`, if
-      skill behavior or evals changed
-- [ ] Full/main `tessl eval run --variant with-context --variant without-context .`, if benchmark
-      claims changed or targeted with-context results are clean
+- [ ] Targeted main/reference `scripts/run_eval_suite.sh <main|reference> <scenario-name>`, if skill behavior or those evals changed
+- [ ] Targeted regression `scripts/run_eval_suite.sh regression <scenario-name>`, if regression evals changed
+- [ ] Every substantively changed eval scenario was rerun targeted and reached 100% with context, or the PR explains the Tessl blocker and remaining work
+- [ ] Runtime skill/reference changes only: full `scripts/run_eval_suite.sh reference` was run after the final runtime-context change, or the PR links the blocker issue
+- [ ] Runtime skill/reference changes only: full `scripts/run_eval_suite.sh regression` was run after the final runtime-context change, or the PR links the blocker issue
+- [ ] Pure eval suite moves did not change task wording, scoring criteria, or capability text beyond suite-placement metadata/numbering notes
+- [ ] `scripts/classify_eval_result.py <run-json> --scenario-dir <scenario-dir>`, if a scenario was added or moved between suites
+- [ ] Full/main `scripts/run_eval_suite.sh main`, if benchmark claims changed or targeted with-context results are clean
 
 `bash scripts/check_publish_dry_run.sh .`, `tessl skill review`, and hosted Tessl evals require
 Tessl authentication. Hosted evals also require a linked Tessl project. If you can't run one of
@@ -83,30 +86,17 @@ explain why.
 
 ## Review Checklist
 
-- [ ] Docs updated, or N/A
-- [ ] Evals updated, or N/A
-- [ ] Scenario directories include `task.md`, `criteria.json`, and `capability.txt`, or N/A
-- [ ] Scenario invocation style is classified as natural or explicit, or N/A
-- [ ] Natural activation prompts don't explicitly invoke the skill, or N/A
-- [ ] Explicit invocation prompts are labeled as explicit, or N/A
-- [ ] Main eval criteria include compile/artifact checks, or N/A
-- [ ] Main eval criteria include behavior correctness checks, or N/A
-- [ ] Runtime references contain no eval answer keys, scenario inventory, hosted run IDs, or fixed
-      score claims
-- [ ] If any with-context result was below 100%, targeted failing scenarios were fixed and rerun
-      before broader eval suites
-- [ ] Java baseline compatibility has been considered, or N/A
-- [ ] `OptionalInt`, `OptionalLong`, and `OptionalDouble` guidance has been considered, or N/A
-- [ ] Optional-producing stream terminals and collectors are covered, or N/A
-- [ ] Java 26 Javadocs were checked for Optional-family coverage, or N/A
-- [ ] Valid README package-runner instructions were preserved, or N/A
-- [ ] Tessl package commands match the verified plugin package format
-- [ ] Full/reference eval reporting is not hidden or cherry-picked
-- [ ] Tessl checks were run, or unavailability is documented
-- [ ] PR title or squash title uses Conventional Commits
-- [ ] Redaction checked: no Tessl tokens, GitHub tokens, package manager tokens, private repository
-      links, private eval artifacts, private registry/workspace links, local host paths, or
-      proprietary Java source
+- [ ] The change is scoped to the sections, skill files, evals, or workflows described above.
+- [ ] Validation that applies to this change is checked above, or any unavailable check is explained.
+- [ ] If Java Optional guidance changed, Java baseline compatibility, fallback timing, null interop, primitive Optionals, and checked boundaries were considered.
+- [ ] If evals or benchmark claims changed, the eval scenarios remain fair and do not leak answer keys, run IDs, or fixed score claims into runtime references.
+- [ ] If runtime skill text or references changed, hosted checks were widened from targeted affected scenarios to main/reference/regression as described in `docs/agents/workflow.md`, or any Tessl blocker is documented.
+- [ ] If a runtime skill/reference change was released, the final report includes the published main eval run plus post-change reference and regression run IDs, or a blocker issue for missing broad suites.
+- [ ] Main and reference evals were run with both variants when hosted evals were needed; regression evals were run with context only unless reclassification back to reference was being checked.
+- [ ] New or moved eval scenarios follow the classifier recommendation, or the PR explains the maintainer-approved override.
+- [ ] Every retained eval scenario has a 100% with-context result, or any below-100 result is documented as blocking follow-up rather than classified/reportable coverage.
+- [ ] PR title or squash title uses Conventional Commits.
+- [ ] Redaction checked: no tokens, private links, private eval artifacts, local host paths, or proprietary Java source.
 
 ## AI Assistance (if used)
 

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -21,12 +21,12 @@ jobs:
     name: Validate skill and plugin
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
 
       - name: Setup Tessl CLI
         uses: tesslio/setup-tessl@25ec223fc0da33b41b8044ff5ab2b85235f4f91e # v2
         with:
-          version: "0.81.2"
+          version: "0.82.0"
 
       - name: Validate skill metadata
         run: python3 scripts/validate_skill.py skills/java-optionals
@@ -35,34 +35,16 @@ jobs:
         run: python3 scripts/validate_eval_criteria.py evals evals-reference evals-regression
 
       - name: Compile validation scripts
-        run: python3 -m py_compile scripts/validate_skill.py scripts/validate_eval_criteria.py
+        run: python3 -m py_compile scripts/*.py
 
       - name: Check shell scripts
-        run: bash -n scripts/check_publish_dry_run.sh
+        run: bash -n scripts/*.sh
 
       - name: Parse JSON files
-        run: |
-          python3 - <<'PY'
-          import json
-          import pathlib
-          for path in pathlib.Path('.').rglob('*.json'):
-              json.load(open(path, encoding='utf-8'))
-          print('JSON ok')
-          PY
-
-      - name: Parse YAML files
-        run: |
-          python3 - <<'PY'
-          import pathlib
-          try:
-              import yaml
-          except ImportError:
-              print('PyYAML unavailable; skipping YAML parse')
-              raise SystemExit(0)
-          for path in list(pathlib.Path('.').rglob('*.yml')) + list(pathlib.Path('.').rglob('*.yaml')):
-              yaml.safe_load(open(path, encoding='utf-8'))
-          print('YAML ok')
-          PY
+        run: python3 scripts/validate_json_files.py
+
+      - name: Validate YAML metadata
+        run: python3 scripts/validate_openai_agent_yaml.py
 
       - name: Lint Tessl plugin
         run: tessl plugin lint .
@@ -72,25 +54,19 @@ jobs:
     if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
 
       - name: Check Tessl token
         id: tessl-token
         env:
           TESSL_TOKEN: ${{ secrets.TESSL_TOKEN }}
-        run: |
-          if [ -n "${TESSL_TOKEN:-}" ]; then
-            echo "available=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "available=false" >> "$GITHUB_OUTPUT"
-            echo "TESSL_TOKEN isn't configured; skipping Tessl publish dry-runs."
-          fi
+        run: scripts/check_tessl_token_available.sh
 
       - name: Setup Tessl CLI
         if: ${{ steps.tessl-token.outputs.available == 'true' }}
         uses: tesslio/setup-tessl@25ec223fc0da33b41b8044ff5ab2b85235f4f91e # v2
         with:
-          version: "0.81.2"
+          version: "0.82.0"
           token: ${{ secrets.TESSL_TOKEN }}
 
       - name: Check fast publish dry-run
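
The hunk above replaces the inline token check with `scripts/check_tessl_token_available.sh`, whose contents are not part of this diff. A minimal sketch of what that script could look like, assuming it keeps the removed inline behavior unchanged. The function name `check_tessl_token_available` and the temporary-file demo are hypothetical, added only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of scripts/check_tessl_token_available.sh, reconstructed
# from the inline step this diff removes. The real script is not shown here.
set -euo pipefail

# Append available=true|false to the given step-output file
# (in the workflow this would be "$GITHUB_OUTPUT").
check_tessl_token_available() {
  local output_file="$1"
  if [ -n "${TESSL_TOKEN:-}" ]; then
    echo "available=true" >> "$output_file"
  else
    echo "available=false" >> "$output_file"
    echo "TESSL_TOKEN isn't configured; skipping Tessl publish dry-runs."
  fi
}

# Demo against a temporary file instead of a real $GITHUB_OUTPUT.
demo_output="$(mktemp)"
TESSL_TOKEN="" check_tessl_token_available "$demo_output"
TESSL_TOKEN="example" check_tessl_token_available "$demo_output"
cat "$demo_output"
rm -f "$demo_output"
```

In the workflow, the script would presumably end with a single call such as `check_tessl_token_available "$GITHUB_OUTPUT"`, so that later steps can keep reading `steps.tessl-token.outputs.available`.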

diff --git a/.github/workflows/commitlint.yml b/.github/workflows/commitlint.yml
@@ -30,7 +30,7 @@ jobs:
     name: Commitlint
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           fetch-depth: 0
           ref: ${{ github.event.pull_request.head.sha || github.sha }}
@@ -40,17 +40,7 @@ jobs:
           node-version: "24"
 
       - name: Prepare commitlint
-        run: |
-          set -euo pipefail
-          commitlint_home="$RUNNER_TEMP/commitlint"
-          mkdir -p "$commitlint_home"
-          printf '{"private":true}\n' > "$commitlint_home/package.json"
-          cp commitlint.config.cjs "$commitlint_home/commitlint.config.cjs"
-          npm --prefix "$commitlint_home" install --silent \
-            @commitlint/cli@21.0.1 \
-            @commitlint/config-conventional@21.0.1
-          echo "COMMITLINT_BIN=$commitlint_home/node_modules/.bin/commitlint" >> "$GITHUB_ENV"
-          echo "COMMITLINT_CONFIG=$commitlint_home/commitlint.config.cjs" >> "$GITHUB_ENV"
+        run: scripts/install_commitlint.sh
 
       - name: Lint pull request title
         env:
@@ -63,12 +53,4 @@ jobs:
           PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
           BASE_REF: ${{ inputs.base_ref || 'main' }}
           EVENT_NAME: ${{ github.event_name }}
-        run: |
-          if [ "$EVENT_NAME" = "pull_request" ]; then
-            "$COMMITLINT_BIN" --config "$COMMITLINT_CONFIG" \
-              --from "$PR_BASE_SHA" --to "$PR_HEAD_SHA" --verbose
-          else
-            git fetch origin "$BASE_REF"
-            "$COMMITLINT_BIN" --config "$COMMITLINT_CONFIG" \
-              --from "origin/$BASE_REF" --to HEAD --verbose
-          fi
+        run: scripts/lint_pr_commits.sh
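
The removed inline step chose a different commit range for pull requests than for other events. `scripts/lint_pr_commits.sh` is not shown in this diff, so the following is only a sketch of that range selection, assuming the script preserves the old logic. The helper name `commit_range` and the placeholder SHAs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the range selection in scripts/lint_pr_commits.sh,
# based on the inline step this diff removes. The real script is not shown here.
set -euo pipefail

# Print the "--from" and "--to" refs (one per line) for the given event.
commit_range() {
  local event_name="$1" pr_base_sha="$2" pr_head_sha="$3" base_ref="$4"
  if [ "$event_name" = "pull_request" ]; then
    # Pull requests lint exactly the commits between the PR base and head.
    printf '%s\n%s\n' "$pr_base_sha" "$pr_head_sha"
  else
    # Other events compare the fetched base branch against the checked-out commit.
    printf '%s\n%s\n' "origin/$base_ref" "HEAD"
  fi
}

# Demo with placeholder SHAs.
commit_range pull_request abc123 def456 main
commit_range push "" "" main
```

In the removed step, the non-PR branch also ran `git fetch origin "$BASE_REF"` first, and both branches passed the two refs to `"$COMMITLINT_BIN" --config "$COMMITLINT_CONFIG" --from <from> --to <to> --verbose`. A real script would need to do the same.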
diff --git a/.github/workflows/publish-tessl.yml b/.github/workflows/publish-tessl.yml
@@ -4,17 +4,22 @@ on:
   workflow_dispatch:
     inputs:
       ref:
-        description: "Git ref to publish. Defaults to the selected workflow ref."
-        required: false
+        description: "Git ref to publish. Use refs/tags/v<plugin-version> for releases."
+        required: true
         type: string
+      allow_non_tag_ref:
+        description: "Allow publishing a non-v<plugin-version> ref. Use only for maintainer-approved recovery."
+        required: false
+        type: boolean
+        default: false
   release:
     types: [published]
 
 permissions:
   contents: read
 
 concurrency:
-  group: publish-tessl-${{ github.event.release.tag_name || github.run_id }}
+  group: publish-tessl-${{ github.event.release.tag_name || inputs.ref || github.ref }}
   cancel-in-progress: false
 
 jobs:
@@ -25,40 +30,45 @@ jobs:
     steps:
       - name: Checkout manual ref
         if: ${{ github.event_name == 'workflow_dispatch' }}
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
-          ref: ${{ inputs.ref || github.ref }}
+          ref: ${{ inputs.ref }}
 
       - name: Checkout release tag
         if: ${{ github.event_name == 'release' }}
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6
         with:
           ref: ${{ github.event.release.tag_name }}
 
+      - name: Show publish ref
+        run: |
+          echo "event_name=${{ github.event_name }}"
+          echo "requested_ref=${{ github.event.release.tag_name || inputs.ref }}"
+          echo "checked_out_ref=$(git rev-parse --abbrev-ref HEAD)"
+          echo "checked_out_sha=$(git rev-parse HEAD)"
+          git describe --tags --always --dirty
+
+      - name: Validate publish ref
+        env:
+          EVENT_NAME: ${{ github.event_name }}
+          RELEASE_TAG: ${{ github.event.release.tag_name }}
+          MANUAL_REF: ${{ inputs.ref }}
+          ALLOW_NON_TAG_REF: ${{ inputs.allow_non_tag_ref }}
+        run: scripts/validate_publish_ref.sh
+
       - name: Require Tessl token
         env:
           TESSL_TOKEN: ${{ secrets.TESSL_TOKEN }}
-        run: |
-          if [ -z "${TESSL_TOKEN:-}" ]; then
-            echo "TESSL_TOKEN is required to publish the Tessl plugin." >&2
-            exit 1
-          fi
+        run: scripts/require_tessl_token.sh
 
       - name: Setup Tessl CLI
         uses: tesslio/setup-tessl@25ec223fc0da33b41b8044ff5ab2b85235f4f91e # v2
         with:
-          version: "0.81.2"
+          version: "0.82.0"
           token: ${{ secrets.TESSL_TOKEN }}
 
       - name: Validate plugin before publish
-        run: |
-          python3 scripts/validate_skill.py skills/java-optionals
-          python3 scripts/validate_eval_criteria.py evals evals-reference evals-regression
-          python3 -m py_compile scripts/validate_skill.py scripts/validate_eval_criteria.py
-          bash -n scripts/check_publish_dry_run.sh
-          tessl plugin lint .
-          tessl skill review --threshold 100 skills/java-optionals/SKILL.md
-          tessl plugin publish --dry-run .
+        run: scripts/validate_publish_ready.sh
 
       - name: Publish plugin
         run: tessl plugin publish .
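
The new `Validate publish ref` step delegates to `scripts/validate_publish_ref.sh`, which is not included in this diff. The sketch below illustrates one way such a check could work. Every rule in it is an assumption inferred from the input descriptions (`refs/tags/v<plugin-version>` for releases, `allow_non_tag_ref` for maintainer-approved recovery). The function name `is_publishable_ref`, the way the plugin version is passed in, and the assumption that release tags are compared as plain `v<plugin-version>` are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the check scripts/validate_publish_ref.sh might perform.
# The real script is not shown in this diff; every rule below is an assumption
# inferred from the workflow input descriptions.
set -euo pipefail

# Return 0 if the ref may be published, 1 otherwise.
# Args: event name, ref (release tag or manual ref), plugin version, allow flag.
is_publishable_ref() {
  local event_name="$1" ref="$2" plugin_version="$3" allow_non_tag_ref="${4:-false}"
  local expected_tag="v${plugin_version}"
  case "$event_name" in
    release)
      # Assumed: the release tag must match the plugin version.
      [ "$ref" = "$expected_tag" ] ;;
    workflow_dispatch)
      # Assumed: manual runs need the full tag ref unless the override is set.
      [ "$ref" = "refs/tags/$expected_tag" ] || [ "$allow_non_tag_ref" = "true" ] ;;
    *)
      return 1 ;;
  esac
}

# Demo with a placeholder version.
for ref in refs/tags/v1.2.3 refs/heads/main; do
  if is_publishable_ref workflow_dispatch "$ref" 1.2.3; then
    echo "$ref: ok"
  else
    echo "$ref: rejected"
  fi
done
```

In the workflow, the arguments would presumably come from the `EVENT_NAME`, `RELEASE_TAG`, `MANUAL_REF`, and `ALLOW_NON_TAG_REF` environment variables set on the step. Where the plugin version is read from is not visible in this diff.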