chore: easy-issue sweep 2026-05-11 #502

Merged
mvillmow merged 18 commits into main from chore/easy-sweep-2026-05-11
May 12, 2026

Conversation

@mvillmow
Contributor

Bundled easy-issue sweep — see https://github.com/HomericIntelligence/Odysseus coordination thread.

Implemented

ALREADY-DONE (verified, will be closed manually after merge)

The following issues were verified ALREADY-DONE on cd73468 and could not be auto-closed (token lacks issues:write); they will be closed manually after this PR merges:

  • #149: .github/workflows/publish-exporter-image.yml already publishes to GHCR.
  • #146: .claude/settings.json already includes deny-list guardrails (rm -rf, force-push, etc.).
  • #170: .github/workflows/ci.yml already validates every dashboards/*.json file via the wildcard loop, including jetstream-events.json.
  • #175: Alertmanager rows added to CLAUDE.md stack table and Common Commands.
  • #176: pixi.toml already declares python and the test/test-unit tasks.
  • #208: docker-compose.yml already pins every image to a version tag (no :latest).
  • #238: v0.1.0 and v0.2.0 tags already exist in this repo.
  • #247: prom/alertmanager:v0.32.1 is already pinned in docker-compose.yml.
  • #374: softprops/action-gh-release is no longer used in .github/workflows/release.yml (replaced by built-in gh release create).
  • #375: CHANGELOG.md already has a [0.2.0] - 2026-05-06 section.
  • #409: justfile header comment now documents GF_ADMIN_PASSWORD requirement; CLAUDE.md env-vars table also covers it.

Skipped / BLOCKED

  • #240 — split Atlas M1–M3 CHANGELOG entry: needs release-history knowledge to do safely; out of sweep scope.
  • #235 — audit CLAUDE.md against config files: open-ended audit, BLOCKED.
  • #232 — standardize dashboard description format across README/CLAUDE.md: README has pre-existing duplicate sections that need a separate cleanup PR.
  • #291, #295, #388, #313, #395 — pin actions/Docker base images to SHAs: too many separate refs needing trustworthy SHA lookups for a sweep PR; BLOCKED.
  • #394 — make GHCR image public: requires GitHub org settings change, not a code change; BLOCKED.
  • #339 — purge old htpasswd hash from git history: explicit git filter-repo / force-push; safety-policy BLOCKED.
  • #307 — update pixi-pip-audit-severity-filter skill knowledge base: out-of-repo content; BLOCKED.
  • #299 — pip-audit task UX (no pip packages): doc-only and ambiguous; left for a focused PR.
  • #363 — set due_on on v0.2.0 milestone: requires team agreement on ship date; not a code change.
  • #366 — close duplicate audit findings: meta/governance, requires per-issue triage.
  • #426 — rename nats_jetstream_* etc. with unit suffixes: would break any external consumers relying on current names; BLOCKED on a deprecation plan.
  • #434 — propagate security-contact fix to sibling repos: cross-repo, out of sweep scope.

mvillmow added 18 commits May 11, 2026 19:40
…exporter_fetch_errors

The metric is a per-scrape gauge (resets each collection cycle), not a
monotonic counter. The _total suffix was misleading tooling that
auto-detects type from the suffix. Renames the metric, adds an assertion
that the old name no longer appears, and refreshes the CLAUDE.md
metric reference (also fixes the stale nats_*_total entries documented
under cumulative counters — those are emitted as gauges from /varz).

Closes #430
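
The naming rule behind this rename can be checked mechanically. The following is a minimal sketch, assuming the exporter output is available as Prometheus text exposition format. The helper name `gauges_with_total_suffix`, the `example_*` metric names, and the sample text are hypothetical and not taken from exporter.py.

```python
def gauges_with_total_suffix(exposition: str) -> list[str]:
    """Return names of metrics declared as gauges whose name ends in _total."""
    offenders = []
    for line in exposition.splitlines():
        parts = line.split()
        # TYPE lines look like: "# TYPE <metric_name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            name, metric_type = parts[2], parts[3]
            if metric_type == "gauge" and name.endswith("_total"):
                offenders.append(name)
    return offenders


# Hypothetical sample output, for illustration only.
SAMPLE = """\
# TYPE example_fetch_errors_total gauge
example_fetch_errors_total 2
# TYPE example_fetch_errors gauge
example_fetch_errors 2
# TYPE example_requests_total counter
example_requests_total 10
"""

print(gauges_with_total_suffix(SAMPLE))
```

A check of this shape could sit next to the assertion that the old name no longer appears, although the actual test in this PR may be written differently.
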
Adds a header comment block explaining what `set dotenv-load` does, which
recipes depend on the sourced env, and which vars must be set in .env
(GF_ADMIN_PASSWORD).

Closes #265
Closes #319

Single coordinated CLAUDE.md update covering several follow-ups:

- Add Alertmanager to the Stack Components table; add
  reload-alertmanager and test-alertmanager to Common Commands.
- Add a 'Network topology' section explaining the two-network
  (argus + loki-internal) design and warning against re-attaching loki
  to argus.
- Document CONTAINER_CMD, PROMTAIL_HOST_LABEL in the env-var table.
- Note that the NATS_URL gateway IP (172.24.0.1) intentionally differs
  from the Agamemnon/Nestor gateway (172.20.0.1).
- Point at AGENTS.md from the Common Commands section so new
  contributors discover the multi-agent coordination protocol.

Closes #175
Closes #216
Closes #223
Closes #252
Closes #258
Closes #336
Closes #356
Closes #382

Builds a single reference table for every metric the exporter emits
(name, labels, description). Cross-linked from CLAUDE.md so future
reviewers see exactly what the exporter exposes without grepping
exporter.py.

Closes #423
Closes #427
…cation

Surfaces the easily-missed operator preconditions and runtime gotchas
that several follow-up issues asked to be documented:

- Onboarding step: copy .env.example to .env (#181, #196).
- /tmp/hermes.log must exist on the host before `just start` (#192).
- htpasswd is auto-generated by `just start`; rotation steps documented (#228, #342).
- All host ports are loopback-only; SSH/Tailscale tunneling is the
  supported remote-access pattern (#188, #199, #245).
- `just test-scrape` runs inside the container; debug-prometheus and
  debug-loki are the entry points for ad-hoc inspection (#199).
- Backup/restore expectations on cold hosts (#360).
- jq unavailability on win-64 (#405).
- NATS_URL gateway IP vs. Prometheus localhost target (#386).

Closes #181
Closes #188
Closes #192
Closes #196
Closes #199
Closes #228
Closes #245
Closes #342
Closes #360
Closes #386
Closes #405
… remaining jobs

Brings the remaining unhardened jobs up to the same baseline as
unit-tests (which got timeout-minutes + permissions in #113):

- integration-tests, security-dependency-scan, security-secrets-scan,
  config-validate, schema-validation, deps-version-sync, atlas-dashboard
  all now declare an explicit timeout-minutes and 'permissions:
  contents: read' (read-only is sufficient — none of these jobs need
  write access to the repo, packages, or actions).

Closes #275
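
The baseline described above (every job declares `timeout-minutes` and a `permissions` block) could be verified with a small script. This is a sketch only, assuming the workflow YAML has already been parsed into a dict by some YAML loader. The function name `unhardened_jobs` and the sample data are hypothetical, and the sketch deliberately ignores any workflow-level `permissions` key.

```python
def unhardened_jobs(workflow: dict) -> list[str]:
    """Return job names missing timeout-minutes or a job-level permissions block."""
    missing = []
    for name, job in workflow.get("jobs", {}).items():
        if "timeout-minutes" not in job or "permissions" not in job:
            missing.append(name)
    return missing


# Hypothetical parsed workflow; the values are illustrative, not from ci.yml.
WORKFLOW = {
    "jobs": {
        "unit-tests": {
            "timeout-minutes": 10,
            "permissions": {"contents": "read"},
        },
        "config-validate": {"runs-on": "ubuntu-latest"},
    }
}

print(unhardened_jobs(WORKFLOW))
```
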
…umps

pytest-cov 7.x and earlier ship coverage.py 6.x/7.x respectively, which
have changed default branch-coverage semantics across major versions.
Capping below 8.0 keeps `pixi update` from silently swapping in a
breaking version. Lower bound stays at 5.0.

Closes #280
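
The effect of the range (lower bound 5.0, capped below 8.0) can be illustrated with a short sketch. It assumes plain numeric version strings. The helper `within_cap` is hypothetical, and real tooling would more likely rely on the resolver or the `packaging` library than on hand-rolled parsing.

```python
def within_cap(version: str, lower=(5, 0), upper=(8, 0)) -> bool:
    """True if the version's (major, minor) satisfies lower <= v < upper."""
    parts = version.split(".")
    # Pad to two components so "8" compares as (8, 0), not (8,).
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return lower <= (major, minor) < upper


for candidate in ("4.1.0", "5.0.0", "7.0.0", "8.0.0"):
    print(candidate, within_cap(candidate))
```
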
…omment

Python 3.13 reached GA in October 2024, so the (3, 12) cap is now
unnecessarily conservative. Documents the upgrade process in a
comment so future maintainers know advancing the ceiling is an
intentional review gate, not a magic number.

Closes #312

tomllib was added to the stdlib in Python 3.11. The script previously
hard-imported it, which would crash with ModuleNotFoundError under any
pre-commit environment provisioned with an older interpreter. Falls back
to the upstream tomli package so the script can run under Python 3.10
when tomli is on the path.

Closes #402
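
The fallback described above usually takes the following shape. This is a sketch of the common pattern rather than the exact code in the script, and `load_toml` is a hypothetical helper added for illustration.

```python
try:
    import tomllib  # standard library from Python 3.11 onward
except ModuleNotFoundError:
    # Assumption: the tomli backport is installed on older interpreters.
    import tomli as tomllib


def load_toml(text: str) -> dict:
    """Parse a TOML document from a string using whichever module was imported."""
    return tomllib.loads(text)


print(load_toml("answer = 42"))
```

Aliasing `tomli` as `tomllib` keeps the rest of the script unchanged, since both modules expose the same `load` and `loads` functions.
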
…rics sections

Two of the optional env-var rows had cell content too long to align in
the existing pipe table, tripping MD060/table-column-style. Move them
out of the table into a follow-up bullet list. Wrap the long
nats_*_bytes line under 120 cols.

Cleans up so the new markdownlint pre-commit hook can land green.

Mirrors the yamllint hook pattern: enforces .markdownlint.yaml (already
in the repo) on every commit so Markdown quality follows the same
guard-rail as YAML and Python. Pinned to v0.13.0.

Closes #379
…theus.yml comment

- Ruff's auto-fix flips `not X is None` to `X is not None` (SIM103);
  the existing noqa was vestigial.
- yamllint flagged the new Promtail comment block in prometheus.yml as
  comments-indentation; dedent to column 0 since it's a top-level note
  about the scrape_configs list as a whole, not a comment on a list item.
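
The rewrite in the first bullet is behavior-preserving because `is` binds more tightly than `not`, so `not X is None` parses as `not (X is None)`. A quick sketch to confirm the equivalence (the function names are hypothetical):

```python
def is_set_old(x):
    # Parsed as: not (x is None)
    return not x is None


def is_set_new(x):
    return x is not None


for value in (None, 0, "", [], 1):
    print(repr(value), is_set_old(value), is_set_new(value))
```
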