chore: easy-issue sweep 2026-05-11 #502
Merged
…exporter_fetch_errors The metric is a per-scrape gauge (resets each collection cycle), not a monotonic counter. The _total suffix was misleading tooling that auto-detects type from the suffix. Renames the metric, adds an assertion that the old name no longer appears, and refreshes the CLAUDE.md metric reference (also fixes the stale nats_*_total entries documented under cumulative counters — those are emitted as gauges from /varz). Closes #430
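A minimal sketch of the kind of assertion this commit describes: the new name is declared as a gauge and the old `_total` name is gone from the exposition output. The sample text, `metric_types`, and `check_rename` are hypothetical stand-ins for illustration; the exporter's real output and test code are not shown in this PR.

```python
# Hypothetical exposition text; the real exporter output is not shown here.
SAMPLE_EXPOSITION = """\
# HELP homeric_exporter_fetch_errors Fetch errors seen during the last scrape.
# TYPE homeric_exporter_fetch_errors gauge
homeric_exporter_fetch_errors 0
"""

OLD_NAME = "homeric_exporter_fetch_errors_total"
NEW_NAME = "homeric_exporter_fetch_errors"


def metric_types(exposition: str) -> dict[str, str]:
    """Map each metric name to the type declared on its '# TYPE' line."""
    types: dict[str, str] = {}
    for line in exposition.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            types[parts[2]] = parts[3]
    return types


def check_rename(exposition: str) -> None:
    """Fail if the new name is not a gauge or the old name still appears."""
    assert metric_types(exposition).get(NEW_NAME) == "gauge"
    assert OLD_NAME not in exposition


check_rename(SAMPLE_EXPOSITION)
```

Checking the raw text for the old name (rather than only the parsed types) also catches stale sample lines or HELP text that still mention the `_total` suffix.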
Single coordinated CLAUDE.md update covering several follow-ups:
- Add Alertmanager to the Stack Components table; add reload-alertmanager and test-alertmanager to Common Commands.
- Add a 'Network topology' section explaining the two-network (argus + loki-internal) design and warning against re-attaching loki to argus.
- Document CONTAINER_CMD, PROMTAIL_HOST_LABEL in the env-var table.
- Note that the NATS_URL gateway IP (172.24.0.1) intentionally differs from the Agamemnon/Nestor gateway (172.20.0.1).
- Point at AGENTS.md from the Common Commands section so new contributors discover the multi-agent coordination protocol.
Closes #175 Closes #216 Closes #223 Closes #252 Closes #258 Closes #336 Closes #356 Closes #382
…cation Surfaces the easily-missed operator preconditions and runtime gotchas that several follow-up issues asked to be documented:
- Onboarding step: copy .env.example to .env (#181, #196).
- /tmp/hermes.log must exist on the host before `just start` (#192).
- htpasswd is auto-generated by `just start`; rotation steps documented (#228, #342).
- All host ports are loopback-only; SSH/Tailscale tunneling is the supported remote-access pattern (#188, #199, #245).
- `just test-scrape` runs inside the container; debug-prometheus and debug-loki are the entry points for ad-hoc inspection (#199).
- Backup/restore expectations on cold hosts (#360).
- jq unavailability on win-64 (#405).
- NATS_URL gateway IP vs. Prometheus localhost target (#386).
Closes #181 Closes #188 Closes #192 Closes #196 Closes #199 Closes #228 Closes #245 Closes #342 Closes #360 Closes #386 Closes #405
… remaining jobs Brings the remaining unhardened jobs up to the same baseline as unit-tests (which got timeout-minutes + permissions in #113): - integration-tests, security-dependency-scan, security-secrets-scan, config-validate, schema-validation, deps-version-sync, atlas-dashboard all now declare an explicit timeout-minutes and 'permissions: contents: read' (read-only is sufficient — none of these jobs need write access to the repo, packages, or actions). Closes #275
…umps pytest-cov 7.x and earlier ship coverage.py 6.x/7.x respectively, which have changed default branch-coverage semantics across major versions. Capping below 8.0 keeps `pixi update` from silently swapping in a breaking version. Lower bound stays at 5.0. Closes #280
…omment Python 3.13 reached GA in October 2024, so the (3, 12) cap is now unnecessarily conservative. Documents the upgrade process in a comment so future maintainers know advancing the ceiling is an intentional review gate, not a magic number. Closes #312
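As a rough illustration of a ceiling check like the one this commit widens, the sketch below compares Python versions found in `FROM python:X.Y` lines against an approved tuple. The constant name, the regex, and both helpers are assumptions made for this example; the actual contents of `tests/test_dockerfile_constraints.py` are not shown in this PR.

```python
import re

# Hypothetical constant name. Advancing this value is meant to be a reviewed
# decision (new interpreter release vetted), not a routine edit.
APPROVED_PYTHON_CEILING = (3, 13)

# Matches lines such as "FROM python:3.13-slim" or "FROM python:3.12.4 AS build".
_FROM_RE = re.compile(r"^FROM\s+python:(\d+)\.(\d+)", re.IGNORECASE | re.MULTILINE)


def dockerfile_python_versions(dockerfile_text: str) -> list[tuple[int, int]]:
    """Return the (major, minor) of every python base image in the text."""
    return [(int(major), int(minor)) for major, minor in _FROM_RE.findall(dockerfile_text)]


def versions_above_ceiling(dockerfile_text: str) -> list[tuple[int, int]]:
    """Return the base-image versions that exceed the approved ceiling."""
    return [
        version
        for version in dockerfile_python_versions(dockerfile_text)
        if version > APPROVED_PYTHON_CEILING
    ]
```

Comparing `(major, minor)` tuples rather than strings keeps the ordering correct once minor versions reach two digits.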
tomllib was added to the stdlib in Python 3.11. The script previously hard-imported it, which would crash with ModuleNotFoundError under any pre-commit environment provisioned with an older interpreter. Falls back to the upstream tomli package so the script can run under Python 3.10 when tomli is on the path. Closes #402
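The fallback described above usually takes the following shape. This is a sketch rather than the script's actual code: `read_project_version` and the `[project]` table it reads are hypothetical, chosen only to show that the rest of the module can use one name regardless of which parser was imported.

```python
try:
    import tomllib  # stdlib since Python 3.11
except ModuleNotFoundError:
    # Older interpreters: use the tomli backport, which exposes the same
    # load()/loads() API. It must be installed in the environment.
    import tomli as tomllib


def read_project_version(toml_text: str) -> str:
    """Hypothetical helper: return [project].version from TOML text."""
    data = tomllib.loads(toml_text)
    return data["project"]["version"]
```

If neither module is importable, the second import still raises `ModuleNotFoundError`, so the failure stays loud instead of being silently skipped.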
…rics sections Two of the optional env-var rows had cell content too long to align in the existing pipe table, tripping MD060/table-column-style. Move them out of the table into a follow-up bullet list. Wrap the long nats_*_bytes line under 120 cols. Cleans up so the new markdownlint pre-commit hook can land green.
Mirrors the yamllint hook pattern: enforces .markdownlint.yaml (already in the repo) on every commit so Markdown quality follows the same guard-rail as YAML and Python. Pinned to v0.13.0. Closes #379
…theus.yml comment
- Ruff's auto-fix flips `not X is None` to `X is not None` (SIM103); the existing noqa was vestigial.
- yamllint flagged the new Promtail comment block in prometheus.yml as comments-indentation; dedent to column 0 since it's a top-level note about the scrape_configs list as a whole, not a comment on a list item.
This was referenced May 12, 2026
Bundled easy-issue sweep — see https://github.com/HomericIntelligence/Odysseus coordination thread.
Implemented
- `.env.example`→`.env`) added to CLAUDE.md Operator Notes.
- `just debug-prometheus`.
- `/tmp/hermes.log` pre-create requirement.
- `.env.example` already requires `GF_ADMIN_PASSWORD`; CLAUDE.md Operator Notes calls out the silent-fallback failure mode.
- `just debug-prometheus`/`just debug-loki`/`just test-scrape` post-port-removal behaviour.
- `NATS_URL` gateway IP `172.24.0.1` differs from the `172.20.0.1` used by Agamemnon/Nestor.
- `just gen-htpasswd` (now auto-run by `just start`) and rotation steps.
- `.github/PULL_REQUEST_TEMPLATE.md` validation checklist gains a `CHANGELOG.md` step.
- `.env.example` documents `PROMTAIL_HOST_LABEL` override.
- `argus` + `loki-internal` two-network design.
- `configs/prometheus.yml` documents the deliberate Promtail scrape omission.
- `.env.example` `GF_ADMIN_PASSWORD` comment now covers the `just import-dashboards` requirement.
- `justfile` header comment documents `set dotenv-load` and required env vars.
- `.github/workflows/_required.yml` adds `timeout-minutes:` and `permissions: contents: read` to the seven previously-unhardened jobs.
- `pixi.toml` caps `pytest-cov` at `<8` to avoid silent breakage on major bumps.
- `CONTRIBUTING.md` gains a "Dependency Updates (Renovate)" section noting the App-install requirement.
- `tests/test_dockerfile_constraints.py` widens approved Python ceiling to 3.13 with policy comment.
- `justfile` header as Document set dotenv-load behavior in justfile header comment #265.
- `tests/test_configs.py` `ALLOWED_BINDINGS` constants gain a rationale + exception-process comment.
- `CONTAINER_CMD`.
- `.pre-commit-config.yaml` adds `markdownlint-cli2` (verified clean against `.markdownlint.yaml`).
- `AGENTS.md`.
- `NATS_URL` vs `localhost:8222` split.
- `scripts/check-version-consistency.py` falls back to `tomli` when stdlib `tomllib` is unavailable.
- `jq` win-64 limitation.
- `docs/metrics.md` is the new canonical metric catalog.
- `docs/metrics.md`.
- `homeric_exporter_fetch_errors_total` renamed to `homeric_exporter_fetch_errors` (gauge, not counter).

ALREADY-DONE (verified, will be closed manually after merge)
The following issues were verified ALREADY-DONE on `cd73468` and could not be auto-closed (token lacks `issues:write`); they will be closed manually after this PR merges:
- #149: `.github/workflows/publish-exporter-image.yml` already publishes to GHCR.
- #146: `.claude/settings.json` already includes deny-list guardrails (`rm -rf`, force-push, etc.).
- #170: `.github/workflows/ci.yml` already validates every `dashboards/*.json` file via the wildcard loop, including `jetstream-events.json`.
- #175: Alertmanager rows added to CLAUDE.md stack table and Common Commands.
- #176: `pixi.toml` already declares `python` and the `test`/`test-unit` tasks.
- #208: `docker-compose.yml` already pins every image to a version tag (no `:latest`).
- #238: `v0.1.0` and `v0.2.0` tags already exist in this repo.
- #247: `prom/alertmanager:v0.32.1` is already pinned in `docker-compose.yml`.
- #374: `softprops/action-gh-release` is no longer used in `.github/workflows/release.yml` (replaced by built-in `gh release create`).
- #375: `CHANGELOG.md` already has a `[0.2.0] - 2026-05-06` section.
- #409: `justfile` header comment now documents `GF_ADMIN_PASSWORD` requirement; CLAUDE.md env-vars table also covers it.

Skipped / BLOCKED
- #240 (split Atlas M1–M3 CHANGELOG entry): needs release-history knowledge to do safely; out of sweep scope.
- #235 (audit CLAUDE.md against config files): open-ended audit, BLOCKED.
- #232 (standardize dashboard description format across README/CLAUDE.md): README has pre-existing duplicate sections that need a separate cleanup PR.
- #291, #295, #388, #313, #395 (pin actions/Docker base images to SHAs): too many separate refs needing trustworthy SHA lookups for a sweep PR; BLOCKED.
- #394 (make GHCR image public): requires GitHub org settings change, not a code change; BLOCKED.
- #339 (purge old htpasswd hash from git history): explicit `git filter-repo` / force-push; safety-policy BLOCKED.
- #307 (update `pixi-pip-audit-severity-filter` skill knowledge base): out-of-repo content; BLOCKED.
- #299 (pip-audit task UX, no pip packages): doc-only and ambiguous; left for a focused PR.
- #363 (set `due_on` on v0.2.0 milestone): requires team agreement on ship date; not a code change.
- #366 (close duplicate audit findings): meta/governance, requires per-issue triage.
- #426 (rename `nats_jetstream_*` etc. with unit suffixes): would break any external consumers relying on current names; BLOCKED on a deprecation plan.
- #434 (propagate security-contact fix to sibling repos): cross-repo, out of sweep scope.