
feat: add edgenode-harvester container variant #137

Open
evangineer wants to merge 18 commits into main from feat/edgenode-harvester

Conversation

@evangineer
Contributor

evangineer commented Mar 19, 2026

Summary

  • Introduces Dockerfile.harvester — a new edge node container variant that bundles log-harvester (unytco/log-harvester) instead of log-sender, for Unyt invoice aggregation
  • unyt.happ is baked in from the latest unytco/unyt-sandbox release (pinnable via UNYT_HAPP_VERSION build arg)
  • Self-contained s6-overlay-harvester/ service tree manages conductor, log-harvester, logrotate-cron, and setup
  • Dockerfile.log-collector added — our Dockerfile for the unytco/log-collector Cloudflare Worker, used in CI and local integration testing
  • CI job build-and-push-harvester-image publishes ghcr.io/holo-host/edgenode-harvester on release; requires HARVESTER_REPO_TOKEN secret (PAT with repo read access to unytco/log-harvester and unytco/log-collector)
  • docker-compose.yml updated with edgenode-harvester service and restart: unless-stopped on log-collector
  • New LOG_HARVESTER_QUICKSTART.md and updated README.md, docker/README.md, docker/CHANGELOG.md, docker/TESTING.md

Architecture

log-collector (Cloudflare Worker)
  → edgenode-harvester (this container)
      ├── Holochain conductor (admin: 4444, app: 4445)
      ├── unyt.happ (installed at first startup)
      └── log-harvester (Node.js, daily loop)
            → Unyt Agreements (parked spend invoices)

Notable fixes

  • Config path moved from /etc/log-harvester to /data/log-harvester so it survives restarts on the volume-mounted path; drops the now-redundant holo-config-harvester volume
  • s6 shebangs fixed to #!/command/with-contenv so HC_* env vars are visible inside services
  • log-harvester source is pre-cloned via actions/checkout in CI (and git clone locally by run_harvester_tests.sh) then COPY'd into the image — no build-time secrets needed in the Dockerfile
  • GHA layer caching added to PR checks for edgenode, edgenode-harvester, and log-collector builds
  • Test runner isolation: run_tests_multi.sh runs each .bats file individually with a log-collector health-check wait between files, so a wrangler dev crash in one file doesn't cascade to subsequent files
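
The per-file isolation described in the last bullet can be sketched roughly as below. This is an illustration, not the code in run_tests_multi.sh: the function names, the `HEALTH_ATTEMPTS` and `HEALTH_DELAY` variables, and the health URL are all assumptions made for the example.

```sh
# Sketch only: names, defaults, and the health URL are hypothetical.

# Hypothetical health probe; the real endpoint and port may differ.
collector_healthy() {
  curl -fsS "${COLLECTOR_URL:-http://localhost:8787}/"
}

# Retry a command up to <attempts> times, sleeping <delay> seconds after
# each failure. Returns 0 on the first success, 1 if every attempt fails.
wait_for_healthy() {
  local attempts="$1" delay="$2"
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Run every test file in its own runner invocation, waiting for the
# collector to be healthy before each one. A failure in one file is
# recorded but does not stop the remaining files from running.
run_isolated() {
  local runner="$1" failed=0 file
  shift
  for file in "$@"; do
    if ! wait_for_healthy "${HEALTH_ATTEMPTS:-30}" "${HEALTH_DELAY:-2}" collector_healthy; then
      echo "log-collector unhealthy, skipping $file" >&2
      failed=1
      continue
    fi
    "$runner" "$file" || failed=1
  done
  return "$failed"
}
```

A call such as `run_isolated bats first.bats second.bats` would then give each file a fresh health check, which pairs with the restart policy on log-collector: a crashed wrangler dev gets time to come back before the next file starts.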

Test suite

Four BATS files run via ./run_harvester_tests.sh (builds image, starts all services, waits for readiness):

| File | What it tests |
| --- | --- |
| harvester_startup.bats | Conductor ready, unyt.happ installed, config initialized, service started |
| harvester_process.bats | Holochain and Node.js run as nonroot |
| harvester_integration.bats | Connectivity: harvester reaches log-collector, /logs returns success |
| harvester_e2e.bats | Full pipeline: log-sender submits signed metrics → D1 → harvester fetches and invoices (--today --dry-run), asserts "Successfully invoiced logs." |

Test plan

  • Built locally — clean build
  • 11/11 harvester tests pass (run_harvester_tests.sh)
  • All CI checks pass (test-docker-image and test-harvester-image)
  • Confirm HARVESTER_REPO_TOKEN secret is set in repo Actions secrets before merging (required for CI build)
  • Run docker compose up edgenode-harvester with valid COLLECTOR_URL, ADMIN_SECRET, LAIR_PASSWORD and verify harvester reaches "Starting log-harvester service..." in startup log
  • Verify lastInvoice is preserved across a container restart

🤖 Generated with Claude Code

Introduces a new container variant (Dockerfile.harvester) that bundles
log-harvester (unytco/log-harvester) instead of log-sender. The harvester
reads from log-collector, aggregates usage data, and parks invoices on
Unyt Agreements via an embedded Holochain conductor.

Key changes:
- Dockerfile.harvester: single-stage wolfi-base build, clones and builds
  log-harvester from source via GITHUB_TOKEN build secret, bakes in
  unyt.happ from latest unytco/unyt-sandbox release (pinnable via
  UNYT_HAPP_VERSION build arg), exposes ports 4444 and 4445
- s6-overlay-harvester/: self-contained s6 service tree (conductor,
  log-harvester, logrotate-cron, setup) with no dependency on the base
  s6-overlay directory
- log-harvester s6 service: waits for conductor readiness, installs
  unyt.happ, attaches app websocket on 4445, init/refreshes harvester
  config, runs harvester in loop mode
- CI: build-and-push-harvester-image job publishing
  ghcr.io/holo-host/edgenode-harvester using HARVESTER_REPO_TOKEN secret
- docker-compose.yml: edgenode-harvester service for local testing
- Docs: LOG_HARVESTER_QUICKSTART.md, updated README.md, docker/README.md,
  docker/CHANGELOG.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evangineer self-assigned this Mar 19, 2026
evangineer marked this pull request as draft March 19, 2026 12:34
evangineer and others added 4 commits March 19, 2026 14:11
Adds BATS test files and a test runner for the harvester container variant:

- harvester_startup.bats: verifies conductor start, unyt.happ install,
  app-ws on 4445, config initialization, and log-harvester service start
- harvester_process.bats: verifies holochain and node run as nonroot
- run_harvester_tests.sh: runner analogous to run_tests_multi.sh, builds
  the harvester image with GITHUB_TOKEN secret and waits for full startup
- pr-checks.yml: adds test-harvester-image CI job running on docker/** PRs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds harvester_e2e.bats which submits real signed metrics via log-sender,
verifies they reach D1, then runs the harvester (--today --dry-run) and
asserts "fetched metrics count" > 0 and "Successfully invoiced logs."

Trims harvester_integration.bats to two lightweight connectivity checks.
Extends run_harvester_tests.sh to start edgenode and wait for its
Holochain conductor before running the e2e test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CONFIG_PATH: /etc/log-harvester → /data/log-harvester (volume-mounted,
  survives restarts; drops now-redundant holo-config-harvester volume)
- s6 run scripts: shebang → #!/command/with-contenv so HC_* env vars
  are available inside the service
- log-harvester/run: remove add-app-ws call (handled by harvester init);
  add chown after both init and refresh paths
- Dockerfile.harvester: GITHUB_TOKEN secret now optional — falls back to
  unauthenticated clone when git credentials already grant access
- pr-checks.yml: switch to docker/build-push-action with GHA layer cache
  for edgenode, edgenode-harvester, and log-collector builds
- run_tests_multi.sh: build log-collector separately before compose up
  to preserve layer cache across repeat runs
- harvester_startup.bats: update config path and websocket test to match

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evangineer requested a review from zippy March 19, 2026 19:29
evangineer and others added 11 commits March 19, 2026 19:35
The docker/log-collector directory is gitignored (it's a separate repo).
Add an actions/checkout step to clone it into place before the build.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The log-collector Dockerfile and entrypoint are ours, not upstream's.
Store as Dockerfile.log-collector in the docker/ directory so CI can
find them after checking out unytco/log-collector as the build context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
secret-envs is not a valid build-push-action parameter; the correct
field is secrets which mounts the value directly as a build secret.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
build-push-action secrets: does not reliably mount the github_token
secret into the BuildKit RUN --mount. Revert to an explicit
docker buildx build run: step with --secret id=github_token,env=GITHUB_TOKEN,
which is the known-working approach. GHA layer cache flags are retained.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rets

Docker BuildKit secret mounting is unreliable in GHA. Mirror the
log-collector approach: checkout unytco/log-harvester into
docker/log-harvester-src/ before the build, then COPY it in.

- Dockerfile.harvester: replace RUN --mount=type=secret git clone
  with COPY log-harvester-src
- pr-checks.yml: add actions/checkout step for log-harvester
- run_harvester_tests.sh: auto-clone log-harvester-src if absent
- .gitignore: add docker/log-harvester-src/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
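
The "auto-clone if absent" behaviour mentioned for run_harvester_tests.sh could look something like the sketch below. The function name `ensure_source` is hypothetical, and the sketch assumes the caller's git credentials already grant read access to the private repository.

```sh
# Sketch only: ensure_source is a hypothetical helper, not the script's code.

# Make sure a source checkout exists at <dest>; clone <repo_url> into it
# only when the directory is missing, so an existing checkout is reused.
ensure_source() {
  local dest="$1" repo_url="$2"
  if [ -d "$dest" ]; then
    echo "using existing $dest"
    return 0
  fi
  # Shallow clone: only the latest commit is needed for the image build.
  git clone --depth 1 "$repo_url" "$dest"
}
```

Usage might be `ensure_source docker/log-harvester-src https://github.com/unytco/log-harvester.git`, where the URL is inferred from the repository name and is not quoted from the PR. Because the directory is gitignored, a stale checkout is never refreshed by this sketch; deleting it forces a new clone.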
docker compose up -d was starting all services including edgenode-harvester,
which requires log-harvester-src in the build context. Explicitly name only
the services needed for the edgenode test suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
In CI the wrangler state directory is empty (no persistent local volume),
so drone_registrations and other tables don't exist. Run wrangler d1 execute
--local --file=schema.sql in the entrypoint before starting wrangler dev.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wrangler dev reads vars from wrangler.toml, not Docker env vars.
Pass --var ADMIN_SECRET:${ADMIN_SECRET} so the worker picks up the
secret configured in docker-compose.yml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
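
Taken together, the two entrypoint fixes above (schema initialization and passing the secret) amount to something like the following. This is a sketch under assumptions, not the actual entrypoint: `start_collector` and its `db_name` argument are hypothetical, and the real D1 database name comes from the worker's wrangler.toml.

```sh
# Sketch only: start_collector and db_name are hypothetical.

# Initialize the local D1 database from a schema file, then start the
# worker with ADMIN_SECRET forwarded from the container environment.
start_collector() {
  local db_name="$1" schema="${2:-schema.sql}"
  # The local wrangler state is empty in CI, so create the tables first.
  wrangler d1 execute "$db_name" --local --file="$schema" || return 1
  # wrangler dev does not read Docker env vars, so pass the secret explicitly.
  wrangler dev --var "ADMIN_SECRET:${ADMIN_SECRET:?ADMIN_SECRET must be set}"
}
```

In a real entrypoint the final command would usually be run with `exec` so that wrangler dev receives container signals directly. It is left as a plain call here to keep the function easy to stub.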
run_tests_multi.sh was running all *.bats files including harvester_*,
which fail when edgenode-harvester isn't started. Exclude harvester files
explicitly — they belong to run_harvester_tests.sh.

Also add setup() skip guards to harvester_startup.bats and
harvester_process.bats which were missing them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
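
One way to express the exclusion described above is to filter on the harvester_ filename prefix. The sketch below is illustrative only: `edgenode_test_files` is a hypothetical name, and the real script may select files differently.

```sh
# Sketch only: edgenode_test_files is a hypothetical helper.

# Print the .bats files in <dir> that belong to the edgenode suite,
# one per line, skipping harvester_* files.
edgenode_test_files() {
  local dir="$1" file
  for file in "$dir"/*.bats; do
    [ -e "$file" ] || continue      # glob matched nothing
    case "$(basename "$file")" in
      harvester_*) ;;               # owned by run_harvester_tests.sh
      *) printf '%s\n' "$file" ;;
    esac
  done
}
```

Filtering by prefix keeps the convention self-enforcing: any new file named harvester_*.bats is left out of the edgenode run without further changes to the runner.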
- Add restart: unless-stopped to log-collector service so Docker
  auto-recovers when wrangler dev crashes under concurrent D1 load
- Tighten healthcheck (15s interval, 5 retries, 30s start_period)
- Run each .bats file separately in run_tests_multi.sh with a
  log-collector health-check wait between files, so a wrangler crash
  in integration_data_pipeline.bats doesn't cascade to later files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
evangineer marked this pull request as ready for review March 19, 2026 21:32
evangineer and others added 2 commits March 19, 2026 22:49
- CHANGELOG: add all Unreleased entries for harvester variant work
  (Dockerfile.log-collector, run_harvester_tests.sh, BATS files,
  test runner isolation, restart policy)
- TESTING: note per-file test isolation and harvester_* naming convention
- LOG_HARVESTER_QUICKSTART: remove stale --secret flag from local build
  snippet; replace with git clone of log-harvester-src (required for COPY)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dockerfile was renamed/consolidated; latest-unyt tag is not published.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>