feat(team-memory): close Phase 1-3 — decisions, deploy scaffolding, consumer setup by matze4u · Pull Request #6 · event4u-app/agent-memory

matze4u · 2026-04-27T00:41:15Z

Summary

Closes Phase 1–3 of agents/roadmaps/team-memory-deployment.md — the roadmap that turns the per-developer local memory into a single shared brain for the Galawork team, without any package code change. All decisions are recorded as ADRs; deployment artefacts and consumer docs land here so Phase 2 (spike) can start on top of a reviewed baseline.

Phase 1 — Decisions (ADRs `Accepted`)

ADR-0004 — Hosting: Hetzner Cloud CX22 (EU-Falkenstein) + self-managed Postgres+pgvector in Compose; nightly pg_dump to a Hetzner Storage Box. Total ≈ €8.82/month (well under the €25 ceiling).
ADR-0005 — Auth: Tailscale tailnet as the network gate, layered with the existing MEMORY_MCP_AUTH_TOKEN bearer. Defense in depth; SSO offboarding via Tailscale group.
ADR-0006 — Scope policy: Team-brain default — every consumer .agent-memory.yml omits repository:. Per-entry scope.repository provenance preserved. Any developer may run memory promote; the existing trust pipeline is the V1 quality gate.
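
As a rough illustration of the team-brain default in ADR-0006, a consumer repo could check that its .agent-memory.yml carries no `repository:` filter. This is only a sketch: the function name is hypothetical, and it assumes the filter is a top-level YAML key, which the real config schema may not match.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this PR.
# Returns 0 when the config has no top-level `repository:` key
# (team-brain default), 1 when the key is present, 2 when the file is missing.
# Assumption: the filter is a top-level key named `repository`.
is_team_brain_config() {
  local config_file="$1"
  [ -f "$config_file" ] || return 2
  # Top-level keys start in column 0; nested keys such as
  # scope.repository are indented and therefore do not match.
  if grep -Eq '^repository:' "$config_file"; then
    return 1
  fi
  return 0
}
```

Calling `is_team_brain_config .agent-memory.yml` in a consumer repo would then signal whether that repo still narrows retrieval to itself.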

Phase 2 — Deploy scaffolding

deploy/team-memory/docker-compose.yml — pgvector/pgvector:pg17 + agent-memory in SSE mode, port bound to TAILNET_IP only (never 0.0.0.0).
deploy/team-memory/.env.example — required POSTGRES_PASSWORD, MEMORY_MCP_AUTH_TOKEN, TAILNET_IP.
deploy/team-memory/README.md — end-to-end runbook (provision → Tailscale → deploy → verify → backups → restore drill → consumer onboarding pointer).
agents/analysis/team-memory-spike-notes.md — spike log template (acceptance checks, daily entries, cost + latency reality checks, restore-drill preview, sign-off decision).
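
Before `docker compose up`, an operator might want to confirm that the three required variables are actually set. The sketch below is one way to do that; the function names are hypothetical and the check is not part of the artefacts in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for the values expected in deploy/team-memory/.env.
# Prints the names of required variables that are unset or empty.
missing_required_vars() {
  local missing=""
  # The three names below are the ones this PR lists for .env.example.
  [ -n "${POSTGRES_PASSWORD:-}" ]     || missing="$missing POSTGRES_PASSWORD"
  [ -n "${MEMORY_MCP_AUTH_TOKEN:-}" ] || missing="$missing MEMORY_MCP_AUTH_TOKEN"
  [ -n "${TAILNET_IP:-}" ]            || missing="$missing TAILNET_IP"
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}

# Returns non-zero and names the gaps when anything is missing.
preflight() {
  local missing
  missing="$(missing_required_vars)"
  if [ -n "$missing" ]; then
    echo "missing required variables: $missing" >&2
    return 1
  fi
  echo "env ok"
}
```

With the .env file sourced into the current shell, `preflight && docker compose up -d` would stop early instead of starting a half-configured stack.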

Phase 3 — Consumer-side documentation + onboarding

docs/consumer-setup-docker-sidecar.md §4 — full team-memory remote-mode reference (1Password-backed bearer fetch, SSE MCP client config, /sse curl health probe, team-brain .agent-memory.yml, troubleshooting rows).
docs/consumer-setup-generic.md — Pattern C: MCP over SSE (shared brain).
docs/consumer-setup-node.md §4 — SSE alternative for the team-brain case.
scripts/team-memory-onboard.sh — read-only developer helper that checks Tailscale, brain DNS, 1Password bearer fetch, /sse handshake; prints copy-pasteable shell exports.
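
The /sse health probe mentioned above can be approximated as follows. This is a sketch under assumptions: it relies on the server opening the stream with an `event: endpoint` message, as the MCP HTTP+SSE transport describes, and the helper name is hypothetical. The real onboarding script may check something different.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads an SSE stream on stdin and succeeds when an
# `event: endpoint` line appears within the first ten lines.
saw_endpoint_event() {
  # tr drops carriage returns in case the server sends CRLF line endings.
  head -n 10 | tr -d '\r' | grep -Eq '^event: ?endpoint$'
}

# Example wiring (commented out because it needs tailnet access).
# The http:// scheme is an assumption; the variable names come from
# the onboarding script's documented overrides.
# curl -sN --max-time 5 \
#   -H "Authorization: Bearer $MEMORY_MCP_AUTH_TOKEN" \
#   "http://$MEMORY_BRAIN_HOST:$MEMORY_BRAIN_PORT/sse" | saw_endpoint_event
```

The `--max-time` bound matters because an SSE stream never closes on its own; without it the probe would hang whenever the server stays silent.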

Out of scope (tracked, not in this PR)

Phase 3 Steps 3–4 — per-consumer-repo rollout (each repo updates its own .agent-memory.yml in its own PR) and CI memory doctor integration.
Phase 4 — migration of existing local DBs (deferred until after the spike).
Phase 5 — operations (backups, monitoring, capacity, offboarding runbook). Restore drill is the gate to declare Phase 5 done.

Constraint compliance

✅ No src/ changes — every artefact composes from existing CLI/MCP primitives (memory mcp --transport sse, repository-filter omission, promotion gate).
✅ Cost ≤ €25/month: estimated €8.82 (CX22 + BX11 storage box).
✅ All cross-document links validated by npm run check:links (218 successful, 0 errors).
✅ docs/secret-safety.md extended with the policy floor for shared deployments (PII, customer data, production logs, opinions — never permitted regardless of pattern catalog).

Verification

npm run check:links     # ✓ 218 successful, 0 errors
bash -n scripts/team-memory-onboard.sh && scripts/team-memory-onboard.sh --help

When exercised on a non-tailnet host, the onboarding script's failure paths produce the expected red checks and a non-zero exit.

Reviewer focus

ADR-0004/0005/0006 — are the decisions tight enough that Phase 2 spike can proceed?
deploy/team-memory/docker-compose.yml — is the TAILNET_IP:7078:7078 port binding enforcement sufficient, or do we need a host-level firewall layer?
docs/consumer-setup-docker-sidecar.md §4 — does the bearer-fetch flow match how the team actually distributes secrets?
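
To make the port-binding question concrete, here is a small sketch that checks whether a Compose short-syntax port string is pinned to a tailnet address. It assumes Tailscale's IPv4 addresses come from the 100.64.0.0/10 CGNAT range and ignores IPv6; both function names are hypothetical. It validates only the binding string, so it does not replace a host-level firewall.

```shell
#!/usr/bin/env bash
# Succeeds if the argument is an IPv4 address inside 100.64.0.0/10.
is_tailnet_ipv4() {
  local ip="$1" o1 o2 o3 o4 extra
  # Reject empty input and anything that is not digits and dots.
  case "$ip" in ''|*[!0-9.]*) return 1 ;; esac
  IFS=. read -r o1 o2 o3 o4 extra <<EOF
$ip
EOF
  # Exactly four non-empty octets are required.
  [ -n "$o1" ] && [ -n "$o2" ] && [ -n "$o3" ] && [ -n "$o4" ] && [ -z "$extra" ] || return 1
  # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
  [ "$o1" -eq 100 ] && [ "$o2" -ge 64 ] && [ "$o2" -le 127 ] \
    && [ "$o3" -le 255 ] && [ "$o4" -le 255 ]
}

# Succeeds only for "HOST:PUBLISHED:TARGET" where HOST is a tailnet IPv4.
# A two-field "PUBLISHED:TARGET" string publishes on all interfaces, so it fails.
binding_is_tailnet_only() {
  local binding="$1" host
  case "$binding" in
    *:*:*) host="${binding%%:*}" ;;
    *) return 1 ;;
  esac
  is_tailnet_ipv4 "$host"
}
```

A check like `binding_is_tailnet_only "$TAILNET_IP:7078:7078"` would catch a mistyped or empty TAILNET_IP before the stack starts.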

Co-authored by Augment Code

Adds the team-wide agent-memory deployment roadmap (5 phases) plus three Status: Proposed ADRs that scope the Phase-1 decisions:

- ADR-0004: hosting (Hetzner CX22 / AWS RDS / Fly.io / existing Galawork)
- ADR-0005: auth model (Tailscale / Cloudflare Tunnel+mTLS / public+token)
- ADR-0006: scope default + promotion authority

No infra change, no package patch, no decisions made yet. The ADRs lay out the option matrices so reviewers can weigh in before the Phase 2 spike.

ADR-0004: Hetzner CX22 + Storage Box (~€8.82/mo, self-managed pg_dump)
ADR-0005: Tailscale + existing MCP_AUTH_TOKEN bearer (defense in depth)
ADR-0006: team-brain default + any-dev promotion (trust pipeline = gate)

Also adds the policy floor section to docs/secret-safety.md covering what is never allowed in shared memory beyond the technical pattern catalog (PII, production data, personal opinions).

Closes Phase 1 Steps 1–5 of agents/roadmaps/team-memory-deployment.md.

…plate

deploy/team-memory/ holds the maintainer-side artefacts for the shared brain:

- docker-compose.yml: pgvector/pgvector:pg17 + agent-memory in SSE mode, SSE port bound to TAILNET_IP only; postgres internal-only
- .env.example: required POSTGRES_PASSWORD, MEMORY_MCP_AUTH_TOKEN, TAILNET_IP
- README.md: end-to-end runbook (provision -> Tailscale -> deploy -> verify -> backups -> restore drill -> consumer onboarding pointer)

agents/analysis/team-memory-spike-notes.md is the Phase-2 spike log template (acceptance checks, daily entries, cost + latency reality checks, restore-drill preview, sign-off decision).

No package code changed; everything composes from existing CLI/MCP primitives per the team-memory roadmap constraints.

Adds the team-memory remote-mode path to all three consumer-setup docs:

- docker-sidecar.md §4: full section (bearer fetch via 1Password, SSE MCP client config, /sse curl health probe, team-brain .agent-memory.yml shape, troubleshooting rows)
- generic.md: Pattern C (MCP over SSE), with pointer to the runbook
- node.md §4: SSE alternative for the team-brain case

All three docs now describe the team-brain default (no repository: filter, per ADR-0006) and the Tailscale + bearer auth model (ADR-0005).

Roadmap Phase 3 Step 1 ticked.

scripts/team-memory-onboard.sh runs four read-only checks:

1. Tailscale CLI installed and tailnet up
2. Brain hostname resolves over the tailnet
3. MCP bearer fetched from 1Password (op://Engineering/team-memory/mcp-bearer)
4. SSE handshake on /sse returns the endpoint header

Never edits shell rc files, never writes secrets to disk. Prints copy-pasteable export commands on success. Failure exits non-zero with actionable hints. Defaults overridable via MEMORY_BRAIN_HOST, MEMORY_BRAIN_PORT, MEMORY_BEARER_OP_REF env vars.

Roadmap Phase 3 Step 2 ticked. Steps 3-4 remain open and are tracked per consumer repo (out of scope for this repo's PR).
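
The shape of such a read-only check script can be sketched as below. This is not the script from this PR: `run_check` and `all_checks_passed` are hypothetical names, and the commented example commands (`tailscale status`, `op read`) are plausible stand-ins whose exit-code behaviour should be confirmed before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical check runner: runs a command, prints a green or red mark,
# and counts failures. Nothing is written to disk.
failures=0

run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    printf '\033[32m✓\033[0m %s\n' "$label"
  else
    printf '\033[31m✗\033[0m %s\n' "$label"
    failures=$((failures + 1))
  fi
}

# Succeeds only when every check so far has passed.
all_checks_passed() {
  [ "$failures" -eq 0 ]
}

# Example wiring (commented out because it needs the real tools):
# run_check "tailnet up"      tailscale status
# run_check "bearer readable" op read "op://Engineering/team-memory/mcp-bearer"
# all_checks_passed || exit 1
```

Discarding each command's output keeps the bearer off the terminal while still reporting whether the fetch worked.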

Validated deploy/team-memory/docker-compose.yml on a workstation before any Hetzner spend: stack composes cleanly, both containers reach healthy, all four SSE auth boundaries (200/401/403/404) match docs/mcp-http.md, and the data-plane round-trip (propose -> promote -> verify) works end-to-end.

Findings logged in agents/analysis/team-memory-dryrun-results.md:

- GHCR :latest is not yet published; runbook §5 will fail until a sha-tag lands (papercut, no blocker).
- Synthetic entries without --file/--scenario score below the trust threshold by design; the smoke test now seeds realistic entries.

scripts/team-memory-smoketest.sh codifies the four acceptance checks from agents/analysis/team-memory-spike-notes.md so the operator can run them deterministically during the spike. Spike-notes header references the dry-run outcome.

Roadmap dashboard regenerated per roadmap-progress-sync.

Three setup pieces that live outside docker-compose.yml and that the spike operator needs in addition to the Compose stack:

- deploy/team-memory/tailscale-acl.json: pasteable Tailscale ACL implementing ADR-0005, with tagOwners, two groups, default-deny ACL for tag:memory-host:7078, admin SSH on :22, and regression tests.
- deploy/team-memory/operator-setup.md: Hetzner Cloud Firewall recipe (Console + hcloud CLI), Tailscale ACL pointer, and 1Password vault item schema with Bitwarden / Vault / Doppler equivalents.
- deploy/team-memory/README.md §1/§2/§3 link to the new artefacts so the runbook stays the one entry point during the spike.

…ty.md

Phase 1 commit 485bb9a stripped two trailing spaces from the auto-generated catalog header lines; CI II1 (Secret-pattern doc drift) flagged it. Re-ran npm run docs:secrets to put them back. Catalog content unchanged (still v1.0.0, 27 patterns, 17 providers).

…ity step

Two issues surfaced by the dry-run: ':latest' didn't exist (workflow line 69 reserves it for v* git tags, no release shipped yet) and the package was private (HTTP 401 on anonymous pulls). Operator decision: keep ':latest' convention intact, default to ':main' for the spike, set the package public.

- .env.example: MEMORY_IMAGE_TAG=main (was: latest), with comment block pointing at sha-tags for production reproducibility and v* tags as the future ':latest' source.
- docker-compose.yml: fallback ${MEMORY_IMAGE_TAG:-main} (was: -latest).
- operator-setup.md §4 (new): one-time GitHub UI step to flip the GHCR package to public, plus the available-tag matrix and revert path.
- README.md §5: one-time prerequisite note pointing at operator-setup §4.
- team-memory-dryrun-results.md Finding 1: marked RESOLVED with the investigation table and the chosen path.
- team-memory-spike-notes.md header: dry-run pointer reflects the resolution, not the original papercut.

No workflow change. Verified: docker compose config now resolves to ghcr.io/event4u-app/agent-memory:main; npm run check:links clean (218 ✓).
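
The `${MEMORY_IMAGE_TAG:-main}` fallback uses the same default-value syntax as POSIX shell parameter expansion, so its behaviour can be previewed without Compose. The sketch below assumes the image reference shown in the verification note; `resolve_image` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Mirrors the Compose fallback ${MEMORY_IMAGE_TAG:-main} in plain shell:
# the default applies when the variable is unset OR empty.
resolve_image() {
  printf 'ghcr.io/event4u-app/agent-memory:%s\n' "${MEMORY_IMAGE_TAG:-main}"
}

unset MEMORY_IMAGE_TAG
resolve_image                 # falls back to the :main tag

MEMORY_IMAGE_TAG=latest
resolve_image                 # an explicit value overrides the fallback
unset MEMORY_IMAGE_TAG
```

Because the `:-` form also covers the empty string, a blank `MEMORY_IMAGE_TAG=` line in .env still resolves to `:main` rather than an invalid image reference.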

CLI `memory propose` cannot attach evidence; only ingestion scanners and `mcp.memory_ingest` from an agent context can. So the previous smoketest hit two trust-pipeline floors:

1. `--impact normal` requires MIN_EVIDENCE_COUNT=1 → rejected at promotion with `evidence_floor`.
2. Even after promotion, zero-evidence entries floor at trust 0.2 (src/trust/scoring.ts:27), below both default (0.6) and low-trust (0.3) retrieval thresholds.

Fix: smoketest now uses --impact low (MIN_EVIDENCE_COUNT=0), verifies the entry via `memory verify`, and asserts retrieve indexes it as a candidate (totalCandidates >= 1) instead of expecting a surfaced result (filtering CLI-only entries is correct behaviour). Adds a `memory health` probe as a third check. Re-validated against the local Compose stack: 8/8 ✓.

Updates dryrun-results.md Finding 2 with the technically-accurate explanation (CLI cannot attach evidence at all; `--file`/`--scenario` populate scope, not evidence) and clarifies that real Phase-2 spike Step-4 round-trip must drive through mcp.memory_ingest from an agent.

Co-authored-by: Mathias Berg <noreply@example.com>
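
The two floors can be modelled in a few lines to see why the old smoketest could never surface a result. This is an illustrative model built only from the numbers quoted above, not the package's code: the function names are hypothetical, impact levels other than `low` and `normal` are left undefined, and the inclusive threshold comparison is an assumption.

```shell
#!/usr/bin/env bash
# Minimum evidence count per impact level, as quoted in the commit message.
min_evidence_for() {
  case "$1" in
    low)    echo 0 ;;
    normal) echo 1 ;;
    *)      return 1 ;;   # other impact levels are not described here
  esac
}

# Prints "promoted" or "evidence_floor" for an impact level and evidence count.
promotion_outcome() {
  local impact="$1" evidence="$2" floor
  floor="$(min_evidence_for "$impact")" || { echo "unknown_impact"; return 2; }
  if [ "$evidence" -lt "$floor" ]; then
    echo "evidence_floor"
    return 1
  fi
  echo "promoted"
}

# Succeeds when a trust score meets a retrieval threshold.
# Assumption: the comparison is inclusive (>=).
surfaces() {
  awk -v trust="$1" -v threshold="$2" 'BEGIN { exit !(trust >= threshold) }'
}

promotion_outcome normal 0 || true   # rejected at the evidence floor
promotion_outcome low 0              # passes the gate
surfaces 0.2 0.3 || echo "trust 0.2 stays below the low-trust threshold"
```

Under this model a CLI-proposed entry passes the gate only at low impact and still sits below both thresholds, which matches the switch to asserting on totalCandidates.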

One-page sequence for the Phase 2 spike — short commands per step, links to the long-form runbook (README.md) and the manual setup pieces (operator-setup.md). Covers Day 0 prep work (no spend) + Day 1 provisioning + 'if something goes wrong' table. README §preamble adds a pointer to the cheat-sheet so the entry point to the runbook is the operator's choice (verbose vs scannable).

matze4u added 5 commits April 27, 2026 02:40

matze4u changed the title ~~docs(team-memory): add deployment roadmap and Phase-1 ADR stubs~~ feat(team-memory): close Phase 1-3 — decisions, deploy scaffolding, consumer setup Apr 27, 2026

matze4u and others added 6 commits April 27, 2026 04:01

matze4u wants to merge 11 commits into main from docs/team-memory-roadmap
