feat(team-memory): close Phase 1-3 - decisions, deploy scaffolding, consumer setup #6
Open
matze4u wants to merge 11 commits into
Adds the team-wide agent-memory deployment roadmap (5 phases) plus three
Status: Proposed ADRs that scope the Phase-1 decisions:
- ADR-0004: hosting (Hetzner CX22 / AWS RDS / Fly.io / existing Galawork)
- ADR-0005: auth model (Tailscale / Cloudflare Tunnel+mTLS / public+token)
- ADR-0006: scope default + promotion authority
No infra change, no package patch, no decisions made yet; the ADRs lay out
the option matrices so reviewers can weigh in before the Phase 2 spike.
ADR-0004: Hetzner CX22 + Storage Box (~€8.82/mo, self-managed pg_dump)
ADR-0005: Tailscale + existing MCP_AUTH_TOKEN bearer (defense in depth)
ADR-0006: team-brain default + any-dev promotion (trust pipeline = gate)
Also adds the policy floor section to docs/secret-safety.md covering what
is never allowed in shared memory beyond the technical pattern catalog
(PII, production data, personal opinions).
Closes Phase 1 Steps 1–5 of agents/roadmaps/team-memory-deployment.md.
…plate
deploy/team-memory/ holds the maintainer-side artefacts for the shared brain:
- docker-compose.yml: pgvector/pgvector:pg17 + agent-memory in SSE mode,
SSE port bound to TAILNET_IP only; postgres internal-only
- .env.example: required POSTGRES_PASSWORD, MEMORY_MCP_AUTH_TOKEN, TAILNET_IP
- README.md: end-to-end runbook (provision -> Tailscale -> deploy ->
verify -> backups -> restore drill -> consumer onboarding pointer)
agents/analysis/team-memory-spike-notes.md is the Phase-2 spike log
template (acceptance checks, daily entries, cost + latency reality
checks, restore-drill preview, sign-off decision).
No package code changed; everything composes from existing CLI/MCP
primitives per the team-memory roadmap constraints.
Adds the team-memory remote-mode path to all three consumer-setup docs:
- docker-sidecar.md §4: full section (bearer fetch via 1Password,
SSE MCP client config, /sse curl health probe, team-brain
.agent-memory.yml shape, troubleshooting rows)
- generic.md: Pattern C — MCP over SSE, with pointer to the runbook
- node.md §4: SSE alternative for the team-brain case
All three docs now describe the team-brain default (no repository:
filter, per ADR-0006) and the Tailscale + bearer auth model
(ADR-0005). Roadmap Phase 3 Step 1 ticked.
scripts/team-memory-onboard.sh runs four read-only checks:
1. Tailscale CLI installed and tailnet up
2. Brain hostname resolves over the tailnet
3. MCP bearer fetched from 1Password (op://Engineering/team-memory/mcp-bearer)
4. SSE handshake on /sse returns the endpoint header
Never edits shell rc files, never writes secrets to disk. Prints
copy-pasteable export commands on success. Failure exits non-zero with
actionable hints. Defaults overridable via MEMORY_BRAIN_HOST,
MEMORY_BRAIN_PORT, MEMORY_BEARER_OP_REF env vars.
Roadmap Phase 3 Step 2 ticked. Steps 3-4 remain open and are tracked per
consumer repo (out of scope for this repo's PR).
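The four checks can be sketched as small shell functions. This is a minimal illustration, not the contents of scripts/team-memory-onboard.sh. The default hostname `team-memory`, the `http://` scheme, the use of `tailscale ip` as the resolution probe, the `event: endpoint` marker used to detect the handshake, and the printed export lines are all assumptions; only the env-var names, port 7078, and the `op://` reference come from the commit message.

```shell
#!/bin/sh
# Hypothetical sketch of four read-only onboarding checks (not the real script).
set -u

BRAIN_HOST="${MEMORY_BRAIN_HOST:-team-memory}"   # assumed default hostname
BRAIN_PORT="${MEMORY_BRAIN_PORT:-7078}"
BEARER_REF="${MEMORY_BEARER_OP_REF:-op://Engineering/team-memory/mcp-bearer}"

# check <label> <function>: run one probe silently, then report ok or FAIL.
check() {
  if "$2" >/dev/null 2>&1; then
    printf 'ok   %s\n' "$1"
  else
    printf 'FAIL %s\n' "$1" >&2
    return 1
  fi
}

tailnet_up()      { command -v tailscale >/dev/null && tailscale status; }
host_resolves()   { tailscale ip -4 "$BRAIN_HOST"; }   # assumed resolution probe
bearer_readable() { op read "$BEARER_REF"; }

# The subshell body keeps the token out of the caller's variables.
sse_handshake() (
  token="$(op read "$BEARER_REF")" || exit 1
  # An SSE stream stays open, so cap the request and inspect what arrived.
  out="$(curl -sS -N --max-time 5 \
    -H "Authorization: Bearer ${token}" \
    "http://${BRAIN_HOST}:${BRAIN_PORT}/sse" 2>/dev/null || true)"
  case "$out" in
    *"event: endpoint"*) exit 0 ;;   # assumed handshake marker
    *) exit 1 ;;
  esac
)

run_checks() {
  failed=0
  check "Tailscale CLI installed and tailnet up" tailnet_up || failed=1
  check "Brain hostname resolves over the tailnet" host_resolves || failed=1
  check "MCP bearer readable from 1Password" bearer_readable || failed=1
  check "SSE handshake on /sse" sse_handshake || failed=1
  if [ "$failed" -ne 0 ]; then
    echo "hint: fix the FAIL lines above, then re-run" >&2
    return 1
  fi
  # Print exports for copy-paste; nothing is written to disk or rc files.
  printf 'export MEMORY_BRAIN_HOST=%s\n' "$BRAIN_HOST"
  printf 'export MEMORY_BRAIN_PORT=%s\n' "$BRAIN_PORT"
}
```

Calling `run_checks` at the end of the file (or from an interactive shell after sourcing it) runs all four probes and returns non-zero if any of them fails, which matches the exit behaviour described above.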
Validated deploy/team-memory/docker-compose.yml on a workstation before any
Hetzner spend: stack composes cleanly, both containers reach healthy, all
four SSE auth boundaries (200/401/403/404) match docs/mcp-http.md, and the
data-plane round-trip (propose -> promote -> verify) works end-to-end.
Findings logged in agents/analysis/team-memory-dryrun-results.md:
- GHCR :latest is not yet published; runbook §5 will fail until a sha-tag
  lands (papercut, no blocker).
- Synthetic entries without --file/--scenario score below the trust
  threshold by design; the smoke test now seeds realistic entries.
scripts/team-memory-smoketest.sh codifies the four acceptance checks from
agents/analysis/team-memory-spike-notes.md so the operator can run them
deterministically during the spike. Spike-notes header references the
dry-run outcome. Roadmap dashboard regenerated per roadmap-progress-sync.
Three setup pieces that live outside docker-compose.yml and that the spike
operator needs in addition to the Compose stack:
- deploy/team-memory/tailscale-acl.json: pasteable Tailscale ACL
  implementing ADR-0005, with tagOwners, two groups, default-deny ACL for
  tag:memory-host:7078, admin SSH on :22, and regression tests.
- deploy/team-memory/operator-setup.md: Hetzner Cloud Firewall recipe
  (Console + hcloud CLI), Tailscale ACL pointer, and 1Password vault item
  schema with Bitwarden / Vault / Doppler equivalents.
- deploy/team-memory/README.md §1/§2/§3 link to the new artefacts so the
  runbook stays the one entry point during the spike.
…ty.md
Phase 1 commit 485bb9a stripped two trailing spaces from the auto-generated
catalog header lines; CI II1 (Secret-pattern doc drift) flagged it. Re-ran
npm run docs:secrets to put them back. Catalog content unchanged (still
v1.0.0, 27 patterns, 17 providers).
…ity step
Two issues surfaced by the dry-run: ':latest' didn't exist (workflow line 69
reserves it for v* git tags, no release shipped yet) and the package was
private (HTTP 401 on anonymous pulls). Operator decision: keep ':latest'
convention intact, default to ':main' for the spike, set the package public.
- .env.example: MEMORY_IMAGE_TAG=main (was: latest), with comment block
pointing at sha-tags for production reproducibility and v* tags as the
future ':latest' source.
- docker-compose.yml: fallback ${MEMORY_IMAGE_TAG:-main} (was: -latest).
- operator-setup.md §4 (new): one-time GitHub UI step to flip the GHCR
package to public, plus the available-tag matrix and revert path.
- README.md §5: one-time prerequisite note pointing at operator-setup §4.
- team-memory-dryrun-results.md Finding 1: marked RESOLVED with the
investigation table and the chosen path.
- team-memory-spike-notes.md header: dry-run pointer reflects the
resolution, not the original papercut.
No workflow change. Verified: docker compose config now resolves to
ghcr.io/event4u-app/agent-memory:main; npm run check:links clean (218 ✓).
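The `${MEMORY_IMAGE_TAG:-main}` fallback follows the same rule as POSIX shell parameter expansion: the default applies when the variable is unset or empty. A minimal sketch in plain shell, where `resolve_image` is a hypothetical helper and `sha-0000000` is a made-up tag used only for illustration:

```shell
#!/bin/sh
# Mirrors the Compose fallback ${MEMORY_IMAGE_TAG:-main} in plain shell.
resolve_image() {
  printf 'ghcr.io/event4u-app/agent-memory:%s\n' "${MEMORY_IMAGE_TAG:-main}"
}

# Each case runs in a subshell so the variable does not leak between cases.
( unset MEMORY_IMAGE_TAG; resolve_image )           # unset  -> ...:main
( MEMORY_IMAGE_TAG=""; resolve_image )              # empty  -> ...:main
( MEMORY_IMAGE_TAG="sha-0000000"; resolve_image )   # set    -> ...:sha-0000000
```

Because the empty case also falls back, a blank `MEMORY_IMAGE_TAG=` line in `.env` still resolves to `:main` rather than producing an image reference with no tag.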
CLI `memory propose` cannot attach evidence; only ingestion scanners and
`mcp.memory_ingest` from an agent context can. So the previous smoketest hit
two trust-pipeline floors:
1. `--impact normal` requires MIN_EVIDENCE_COUNT=1 → rejected at promotion
   with `evidence_floor`.
2. Even after promotion, zero-evidence entries floor at trust 0.2
   (src/trust/scoring.ts:27), below both default (0.6) and low-trust (0.3)
   retrieval thresholds.
Fix: smoketest now uses --impact low (MIN_EVIDENCE_COUNT=0), verifies the
entry via `memory verify`, and asserts retrieve indexes it as a candidate
(totalCandidates >= 1) instead of expecting a surfaced result (filtering
CLI-only entries is correct behaviour). Adds a `memory health` probe as a
third check. Re-validated against the local Compose stack: 8/8 ✓.
Updates dryrun-results.md Finding 2 with the technically accurate
explanation (CLI cannot attach evidence at all; `--file`/`--scenario`
populate scope, not evidence) and clarifies that the real Phase-2 spike
Step-4 round-trip must drive through mcp.memory_ingest from an agent.
Co-authored-by: Mathias Berg <noreply@example.com>
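The two floors can be modelled in a few lines of shell. This is a toy model built only from the numbers quoted in the commit message, not the logic in src/trust/scoring.ts; the function names are hypothetical, and treating a threshold as inclusive (`>=`) is an assumption.

```shell
#!/bin/sh
# Toy model of the evidence floor and the retrieval thresholds.

# Minimum evidence count per impact level (only the two levels named above).
min_evidence() {
  case "$1" in
    low)    echo 0 ;;
    normal) echo 1 ;;
    *)      return 2 ;;   # other impact levels are not covered by this sketch
  esac
}

# can_promote <impact> <evidence_count>
can_promote() {
  need="$(min_evidence "$1")" || return 2
  [ "$2" -ge "$need" ]
}

# surfaces <trust> <threshold>: succeeds when trust reaches the threshold.
surfaces() {
  awk -v t="$1" -v th="$2" 'BEGIN { exit !(t >= th) }'
}

ZERO_EVIDENCE_TRUST=0.2   # trust floor for entries without evidence
DEFAULT_THRESHOLD=0.6
LOW_TRUST_THRESHOLD=0.3

can_promote normal 0 || echo "normal, 0 evidence: rejected (evidence_floor)"
can_promote low 0    && echo "low, 0 evidence: promotable"
surfaces "$ZERO_EVIDENCE_TRUST" "$DEFAULT_THRESHOLD"   || echo "hidden at default threshold"
surfaces "$ZERO_EVIDENCE_TRUST" "$LOW_TRUST_THRESHOLD" || echo "hidden at low-trust threshold"
```

All four messages print, which is why the smoketest asserts on the candidate count rather than on a surfaced result: a CLI-only entry can be promoted at low impact but still sits below both retrieval thresholds.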
One-page sequence for the Phase 2 spike — short commands per step, links to the long-form runbook (README.md) and the manual setup pieces (operator-setup.md). Covers Day 0 prep work (no spend) + Day 1 provisioning + 'if something goes wrong' table. README §preamble adds a pointer to the cheat-sheet so the entry point to the runbook is the operator's choice (verbose vs scannable).
Summary
Closes Phase 1–3 of agents/roadmaps/team-memory-deployment.md, the roadmap that turns the per-developer local memory into a single shared brain for the Galawork team, without any package code change. All decisions are recorded as ADRs; deployment artefacts and consumer docs land here so Phase 2 (spike) can start on top of a reviewed baseline.

Phase 1: Decisions (ADRs Accepted)
- ADR-0004 (hosting): … `pg_dump` to a Hetzner Storage Box. Total ≈ €8.82/month (well under the €25 ceiling).
- ADR-0005 (auth): … `MEMORY_MCP_AUTH_TOKEN` bearer. Defense in depth; SSO offboarding via Tailscale group.
- ADR-0006 (scope): … `.agent-memory.yml` omits `repository:`. Per-entry `scope.repository` provenance preserved. Any developer may run `memory promote`; the existing trust pipeline is the V1 quality gate.

Phase 2: Deploy scaffolding
- deploy/team-memory/docker-compose.yml: pgvector/pgvector:pg17 + agent-memory in SSE mode, port bound to `TAILNET_IP` only (never `0.0.0.0`).
- deploy/team-memory/.env.example: required `POSTGRES_PASSWORD`, `MEMORY_MCP_AUTH_TOKEN`, `TAILNET_IP`.
- deploy/team-memory/README.md: end-to-end runbook (provision → Tailscale → deploy → verify → backups → restore drill → consumer onboarding pointer).
- agents/analysis/team-memory-spike-notes.md: spike log template (acceptance checks, daily entries, cost + latency reality checks, restore-drill preview, sign-off decision).

Phase 3: Consumer-side documentation + onboarding
- docs/consumer-setup-docker-sidecar.md §4: full team-memory remote-mode reference (1Password-backed bearer fetch, SSE MCP client config, `/sse` curl health probe, team-brain `.agent-memory.yml`, troubleshooting rows).
- docs/consumer-setup-generic.md: Pattern C, MCP over SSE (shared brain).
- docs/consumer-setup-node.md §4: SSE alternative for the team-brain case.
- scripts/team-memory-onboard.sh: read-only developer helper that checks Tailscale, brain DNS, 1Password bearer fetch, and the `/sse` handshake; prints copy-pasteable shell exports.

Out of scope (tracked, not in this PR)
- … `.agent-memory.yml` in its own PR) and CI `memory doctor` integration.

Constraint compliance
- No `src/` changes: every artefact composes from existing CLI/MCP primitives (`memory mcp --transport sse`, repository-filter omission, promotion gate).
- `npm run check:links` (218 successful, 0 errors).
- docs/secret-safety.md extended with the policy floor for shared deployments (PII, customer data, production logs, opinions: never permitted regardless of pattern catalog).

Verification
The onboarding script's failure paths, exercised on a non-tailnet host, produce the expected red checks and a non-zero exit.

Reviewer focus
- deploy/team-memory/docker-compose.yml: is the `TAILNET_IP:7078:7078` port-binding enforcement sufficient, or do we need a host-level firewall layer?
- docs/consumer-setup-docker-sidecar.md §4: does the bearer-fetch flow match how the team actually distributes secrets?

Co-authored by Augment Code