From e5e3c9d899d100eda378d7995c6997fc42eaf368 Mon Sep 17 00:00:00 2001
From: "Michael K. Alber" <michaelkalber@proton.me>
Date: Sun, 19 Apr 2026 17:26:31 -0700
Subject: [PATCH] docs(context): onboard project into Four Disciplines
 framework

Add intent.md, constraints.md, evals.md. Rewrite AGENTS.md and CLAUDE.md
to follow the Nate B. Jones Four Disciplines template structure: project
overview, tech stack with versions, architecture, key files, domain
concepts, persistent decisions, open loops, available tools, and boot
ritual. Remove content that duplicates global standards files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
 constraints.md | 61 +++++++++++++++++++++++++++++++++
 evals.md       | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++
 intent.md      | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 242 insertions(+)
 create mode 100644 constraints.md
 create mode 100644 evals.md
 create mode 100644 intent.md

diff --git a/constraints.md b/constraints.md
new file mode 100644
index 0000000..72be1ea
--- /dev/null
+++ b/constraints.md
@@ -0,0 +1,61 @@
+# second-brain — Constraints
+
+---
+
+## Must Do
+
+- Load and confirm context (`AGENTS.md`, `intent.md`, `constraints.md`) before every session.
+- Write a failing test before any production code — no exceptions (RED → GREEN → REFACTOR).
+- Run the full backend quality-check pass before any commit: pytest + ruff + mypy + bandit.
+- Sanitize all Tiptap/rich-text editor output before storing to the database.
+- Use `async/await` for every database operation — no synchronous SQLAlchemy calls.
+- Write three verifiable acceptance criteria before delegating any significant subtask.
+- Add a `# VERIFY:` comment rather than guess a function signature, API behavior, or SQLAlchemy idiom.
+- Confirm understanding with the human before any destructive migration (column drop, rename, table drop).
+
+---
+
+## Must NOT Do
+
+- Do not write production code without a failing test. RED phase is local only — never committed.
+- Do not use synchronous SQLAlchemy anywhere in the application code.
+- Do not add a repository layer unless two or more services share identical query logic.
+- Do not exceed 15 lines in a route handler — delegate to `services/`.
+- Do not reuse a Pydantic schema across semantically different use cases.
+- Do not hardcode secrets, tokens, or `DATABASE_URL` values — use environment variables.
+- Do not commit `backend/basb.db`, `.env`, `.env.*`, or `__pycache__/`.
+- Do not re-litigate decisions logged in the Persistent Decisions tables without surfacing the question first.
+- Do not implement AI/LLM features without first resolving the Open Loop on AI augmentation strategy.
+
+---
+
+## Preferences
+
+- Prefer brevity over completeness unless depth is explicitly requested.
+- Prefer editing an existing file over creating a new one.
+- Prefer the `grounded-code-mcp` knowledge base over training data for FastAPI, Vue 3, SQLAlchemy, and Pydantic v2 idioms.
+- Prefer a single focused service-layer unit test over a broad integration test when testing business logic.
+- Prefer `Mapped[Optional[T]]` with an explicit `None` default over omitting the default for nullable columns.
+- Prefer inline `# VERIFY:` annotations over guessing async patterns or SQLAlchemy 2.0 syntax.
+
+---
+
+## Escalate Rather Than Decide
+
+- Any AI/LLM feature — confirm provider, model name, cost, and integration point before implementing.
+- Any DB schema migration that drops or renames a column (hard to reverse).
+- Any CORS policy change that broadens allowed origins beyond `localhost`.
+- Any change that moves business logic out of `services/` (violates the layered architecture decision).
+- Any security-relevant decision not explicitly covered by these constraints.
+
+---
+
+## Code Quality Gates
+
+- **Test coverage (backend business logic):** ≥ 80% — `cd backend && .venv/bin/pytest --cov=app --cov-report=term-missing`
+- **Test coverage (frontend):** ≥ 70% — `cd frontend && npm run test:run`
+- **Test coverage (security-critical paths):** ≥ 95%
+- **Cyclomatic complexity (per method):** < 10
+- **Code duplication:** ≤ 3%
+- **Commit format:** Conventional Commits — `feat:`, `fix:`, `refactor:`, `chore:`, `test:`, `docs:`
+- **Commit scope:** Atomic — one logical change per commit; RED phase never committed
diff --git a/evals.md b/evals.md
new file mode 100644
index 0000000..e6f2f1d
--- /dev/null
+++ b/evals.md
@@ -0,0 +1,91 @@
+# second-brain — Evals
+
+---
+
+## Eval Philosophy
+
+Evals are safety infrastructure, not a finishing step. Write them before the agent starts.
+A passing test suite ≠ done: tests verify that the code is correct, while evals verify that the
+output is actually good relative to BASB intent.
+
+A passing eval is measurable, repeatable, and would survive scrutiny from a developer who
+understands Tiago Forte's BASB methodology and expects the domain semantics to be respected.
+
+---
+
+## Test Cases
+
+### Test Case 1: Note Capture and PARA Move
+
+- **Input / Prompt:** "Implement the endpoint to move a note to a PARA container and advance its CODE stage."
+- **Known-Good Output:** `PATCH /api/v1/notes/{id}/move` sets `container_id` on the note AND advances `code_stage` from `capture` to `organize`. Route handler ≤15 lines. Business logic in `note_service.py`. Integration test covers the happy path, 404 on missing note, and 404 on missing container.
+- **Pass Criteria:**
+  - [ ] `code_stage` transitions from `capture` → `organize` on move (BASB semantic — not just a field update)
+  - [ ] `container_id` is set to the target container
+  - [ ] Route handler is ≤15 lines; logic is in `services/note_service.py`
+  - [ ] Integration test in `tests/integration/test_notes_api.py` covers: success (200), missing note (404), missing container (404)
+  - [ ] `pytest` passes with coverage ≥ 80%
+- **Last Run:** — | **Result:** —
+- **Notes:** —
+
+---
+
+### Test Case 2: Progressive Summarization Highlight Update
+
+- **Input / Prompt:** "Implement `PATCH /api/v1/notes/{id}/highlights` to update L2/L3 highlight ranges."
+- **Known-Good Output:** Endpoint accepts `{"highlights": [{"start": int, "end": int, "layer": 2|3}]}`. Validates that layer is 2 or 3 only. Replaces the full highlight list (does not append). Service validates that ranges don't exceed content length. Unit test in `tests/unit/` covers validation; integration test covers the HTTP contract.
+- **Pass Criteria:**
+  - [ ] Layer values other than 2 or 3 return 422 (Pydantic validation, not a manual check)
+  - [ ] L4 (executive summary) is NOT updated by this endpoint — separate field/endpoint
+  - [ ] Highlight ranges are replaced atomically (not merged with existing)
+  - [ ] Service-layer unit test covers: valid payload, invalid layer, empty list (clears all)
+  - [ ] `pytest` passes; no ruff or mypy errors introduced
+- **Last Run:** — | **Result:** —
+- **Notes:** Highlight offset drift (content edited after highlights set) is a known open issue — not in scope here.
+
+---
+
+### Test Case 3: Full-Text Search
+
+- **Input / Prompt:** "Implement `GET /api/v1/search?q=` for full-text search across note titles and content."
+- **Known-Good Output:** Returns notes where title OR content contains the query string (case-insensitive). Empty `q` returns 422. Results include `code_stage` and `container_id`. Implemented in `search_service.py`. Integration test covers: match in title, match in content, no match, empty query.
+- **Pass Criteria:**
+  - [ ] Search is case-insensitive
+  - [ ] Empty `q` returns 422 (not an empty list)
+  - [ ] Response schema includes `code_stage` and `container_id` for each result
+  - [ ] Logic is in `search_service.py`, not in the route handler
+  - [ ] Integration tests cover all four cases above
+- **Last Run:** — | **Result:** —
+- **Notes:** SQLite `LIKE` is acceptable for now (its default case-insensitivity covers ASCII characters only); defer FTS5 until the Open Loop on deployment target is resolved.
+
+---
+
+## Taste Rules (Encoded Rejections)
+
+| # | Pattern to Reject | Why It Fails | Rule |
+|---|---|---|---|
+| 1 | Moving a note to a container without updating `code_stage` | Technically correct HTTP but wrong BASB semantics — capture stays capture forever | A move operation MUST advance `code_stage` to `organize` |
+| 2 | Putting query logic directly in route handlers | Defeats the layered architecture; untestable in isolation | All DB access goes through `services/`; routes call services only |
+| 3 | Reusing `NoteResponse` schema as the input schema for update | Semantically wrong; leaks read-only fields into writes | One schema per use case: `CreateNote`, `UpdateNote`, `NoteResponse` are all distinct |
+| 4 | AI feature implemented without surfacing provider/cost question | Binds the project to an unconfirmed external dependency | Escalate AI integration to human before writing any model call code |
+
+---
+
+## CI Gate
+
+The agent must not declare a task complete if any gate below fails.
+
+- **Backend tests:** `cd backend && .venv/bin/pytest -v --cov=app --cov-report=term-missing` — all pass, coverage ≥ 80%
+- **Backend lint:** `.venv/bin/ruff check app/ tests/` — zero errors
+- **Backend format:** `.venv/bin/ruff format --check app/ tests/` — clean
+- **Backend types:** `.venv/bin/mypy app/` — zero errors
+- **Backend security:** `.venv/bin/bandit -r app/ -c pyproject.toml` — zero high-severity findings
+- **Dependency audit:** `.venv/bin/pip-audit --skip-editable` — zero known vulnerabilities
+- **Frontend tests:** `cd frontend && npm run test:run` — all pass
+- **Frontend build:** `npm run build` — zero errors
+
+---
+
+## Rejection Log
+
+*(Append entries here as outputs are rejected. Never delete entries.)*
diff --git a/intent.md b/intent.md
new file mode 100644
index 0000000..1899d3b
--- /dev/null
+++ b/intent.md
@@ -0,0 +1,90 @@
+# second-brain — Intent
+
+---
+
+## Agent Architecture
+
+**This project uses:** Coding harness
+
+**Reason:** Solo developer with human review at every step; task-level features and bug fixes do not require autonomous multi-session loops.
+
+---
+
+## Primary Goal
+
+A fully functional AI-augmented personal knowledge management system (KMS): the user captures raw notes, organizes them into the PARA taxonomy, and progressively distills them from L1 raw text to L4 executive summary, with AI assistance accelerating the distill and express stages of the CODE workflow.
+
+---
+
+## Values (What We Optimize For)
+
+1. **Correctness** — code accurately implements BASB semantics; no data loss or corruption
+2. **Security** — user content is sanitized, stored safely, and never leaked
+3. **Maintainability** — readable, tested code a solo developer can return to after weeks away
+4. **Performance** — async throughout; UI responses feel immediate
+5. **Speed of delivery** — last priority; correctness is never sacrificed for pace
+
+---
+
+## Tradeoff Rules
+
+| Conflict | Resolution |
+|---|---|
+| Speed vs. correctness | Default to correctness. Flag explicitly if timeline requires compromise. |
+| Completeness vs. brevity | Prefer brevity unless depth is explicitly requested. |
+| New abstraction vs. duplication | Tolerate duplication until the third occurrence; then extract. |
+| AI feature richness vs. scope creep | Confirm AI integration points before implementing; see Open Loops in AGENTS.md. |
+
+---
+
+## Decision Boundaries
+
+### Decide Autonomously
+
+- Formatting, structure, naming within established project conventions
+- Tool selection for read-only exploration
+- Refactoring within an approved, scoped task
+- Choosing between two equivalent async SQLAlchemy patterns
+- Adding a test for an untested code path discovered during a task
+
+### Escalate to Human
+
+- Any AI feature that touches external APIs — confirm provider, model name, and cost before implementing
+- Any DB schema migration that drops or renames a column
+- Any CORS policy change that broadens allowed origins beyond `localhost`
+- Any change that moves business logic out of `services/` and into routes or models
+- Any output intended for external distribution
+- Any irreversible action (delete, force-push, send)
+- Scope changes beyond the stated task
+- When acceptance criteria cannot be met within stated constraints
+
+---
+
+## What "Good" Looks Like
+
+A good output for this project:
+
+- Implements the BASB concept correctly (not just the literal endpoint spec) — e.g., a "move" operation correctly advances the `code_stage`
+- Produces working, tested code on the first attempt within the defined scope
+- Stays thin at the route layer and puts logic in services — verifiable by line count
+- Uses the domain vocabulary (`CodeStage`, `ContainerType`, `highlights`) consistently
+- Flags risks (schema changes, async pitfalls, highlight offset drift) proactively
+
+---
+
+## Anti-Patterns (What Bad Looks Like)
+
+- Implementing the literal request while missing the BASB intent (e.g., moving a note to a container without updating `code_stage`)
+- Adding a repository layer or other abstraction "for future flexibility" — YAGNI
+- Synchronous SQLAlchemy calls that block the event loop
+- Reusing a Pydantic schema across semantically different operations to save lines
+- Recommending an AI feature without surfacing the provider/cost/integration question first
+
+---
+
+## Persistent Decisions
+
+| Date | Decision | Rationale |
+|---|---|---|
+| [VERIFY: date] | L4 executive summary is always user-authored | AI may suggest, but the user's own words are the point of the express stage |
+| [VERIFY: date] | Inbox = notes with no `container_id` | Simple; avoids a separate inbox table |