From e5e3c9d899d100eda378d7995c6997fc42eaf368 Mon Sep 17 00:00:00 2001
From: "Michael K. Alber"
Date: Sun, 19 Apr 2026 17:26:31 -0700
Subject: [PATCH] docs(context): onboard project into Four Disciplines framework

Add intent.md, constraints.md, evals.md. Rewrite AGENTS.md and CLAUDE.md to follow the Nate B. Jones Four Disciplines template structure: project overview, tech stack with versions, architecture, key files, domain concepts, persistent decisions, open loops, available tools, and boot ritual. Remove content that duplicates global standards files.

Co-Authored-By: Claude Sonnet 4.6
---
 constraints.md | 61 +++++++++++++++++++++++++++++++++
 evals.md       | 91 ++++++++++++++++++++++++++++++++++++++++++++++++++
 intent.md      | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 242 insertions(+)
 create mode 100644 constraints.md
 create mode 100644 evals.md
 create mode 100644 intent.md

diff --git a/constraints.md b/constraints.md
new file mode 100644
index 0000000..72be1ea
--- /dev/null
+++ b/constraints.md
@@ -0,0 +1,61 @@
+# second-brain — Constraints
+
+---
+
+## Must Do
+
+- Load and confirm context (`AGENTS.md`, `intent.md`, `constraints.md`) before every session.
+- Write a failing test before any production code — no exceptions (RED → GREEN → REFACTOR).
+- Run the full backend quality-check pass before any commit: pytest + ruff + mypy + bandit.
+- Sanitize all Tiptap/rich-text editor output before storing to the database.
+- Use `async/await` for every database operation — no synchronous SQLAlchemy calls.
+- Write three verifiable acceptance criteria before delegating any significant subtask.
+- Add a `# VERIFY:` comment rather than guess a function signature, API behavior, or SQLAlchemy idiom.
+- Confirm understanding before any destructive migration (column drop, rename, table drop).
+
+---
+
+## Must NOT Do
+
+- Do not write production code without a failing test. RED phase is local only — never committed.
+- Do not use synchronous SQLAlchemy anywhere in the application code.
+- Do not add a repository layer unless two or more services share identical query logic.
+- Do not exceed 15 lines in a route handler — delegate to `services/`.
+- Do not reuse a Pydantic schema across semantically different use cases.
+- Do not hardcode secrets, tokens, or `DATABASE_URL` values — use environment variables.
+- Do not commit `backend/basb.db`, `.env`, `.env.*`, or `__pycache__/`.
+- Do not re-litigate decisions logged in the Persistent Decisions tables without surfacing the question first.
+- Do not implement AI/LLM features without first resolving the Open Loop on AI augmentation strategy.
+
+---
+
+## Preferences
+
+- Prefer brevity over completeness unless depth is explicitly requested.
+- Prefer editing an existing file over creating a new one.
+- Prefer `grounded-code-mcp` knowledge base over training data for FastAPI, Vue 3, SQLAlchemy, and Pydantic v2 idioms.
+- Prefer a single focused service-layer unit test over a broad integration test when testing business logic.
+- Prefer `Mapped[Optional[T]]` with an explicit `None` default over omitting the default for nullable columns.
+- Prefer inline `# VERIFY:` annotations over guessing async patterns or SQLAlchemy 2.0 syntax.
+
+---
+
+## Escalate Rather Than Decide
+
+- Any AI/LLM feature — confirm provider, model name, cost, and integration point before implementing.
+- Any DB schema migration that drops or renames a column (hard to reverse).
+- Any CORS policy change that broadens allowed origins beyond `localhost`.
+- Any change that moves business logic out of `services/` (violates the layered architecture decision).
+- Any security-relevant decision not explicitly covered by these constraints.
+
+---
+
+## Code Quality Gates
+
+- **Test coverage (backend business logic):** ≥ 80% — `cd backend && .venv/bin/pytest --cov=app --cov-report=term-missing`
+- **Test coverage (frontend):** ≥ 70% — `cd frontend && npm run test:run`
+- **Test coverage (security-critical paths):** ≥ 95%
+- **Cyclomatic complexity (per method):** < 10
+- **Code duplication:** ≤ 3%
+- **Commit format:** Conventional Commits — `feat:`, `fix:`, `refactor:`, `chore:`, `test:`, `docs:`
+- **Commit scope:** Atomic — one logical change per commit; RED phase never committed
diff --git a/evals.md b/evals.md
new file mode 100644
index 0000000..e6f2f1d
--- /dev/null
+++ b/evals.md
@@ -0,0 +1,91 @@
+# second-brain — Evals
+
+---
+
+## Eval Philosophy
+
+Evals are safety infrastructure, not a finishing step. Write them before the agent starts.
+A passing test suite ≠ done; tests verify code correctness, evals verify output is actually
+good relative to BASB intent.
+
+A passing eval is measurable, repeatable, and would survive scrutiny from a developer who
+understands Tiago Forte's BASB methodology and expects the domain semantics to be respected.
+
+---
+
+## Test Cases
+
+### Test Case 1: Note Capture and PARA Move
+
+- **Input / Prompt:** "Implement the endpoint to move a note to a PARA container and advance its CODE stage."
+- **Known-Good Output:** `PATCH /api/v1/notes/{id}/move` sets `container_id` on the note AND advances `code_stage` from `capture` to `organize`. Route handler ≤15 lines. Business logic in `note_service.py`. Integration test covers happy path and 404 on missing note.
+- **Pass Criteria:**
+  - [ ] `code_stage` transitions from `capture` → `organize` on move (BASB semantic — not just a field update)
+  - [ ] `container_id` is set to the target container
+  - [ ] Route handler is ≤15 lines; logic is in `services/note_service.py`
+  - [ ] Integration test in `tests/integration/test_notes_api.py` covers: success (200), missing note (404), missing container (404)
+  - [ ] `pytest` passes with coverage ≥ 80%
+- **Last Run:** — | **Result:** —
+- **Notes:** —
+
+---
+
+### Test Case 2: Progressive Summarization Highlight Update
+
+- **Input / Prompt:** "Implement `PATCH /api/v1/notes/{id}/highlights` to update L2/L3 highlight ranges."
+- **Known-Good Output:** Endpoint accepts `{"highlights": [{"start": int, "end": int, "layer": 2|3}]}`. Validates layer is 2 or 3 only. Replaces the full highlight list (not appends). Service validates that ranges don't exceed content length. Unit test in `tests/unit/` covers validation; integration test covers the HTTP contract.
+- **Pass Criteria:**
+  - [ ] Layer values other than 2 or 3 return 422 (Pydantic validation, not a manual check)
+  - [ ] L4 (executive summary) is NOT updated by this endpoint — separate field/endpoint
+  - [ ] Highlight ranges are replaced atomically (not merged with existing)
+  - [ ] Service-layer unit test covers: valid payload, invalid layer, empty list (clears all)
+  - [ ] `pytest` passes; no ruff or mypy errors introduced
+- **Last Run:** — | **Result:** —
+- **Notes:** Highlight offset drift (content edited after highlights set) is a known open issue — not in scope here.
+
+---
+
+### Test Case 3: Full-Text Search
+
+- **Input / Prompt:** "Implement `GET /api/v1/search?q=` for full-text search across note titles and content."
+- **Known-Good Output:** Returns notes where title OR content contains the query string (case-insensitive). Empty `q` returns 422. Results include `code_stage` and `container_id`. Implemented in `search_service.py`. Integration test covers: match in title, match in content, no match, empty query.
+- **Pass Criteria:**
+  - [ ] Search is case-insensitive
+  - [ ] Empty `q` returns 422 (not an empty list)
+  - [ ] Response schema includes `code_stage` and `container_id` for each result
+  - [ ] Logic is in `search_service.py`, not in the route handler
+  - [ ] Integration tests cover all four cases above
+- **Last Run:** — | **Result:** —
+- **Notes:** SQLite `LIKE` is acceptable for now; defer FTS5 until the Open Loop on deployment target is resolved.
+
+---
+
+## Taste Rules (Encoded Rejections)
+
+| # | Pattern to Reject | Why It Fails | Rule |
+|---|---|---|---|
+| 1 | Moving a note to a container without updating `code_stage` | Technically correct HTTP but wrong BASB semantics — capture stays capture forever | A move operation MUST advance `code_stage` to `organize` |
+| 2 | Putting query logic directly in route handlers | Defeats the layered architecture; untestable in isolation | All DB access goes through `services/`; routes call services only |
+| 3 | Reusing `NoteResponse` schema as the input schema for update | Semantically wrong; leaks read-only fields into writes | One schema per use case: `CreateNote`, `UpdateNote`, `NoteResponse` are all distinct |
+| 4 | AI feature implemented without surfacing provider/cost question | Binds the project to an unconfirmed external dependency | Escalate AI integration to human before writing any model call code |
+
+---
+
+## CI Gate
+
+The agent must not declare a task complete if any gate below fails.
+
+- **Backend tests:** `cd backend && .venv/bin/pytest -v --cov=app --cov-report=term-missing` — all pass, coverage ≥ 80%
+- **Backend lint:** `.venv/bin/ruff check app/ tests/` — zero errors
+- **Backend format:** `.venv/bin/ruff format --check app/ tests/` — clean
+- **Backend types:** `.venv/bin/mypy app/` — zero errors
+- **Backend security:** `.venv/bin/bandit -r app/ -c pyproject.toml` — zero high/critical
+- **Dependency audit:** `.venv/bin/pip-audit --skip-editable` — zero known vulnerabilities
+- **Frontend tests:** `cd frontend && npm run test:run` — all pass
+- **Frontend build:** `npm run build` — zero errors
+
+---
+
+## Rejection Log
+
+*(Append entries here as outputs are rejected. Never delete entries.)*
diff --git a/intent.md b/intent.md
new file mode 100644
index 0000000..1899d3b
--- /dev/null
+++ b/intent.md
@@ -0,0 +1,90 @@
+# second-brain — Intent
+
+---
+
+## Agent Architecture
+
+**This project uses:** Coding harness
+
+**Reason:** Solo developer with human review at every step; task-level features and bug fixes do not require autonomous multi-session loops.
+
+---
+
+## Primary Goal
+
+A fully functional AI-augmented personal KMS: the user captures raw notes, organizes them into the PARA taxonomy, and progressively distills them from L1 raw text to L4 executive summary — with AI assistance accelerating the distill and express stages of the CODE workflow.
+
+---
+
+## Values (What We Optimize For)
+
+1. **Correctness** — code accurately implements BASB semantics; no data loss or corruption
+2. **Security** — user content is sanitized, stored safely, and never leaked
+3. **Maintainability** — readable, tested code a solo developer can return to after weeks away
+4. **Performance** — async throughout; UI responses feel immediate
+5. **Speed of delivery** — last priority; correctness is never sacrificed for pace
+
+---
+
+## Tradeoff Rules
+
+| Conflict | Resolution |
+|---|---|
+| Speed vs. correctness | Default to correctness. Flag explicitly if timeline requires compromise. |
+| Completeness vs. brevity | Prefer brevity unless depth is explicitly requested. |
+| New abstraction vs. duplication | Tolerate duplication until the third occurrence; then extract. |
+| AI feature richness vs. scope creep | Confirm AI integration points before implementing; see Open Loops in AGENTS.md. |
+
+---
+
+## Decision Boundaries
+
+### Decide Autonomously
+
+- Formatting, structure, naming within established project conventions
+- Tool selection for read-only exploration
+- Refactoring within an approved, scoped task
+- Choosing between two equivalent async SQLAlchemy patterns
+- Adding a test for an untested code path discovered during a task
+
+### Escalate to Human
+
+- Any AI feature that touches external APIs — confirm provider, model name, and cost before implementing
+- Any DB schema migration that drops or renames a column
+- Any CORS policy change that broadens allowed origins beyond `localhost`
+- Any change that moves business logic out of `services/` and into routes or models
+- Any output intended for external distribution
+- Any irreversible action (delete, force-push, send)
+- Scope changes beyond the stated task
+- When acceptance criteria cannot be met within stated constraints
+
+---
+
+## What "Good" Looks Like
+
+A good output for this project:
+
+- Implements the BASB concept correctly (not just the literal endpoint spec) — e.g., a "move" operation correctly advances the `code_stage`
+- Produces working, tested code on the first attempt within the defined scope
+- Stays thin at the route layer and puts logic in services — verifiable by line count
+- Uses the domain vocabulary (`CodeStage`, `ContainerType`, `highlights`) consistently
+- Flags risks (schema changes, async pitfalls, highlight offset drift) proactively
+
+---
+
+## Anti-Patterns (What Bad Looks Like)
+
+- Implementing the literal request while missing the BASB intent (e.g., moving a note to a container without updating `code_stage`)
+- Adding a repository layer or other abstraction "for future flexibility" — YAGNI
+- Synchronous SQLAlchemy calls that block the event loop
+- Reusing a Pydantic schema across semantically different operations to save lines
+- Recommending an AI feature without surfacing the provider/cost/integration question first
+
+---
+
+## Persistent Decisions
+
+| Date | Decision | Rationale |
+|---|---|---|
+| [VERIFY: date] | L4 executive summary is always user-authored | AI may suggest, but the user's own words are the point of the express stage |
+| [VERIFY: date] | Inbox = notes with no `container_id` | Simple; avoids a separate inbox table |
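
Reviewer note (not part of the patch): two rules in these files lend themselves to a small illustration, namely "a move MUST advance `code_stage` to `organize`" and "Inbox = notes with no `container_id`". The sketch below is a minimal in-memory model of those rules, not the project's code. The `Note` dataclass, the `move_note` and `inbox` helpers, and the `distill`/`express` enum values are assumptions made for illustration; the real project presumably uses async SQLAlchemy models behind `services/note_service.py`. Leaving a note that is already past `capture` unchanged is also an assumption, since the patch only specifies the `capture` to `organize` transition.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CodeStage(str, Enum):
    # `capture` and `organize` appear in the patch; the other two values
    # are assumed from the CODE acronym (Capture, Organize, Distill, Express).
    CAPTURE = "capture"
    ORGANIZE = "organize"
    DISTILL = "distill"
    EXPRESS = "express"


@dataclass
class Note:
    # Hypothetical stand-in for the project's ORM model.
    id: int
    title: str
    container_id: Optional[int] = None  # None means the note is in the inbox
    code_stage: CodeStage = CodeStage.CAPTURE


def move_note(note: Note, container_id: int) -> Note:
    """Move a note into a PARA container and apply the BASB stage rule."""
    note.container_id = container_id
    # A move is more than a field update: a captured note becomes organized.
    # Assumption: notes already past `capture` keep their current stage.
    if note.code_stage is CodeStage.CAPTURE:
        note.code_stage = CodeStage.ORGANIZE
    return note


def inbox(notes: list[Note]) -> list[Note]:
    """Inbox = notes with no container_id (no separate inbox table)."""
    return [n for n in notes if n.container_id is None]
```

A service-layer unit test for Test Case 1 could assert both effects of `move_note` together, which is what Taste Rule 1 rejects when only `container_id` changes.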