diff --git a/skills/build-unplanned-feature/SKILL.md b/skills/build-unplanned-feature/SKILL.md index 6bb7c2e..745d64f 100644 --- a/skills/build-unplanned-feature/SKILL.md +++ b/skills/build-unplanned-feature/SKILL.md @@ -19,12 +19,30 @@ Extract the feature description from: Store the raw description for clarification. -### Step 2: Clarify Requirements +### Step 2: Load Existing Specs + +Before clarifying requirements, check for existing project specs so the clarification can detect contradictions and the implementation can follow established patterns. + +**Check for and read (if they exist):** +- `specs/product_specs.md` (or `specs/product_specs/` directory) → `PRD_CONTEXT` +- `specs/architecture.md` (or `specs/architecture/` directory) → `ARCHITECTURE_CONTEXT` +- `specs/design_system.md` → `DESIGN_CONTEXT` + +For each, check single file first, then directory. If a directory, aggregate all `.md` files. + +If none exist, that's fine — proceed without them. + +### Step 3: Clarify Requirements **You MUST call the Skill tool now:** `Skill(skill="groundwork:understanding-feature-requests")` Do NOT attempt to gather requirements yourself. The skill handles this. +If existing specs were loaded in Step 2, provide them as context to the clarification skill so it can: +- Detect contradictions with existing PRD requirements +- Identify overlap with existing features +- Understand architectural constraints + Follow the skill to gather: - Problem being solved - Target user/persona @@ -33,7 +51,7 @@ Follow the skill to gather: Continue until requirements are clear and internally consistent. 
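The detection order in Step 2 (single file first, then directory, aggregating `.md` files) can be sketched as follows. This is a hypothetical helper — `loadSpec` is not part of the skill's API, just an illustration of the lookup rule:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper illustrating the Step 2 detection order:
// check the single file first, then the directory; a directory's
// .md files are aggregated in sorted order. Returns null when
// neither exists, which the workflow treats as "proceed without".
export function loadSpec(root: string, name: string): string | null {
  const file = path.join(root, "specs", `${name}.md`);
  if (fs.existsSync(file)) return fs.readFileSync(file, "utf8");

  const dir = path.join(root, "specs", name);
  if (fs.existsSync(dir) && fs.statSync(dir).isDirectory()) {
    return fs
      .readdirSync(dir)
      .filter((f) => f.endsWith(".md"))
      .sort()
      .map((f) => fs.readFileSync(path.join(dir, f), "utf8"))
      .join("\n\n");
  }
  return null; // no spec found — not an error
}
```

The same lookup applies to `product_specs`, `architecture`, and `design_system`.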
-### Step 3: Generate Feature Identifier +### Step 4: Generate Feature Identifier Create a feature identifier from the clarified requirements: @@ -49,7 +67,7 @@ Create a feature identifier from the clarified requirements: - "Export reports to PDF" → `FEATURE-pdf-export` - "Rate limiting for API" → `FEATURE-api-rate-limit` -### Step 4: Present Feature Summary +### Step 5: Present Feature Summary Present summary to the user: @@ -84,11 +102,11 @@ Then use `AskUserQuestion` to ask: **Wait for user response before proceeding.** -### Step 5: Implementation (task-executor Agent) +### Step 6: Implementation (task-executor Agent) Implementation is dispatched to the **task-executor agent** with a fresh context window. This agent has `implement-feature`, `use-git-worktree`, and `test-driven-development` skills preloaded — it does not need to call `Skill()` or spawn subagents. -**Build the Task prompt with ALL gathered context from Steps 1-4.** You MUST include actual values, not placeholders: +**Build the Task prompt with ALL gathered context from Steps 1-5 (specs from Step 2, requirements from Step 3, feature definition from Steps 4-5).** You MUST include actual values, not placeholders: Task( subagent_type="groundwork:task-executor:task-executor", @@ -98,22 +116,35 @@ Implementation is dispatched to the **task-executor agent** with a fresh context PROJECT ROOT: [absolute path to project root] FEATURE DEFINITION: - - Identifier: [FEATURE-slug from Step 3] - - Title: [1-2 sentence summary from Step 4] + - Identifier: [FEATURE-slug from Step 4] + - Title: [1-2 sentence summary from Step 5] ACTION ITEMS: - [Bulleted list of requirements gathered in Steps 1-2] + [Bulleted list of requirements gathered in Steps 1-3] ACCEPTANCE CRITERIA: - [Bulleted list of acceptance criteria from Step 2/4] + [Bulleted list of acceptance criteria from Step 3/5] OUT OF SCOPE: [Bulleted list of exclusions, or 'None specified'] + PROJECT CONTEXT (follow these established patterns): + [Include each 
section below ONLY if the spec was found in Step 2. Omit sections where no spec exists.] + + ARCHITECTURE: + [Contents of ARCHITECTURE_CONTEXT — pay attention to technology choices, component boundaries, and decision records. Your implementation must follow these.] + + DESIGN SYSTEM: + [Contents of DESIGN_CONTEXT — use these design tokens, patterns, and component styles. Do not invent new patterns that contradict the established system.] + + EXISTING PRD (for reference only): + [Contents of PRD_CONTEXT — be aware of existing features to avoid duplication or contradiction. Do not re-implement existing functionality.] + INSTRUCTIONS: 1. Follow your preloaded skills to create a worktree, implement with TDD, and commit. 2. The feature definition above provides all session context — do NOT re-ask the user for requirements. - 3. When complete, output your final line in EXACTLY this format: + 3. If project context is provided, follow the established architecture and design patterns. + 4. When complete, output your final line in EXACTLY this format: RESULT: IMPLEMENTED | | | OR: RESULT: FAILURE | [one-line reason] @@ -130,11 +161,11 @@ Implementation is dispatched to the **task-executor agent** with a fresh context - `RESULT: FAILURE | ...` — Report the failure and worktree location for investigation, stop - No parseable RESULT line — Report: "Implementation subagent did not return a structured result. Check worktree status manually." Stop. -### Step 6: Validation (Direct Skill Call) +### Step 7: Validation (Direct Skill Call) **Call the validation-loop skill directly.** Do NOT wrap this in a subagent — this skill runs in the main conversation, which CAN spawn the 8 validation subagents it needs. -1. `cd` into the worktree path from Step 5 +1. `cd` into the worktree path from Step 6 2. Call: `Skill(skill='groundwork:validation-loop')` 3. The validation-loop skill will run 8 verification agents in parallel and fix issues autonomously. 
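The structured `RESULT:` line handling described above could be parsed as sketched below. This is an assumption-laden illustration: the exact fields after `IMPLEMENTED` are elided in the skill text, so they are kept as an opaque list rather than guessed at, and `parseResultLine` is a hypothetical name:

```typescript
// Hypothetical sketch of parsing the executor's final RESULT line.
// IMPLEMENTED carries pipe-separated fields (left opaque here);
// FAILURE carries a one-line reason; anything else is unparseable.
type ExecutorResult =
  | { status: "IMPLEMENTED"; fields: string[] }
  | { status: "FAILURE"; reason: string }
  | { status: "UNPARSEABLE" };

export function parseResultLine(output: string): ExecutorResult {
  // Only the last RESULT: line counts as the structured result.
  const line = output
    .split("\n")
    .reverse()
    .find((l) => l.trim().startsWith("RESULT:"));
  if (!line) return { status: "UNPARSEABLE" };

  const parts = line
    .trim()
    .slice("RESULT:".length)
    .split("|")
    .map((p) => p.trim());
  if (parts[0] === "IMPLEMENTED") {
    return { status: "IMPLEMENTED", fields: parts.slice(1) };
  }
  if (parts[0] === "FAILURE") {
    return { status: "FAILURE", reason: parts[1] ?? "" };
  }
  return { status: "UNPARSEABLE" };
}
```

The three branches map directly onto the three outcomes the skill handles: report success, report failure with worktree location, or report a missing structured result.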
@@ -143,7 +174,7 @@ Implementation is dispatched to the **task-executor agent** with a fresh context - Validation failed → Report the failure and worktree location for investigation, stop - Stuck on recurring issue → Report the stuck finding and stop -### Step 7: Merge Decision +### Step 8: Merge Decision **From the project root** (NOT the worktree), handle merge: @@ -191,7 +222,7 @@ cd .worktrees/ ``` ``` -### Step 8: Report Completion +### Step 9: Report Completion Output implementation summary: diff --git a/skills/execute-task/SKILL.md b/skills/execute-task/SKILL.md index 6566c79..150e5f9 100644 --- a/skills/execute-task/SKILL.md +++ b/skills/execute-task/SKILL.md @@ -74,7 +74,7 @@ Read the tasks file from the worktree: Search for `### TASK-NNN:` pattern. -**Error:** Task not found → "TASK-NNN not found in specs/tasks.md" +**Error:** Task not found → "TASK-NNN not found in specs/tasks/" ### Step 3: Validate Task is Workable @@ -105,9 +105,11 @@ Search for `### TASK-NNN:` pattern. Read from the worktree: 1. **Product specs** - `specs/product_specs.md` or `specs/product_specs/` 2. **Architecture** - `specs/architecture.md` or `specs/architecture/` -3. **Tasks** - `specs/tasks.md` or `specs/tasks/` +3. **Design system** - `specs/design_system.md` (if exists) +4. **Tasks** - `specs/tasks.md` or `specs/tasks/` **If specs missing:** Report which are missing and suggest commands to create them. +**If design system missing:** Not an error — proceed without it. Note its absence for the planner. ### Step 5: Plan Implementation @@ -129,10 +131,14 @@ Relevant product specs: Relevant architecture: [extracted decisions] +Design system: +[extracted design tokens, color palette, typography, component patterns, UX patterns — or 'No design system defined' if not found] + REQUIREMENTS FOR THE PLAN: 1. All work happens in worktree .worktrees/TASK-NNN (not main workspace) 2. Must follow TDD: write test → implement → verify cycle 3. 
Plan covers implementation only — validation and merge are handled separately by the caller +4. If a design system is provided and the task involves UI, the plan must reference specific design tokens, colors, and component patterns from it ", description="Plan TASK-NNN" ) @@ -191,7 +197,7 @@ Present summary to the user: ### Step 7: Implementation (task-executor Agent) -1. **Update status** - Change task to `**Status:** In Progress` +1. **Update status** - Change task file to `**Status:** In Progress` and update the status table in `specs/tasks/_index.md` (change the task's row to `In Progress`) 2. **Dispatch to the task-executor agent** with a fresh context window. This agent has `implement-feature`, `use-git-worktree`, and `test-driven-development` skills preloaded — it does not need to call `Skill()` or spawn subagents. @@ -319,7 +325,7 @@ cd .worktrees/TASK-NNN After successful merge or user acknowledgment: -1. **Update status** - Change task to `**Status:** Complete` +1. **Update status** - Change task file to `**Status:** Complete` and update the status table in `specs/tasks/_index.md` (change the task's row to `Complete`) ### Step 9: Complete and Report diff --git a/skills/implement-feature/SKILL.md b/skills/implement-feature/SKILL.md index b9f0934..64d7191 100644 --- a/skills/implement-feature/SKILL.md +++ b/skills/implement-feature/SKILL.md @@ -77,6 +77,12 @@ For non-trivial changes, ask: If improvements identified: implement them (maintaining test coverage). +### Step 4b: Frontend Visual Polish + +**For UI/frontend tasks**, apply visual polish before handing off. Read the design system spec (`specs/design_system.md`) if it exists. + +**Rule of thumb:** Every view should have at least one element that makes it visually distinctive. If everything is the same white card with the same border, it needs more visual variety. 
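The `_index.md` status-table edits described in the execute-task steps above (flip a task's row to `In Progress`, later to `Complete`) are mechanical. A sketch, assuming the table layout shown in the tasks template — `setTaskStatus` is a hypothetical helper, not part of the skill:

```typescript
// Hypothetical sketch of the _index.md status-table update:
// find the task's row and rewrite only its Status cell,
// leaving every other column and all non-table lines intact.
// Assumed row shape: | TASK-002 | Login UI | M1 | Not Started | TASK-001 |
export function setTaskStatus(indexMd: string, taskId: string, status: string): string {
  return indexMd
    .split("\n")
    .map((line) => {
      const cells = line.split("|").map((c) => c.trim());
      if (cells[1] !== taskId) return line; // header, separator, or other task
      cells[4] = status; // Status is the 4th data column
      return `| ${cells.slice(1, -1).join(" | ")} |`;
    })
    .join("\n");
}
```

Keeping the update this narrow matters: the task file and the index row are updated together, and nothing else in `_index.md` should change.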
+ ### Step 5: Verify and Hand Off Verify all work before handing off: diff --git a/skills/next-task/SKILL.md b/skills/next-task/SKILL.md index 85910aa..cf01577 100644 --- a/skills/next-task/SKILL.md +++ b/skills/next-task/SKILL.md @@ -7,26 +7,34 @@ user-invocable: false # Next Task Skill -Finds the next uncompleted task from `specs/tasks.md` and delegates execution to the `execute-task` skill. +Finds the next uncompleted task from `specs/tasks/` (or `specs/tasks.md`) and delegates execution to the `execute-task` skill. ## Workflow **IMPORTANT**: Your job is NOT to build a plan or build anything, it is exclusively to find the next task to execute. Don't do planning or building until the full workflow is executed. If you find yourself planning or executing, STOP and follow the workflow. ### Step 1: Load Task Index -Read the tasks file to find available tasks: - Single file: `specs/tasks.md` - Directory: `specs/tasks/` (aggregated in sorted order) +Read **only** the task index to find available tasks — do NOT read individual task files: - Directory mode: `specs/tasks/_index.md` (preferred — contains status table) - Single file fallback: `specs/tasks.md` -**Detection:** Check for file first (takes precedence), then directory. When reading a directory, aggregate all `.md` files recursively with `_index.md` first, then numerically-prefixed files, then alphabetically. +**Detection:** Check for `specs/tasks/_index.md` first, then `specs/tasks.md`. Do NOT aggregate all task files — the index contains the status table with all the information needed to find the next task. ### Step 2: Find Next Task -Parse the tasks to find the next task to work on: +Parse the status table in `_index.md` to find the next task: -1. Look for all tasks with `**Status:** Not Started` -2. 
Check dependencies - filter to unblocked tasks (no incomplete dependencies) +```markdown +| # | Task | Milestone | Status | Blocked by | +|---|------|-----------|--------|------------| +| TASK-001 | Auth setup | M1 | Complete | None | +| TASK-002 | Login UI | M1 | In Progress | TASK-001 | +| TASK-003 | Dashboard | M2 | Not Started | TASK-002 | +``` + +1. Look for all rows with Status = `Not Started` +2. Check dependencies — filter to unblocked tasks (all tasks in `Blocked by` column are `Complete`) 3. **Detect ambiguity:** - Let `candidates` = unblocked not-started tasks - Let `next_sequential` = lowest numbered candidate (e.g., TASK-004) @@ -45,7 +53,9 @@ Parse the tasks to find the next task to work on: 5. **If unambiguous:** Select the first unblocked, not-started task -**Dependency check:** A task is blocked if its `Blocked by:` field lists any task that is not `Complete`. +**Dependency check:** A task is blocked if its `Blocked by` column lists any task that is not `Complete` in the status table. + +**Fallback:** If `_index.md` has no status table, read individual task files to find statuses (legacy single-file format). **Tip:** For direct task selection, use `/groundwork:work-on N` to work on a specific task by number. @@ -55,7 +65,7 @@ Parse the tasks to find the next task to work on: |-----------|----------| | No `specs/` directory | "No specs directory found. Run `/groundwork:design-product` to start defining your project." | | Missing tasks file | "Tasks file not found. Run `/groundwork:create-tasks` to generate tasks from your PRD and architecture." | -| No tasks in file | "No tasks found in specs/tasks.md. The file may need to be regenerated with `/groundwork:create-tasks`." | +| No tasks found | "No tasks found in specs/tasks/. The directory may need to be regenerated with `/groundwork:create-tasks`." | | All tasks complete | "All tasks complete! 
Consider running `/groundwork:source-product-specs-from-code` to update documentation or plan the next phase." | | Only blocked tasks | "All remaining tasks are blocked. Blocked tasks: [list]. Would you like to override and work on one anyway?" | | Parse error | "Could not parse tasks.md. Expected format: `### TASK-NNN: Title` with `**Status:**` field." | diff --git a/skills/product-design/SKILL.md b/skills/product-design/SKILL.md index 551022a..a3b63d3 100644 --- a/skills/product-design/SKILL.md +++ b/skills/product-design/SKILL.md @@ -21,10 +21,27 @@ Interactive workflow for iteratively designing and documenting product requireme If the user called the skill without any context, ask them to provide context on what feature they want to add, modify or remove. +## Step 0.5: Load Project Context + +Before clarifying the request, load existing specs so you can detect contradictions and understand constraints: + +**Check for and read (if they exist):** +- `specs/architecture.md` (or `specs/architecture/` directory) — technology choices, component boundaries, decision records. Use to catch architecturally infeasible requirements early. +- `specs/design_system.md` — design principles, brand decisions, UX patterns. Use to ensure new features align with established design language. +- `specs/product_specs.md` (or `specs/product_specs/` directory) — already loaded implicitly by the contradiction check, but read it explicitly here for full cross-feature context. + +For each, check single file first, then directory. If a directory, aggregate all `.md` files. + +If none exist, that's fine — proceed without them. + ## Step 1: Understand the Request Invoke the `groundwork:understanding-feature-requests` skill to clarify the feature and check for contradictions. 
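Circling back to the next-task selection rule above: filter the status-table rows to `Not Started` tasks whose `Blocked by` entries are all `Complete`, then take the first (lowest-numbered) candidate. A sketch under stated assumptions — `parseStatusTable` and `nextTask` are hypothetical names, and rows are assumed to appear in ascending task order as in the example table:

```typescript
// Hypothetical sketch of the next-task selection rule over the
// _index.md status table shown in the next-task skill.
interface TaskRow { id: string; status: string; blockedBy: string[] }

export function parseStatusTable(indexMd: string): TaskRow[] {
  return indexMd
    .split("\n")
    .map((l) => l.split("|").map((c) => c.trim()))
    .filter((c) => /^TASK-\d+$/.test(c[1] ?? "")) // skip header/separator rows
    .map((c) => ({
      id: c[1],
      status: c[4] ?? "",
      blockedBy: !c[5] || c[5] === "None" ? [] : c[5].split(",").map((t) => t.trim()),
    }));
}

export function nextTask(rows: TaskRow[]): string | null {
  const done = new Set(rows.filter((r) => r.status === "Complete").map((r) => r.id));
  const candidates = rows
    .filter((r) => r.status === "Not Started")
    .filter((r) => r.blockedBy.every((b) => done.has(b))); // unblocked only
  return candidates.length ? candidates[0].id : null; // rows are in table order
}
```

The ambiguity check in the skill (multiple unblocked candidates besides the next sequential one) would layer on top of `candidates` rather than replace this filter.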
+If architecture and design system were loaded in Step 0.5, provide them as context so the clarification can also check for: +- **Architectural feasibility** — requirements that conflict with technology choices or component boundaries +- **Design consistency** — features that would require patterns contradicting the established design system + This skill will: 1. Ask clarifying questions (core, exploratory, conditional) 2. Check for internal, cross-feature, and technical contradictions @@ -199,6 +216,10 @@ After successfully updating the PRD, use the `AskUserQuestion` tool to present t "label": "Design UX", "description": "Create user experience designs and flows for this feature" }, + { + "label": "Create tasks", + "description": "Generate implementation tasks for this feature" + }, { "label": "Add another feature", "description": "Continue adding more features to the PRD" @@ -213,6 +234,7 @@ After successfully updating the PRD, use the `AskUserQuestion` tool to present t - **Design architecture**: Invoke the `groundwork:architecture` skill to design the technical approach - **Design UX**: Invoke the `groundwork:ux-design` skill to create UX designs +- **Create tasks**: Invoke the `groundwork:tasks` skill to generate implementation tasks for the new feature - **Add another feature**: Clear context by summarizing what was just added, then restart from Step 1 with a fresh feature request. Ask: "What feature would you like to add next?" 
## Conversation Patterns diff --git a/skills/task-validation-loop/SKILL.md b/skills/task-validation-loop/SKILL.md index 2a5045a..07ccac1 100644 --- a/skills/task-validation-loop/SKILL.md +++ b/skills/task-validation-loop/SKILL.md @@ -11,7 +11,7 @@ Autonomous verification loop that runs 3 specialized agents to validate task lis ## Prerequisites Before invoking this skill, ensure: -- Task list is complete (specs/tasks.md exists) +- Task list is complete (specs/tasks/ directory or specs/tasks.md exists) - PRD exists (specs/product_specs.md) - Architecture exists (specs/architecture.md) - User has approved the task breakdown @@ -93,7 +93,7 @@ Present results in table format: ``` 2. **Fix Each Finding** - Apply each critical/major recommendation - - Modify specs/tasks.md with required changes + - Modify the relevant task file in specs/tasks/ (or specs/tasks.md if single-file) - Track what was changed - Note which finding each fix addresses diff --git a/skills/tasks/SKILL.md b/skills/tasks/SKILL.md index 486e0c1..ba4fad9 100644 --- a/skills/tasks/SKILL.md +++ b/skills/tasks/SKILL.md @@ -15,7 +15,19 @@ Translates product specs and architecture into actionable implementation tasks. - `specs/product_specs.md` (PRD with EARS requirements) - `specs/architecture.md` (architecture with decision records) - `specs/design_system.md` (design system with DP/BRD/UXD decisions) - optional -- **Output:** `specs/tasks.md` (task list with dependencies) +- **Output:** `specs/tasks/` directory with per-milestone subdirectories: + ``` + specs/tasks/ + ├── _index.md # Overview, milestone summary, dependency graph + ├── M1-core-auth/ + │ ├── TASK-001.md + │ ├── TASK-002.md + │ └── TASK-003.md + ├── M2-upload/ + │ ├── TASK-004.md + │ └── TASK-005.md + └── parking-lot.md # Deferred tasks + ``` ## Workflow Overview @@ -55,9 +67,27 @@ If PRD or architecture is missing, prompt user: If design system exists, incorporate design context into UI/frontend tasks. 
-## Step 2: Define Milestones +## Step 2: Check Existing Tasks and Milestones -Before generating tasks, establish **product milestones** - points where the application can be assessed by a user. +Before defining milestones, check if tasks already exist: + +1. Check for `specs/tasks/_index.md` or `specs/tasks.md` +2. If found, read to understand existing milestones, task numbering, and status + +**If existing tasks found:** +- List existing milestones and their status (complete, in progress, not started) +- Determine the highest task number (e.g., TASK-047) so new tasks continue the sequence +- Determine the highest milestone number (e.g., M5) so new milestones continue the sequence +- Present to user: "Found existing milestones M1-M5 with 47 tasks. New tasks will be added as TASK-048+." +- Ask: "Should this new work be added to an existing milestone, or should I create a new milestone M6?" + +**If no existing tasks:** Proceed to define milestones from scratch. + +## Step 2b: Define Milestones + +**If adding to existing project:** Either add tasks to a user-selected existing milestone directory, or create a new milestone directory continuing the sequence (e.g., `M6-feature-name/`). + +**If starting fresh:** Establish **product milestones** - points where the application can be assessed by a user. Milestone principles: - **Vertically sliced** - Each milestone delivers user-visible value @@ -69,16 +99,16 @@ Example milestone progression: ``` M1: Core Authentication → User can sign up, log in, see empty dashboard - -M2: Upload & Verification + +M2: Upload & Verification → User can upload images, complete identity verification - + M3: Model Training → User can initiate training, see progress, view completion - + M4: Basic Generation → User can generate images with their model - + M5: Billing Integration → User can subscribe, see quota, pay for overages ``` @@ -274,7 +304,29 @@ Present the complete task list organized by milestone. 
Ask: > - Do the dependencies look right? > - Should any tasks be split or combined?" -Iterate until user approves, then write to `specs/tasks.md`. +Iterate until user approves, then write task files to `specs/tasks/`. + +**Output structure:** +1. Create or update `specs/tasks/_index.md` with overview, milestone summary table, dependency graph, **task status table**, and critical path +2. For each new milestone, create a directory: `specs/tasks/M{N}-{slug}/` (e.g., `M1-core-auth/`) +3. Write one file per task: `specs/tasks/M{N}-{slug}/TASK-{NNN}.md` +4. Each task file is self-contained — includes all fields from the Required Task Format +5. Write or update `specs/tasks/parking-lot.md` for deferred tasks (if any) + +**Task status table in `_index.md`** (required — used by `next-task` to find work without reading every task file): +```markdown +### Task Status + +| # | Task | Milestone | Status | Blocked by | +|---|------|-----------|--------|------------| +| TASK-001 | Auth setup | M1 | Not Started | None | +| TASK-002 | Login UI | M1 | Not Started | TASK-001 | +``` + +**If adding to existing tasks:** +- Update `_index.md` — add new rows to the status table, add the new milestone to the summary table, extend the dependency graph +- Continue task numbering from the highest existing task number +- Do NOT regenerate or modify existing milestone directories or task files **Progressive presentation:** - Present tasks one milestone at a time diff --git a/skills/tasks/references/tasks-template.md b/skills/tasks/references/tasks-template.md index 3c5a247..4653ee7 100644 --- a/skills/tasks/references/tasks-template.md +++ b/skills/tasks/references/tasks-template.md @@ -1,11 +1,28 @@ -# Tasks Document Template +# Tasks Directory Template -Use this template when creating `specs/tasks.md`. +Use this structure when creating task files in `specs/tasks/`. 
+ +## Directory Structure + +``` +specs/tasks/ +├── _index.md +├── M1-{slug}/ +│ ├── TASK-001.md +│ ├── TASK-002.md +│ └── TASK-003.md +├── M2-{slug}/ +│ ├── TASK-004.md +│ └── TASK-005.md +└── parking-lot.md +``` + +## `_index.md` Template ```markdown # Implementation Tasks -**Generated from:** +**Generated from:** - PRD: `specs/product_specs.md` - Architecture: `specs/architecture.md` @@ -22,15 +39,15 @@ Use this template when creating `specs/tasks.md`. ### Milestone Summary -| Milestone | Description | Tasks | Status | -|-----------|-------------|-------|--------| -| M1 | [Name] | [N] | Not Started | -| M2 | [Name] | [N] | Not Started | -| M3 | [Name] | [N] | Not Started | +| Milestone | Directory | Tasks | Status | +|-----------|-----------|-------|--------| +| M1 | `M1-[slug]/` | [N] | Not Started | +| M2 | `M2-[slug]/` | [N] | Not Started | +| M3 | `M3-[slug]/` | [N] | Not Started | ### Dependency Graph -``` +` `` M1: [Name] ├── TASK-001 ──┐ ├── TASK-002 ──┼── TASK-003 @@ -39,29 +56,36 @@ M1: [Name] M2: [Name] (depends: M1) ├── TASK-005 ──┐ └── TASK-006 ──┴── TASK-007 -``` +` `` + +### Task Status + +| # | Task | Milestone | Status | Blocked by | +|---|------|-----------|--------|------------| +| TASK-001 | [Title] | M1 | Not Started | None | +| TASK-002 | [Title] | M1 | Not Started | TASK-001 | +| TASK-003 | [Title] | M1 | Not Started | TASK-001 | +| TASK-004 | [Title] | M2 | Not Started | TASK-002, TASK-003 | +| TASK-005 | [Title] | M2 | Not Started | TASK-004 | ### Critical Path TASK-001 → TASK-002 → TASK-003 → TASK-006 → TASK-007 **Estimated critical path duration:** [X days/weeks] +``` ---- - -## Milestone 1: [Name] - -**Goal:** [User-visible outcome] -**Exit Criteria:** [How we know it's done] -**Target Date:** [If known] +## Individual Task File Template (`TASK-NNN.md`) -### TASK-001: [Title] +```markdown +### TASK-NNN: [Task Title] -**Component:** [From architecture] +**Milestone:** M[X] - [Milestone Name] +**Component:** [Component from architecture] 
**Estimate:** S | M | L | XL -**Goal:** -[One sentence outcome] +**Goal:** +[One sentence describing the outcome] **Action Items:** - [ ] [Action 1] @@ -69,53 +93,27 @@ TASK-001 → TASK-002 → TASK-003 → TASK-006 → TASK-007 - [ ] [Action 3] **Dependencies:** -- Blocked by: None -- Blocks: TASK-002, TASK-003 +- Blocked by: [TASK-XXX, TASK-YYY or None] +- Blocks: [TASK-ZZZ or None] **Acceptance Criteria:** -- [Criterion 1] -- [Criterion 2] +- [Testable criterion 1] +- [Testable criterion 2] **Related Requirements:** PRD-XXX-REQ-NNN -**Related Decisions:** DR-NNN +**Related Decisions:** DR-NNN, DP-NNN, BRD-NNN, UXD-NNN **Status:** Not Started | In Progress | Complete | Blocked +``` ---- - -### TASK-002: [Title] - -[Same structure...] - ---- - -## Milestone 2: [Name] - -[Same structure...] - ---- +## `parking-lot.md` Template -## Parking Lot +```markdown +# Parking Lot Tasks identified but not yet scheduled: | ID | Description | Reason Deferred | |----|-------------|-----------------| | TASK-XXX | [description] | [why deferred] | - ---- - -## Glossary - -| Term | Definition | -|------|------------| -| [Term] | [Definition] | - ---- - -## Change Log - -| Date | Change | Author | -|------|--------|--------| -| [date] | Initial task breakdown | [name] | ``` diff --git a/skills/test-driven-development/SKILL.md b/skills/test-driven-development/SKILL.md index e1f2c12..7347ecc 100644 --- a/skills/test-driven-development/SKILL.md +++ b/skills/test-driven-development/SKILL.md @@ -204,6 +204,27 @@ Next failing test for next feature. | **Clear** | Name describes behavior | `test('test1')` | | **Shows intent** | Demonstrates desired API | Obscures what code should do | +## What NOT to Test + +Not everything benefits from TDD. 
Writing tests for these wastes budget and creates maintenance burden: + +**Skip tests for:** +- Static markup and CSS classes (layout, colors, spacing, font sizes) +- Element counts ("renders 6 nav items") — only fails on intentional changes +- Pure presentational components with zero logic (just HTML + Tailwind) +- Tooltip styling, container classes, overflow attributes +- That an icon renders as an SVG + +**DO test:** +- Conditional rendering (shows X when condition Y) +- Computed values (formats currency, calculates percentage) +- User interactions (click, keyboard, filter, sort, pagination) +- State transitions (toggle, edit mode, selection) +- Edge cases (empty data, zero values, overflow) +- Business rules (color thresholds, frequency detection, amortization math) + +**Rule of thumb:** If the test would only fail when someone intentionally changes the UI, it's not testing behavior — it's preventing change. Delete it. + ## Why Order Matters **"I'll write tests after to verify it works"** @@ -329,7 +350,7 @@ Extract validation for multiple fields if needed. Before marking work complete: -- [ ] Every new function/method has a test +- [ ] Every function/method with logic (conditionals, computation, side effects) has a test - [ ] Watched each test fail before implementing - [ ] Each test failed for expected reason (feature missing, not typo) - [ ] Wrote minimal code to pass each test @@ -351,8 +372,10 @@ Can't check all boxes? You skipped TDD. Start over. 
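The "DO test" categories above all share one property: branches or computation that can fail independently of markup. A minimal illustrative example — `quotaLabel` is an invented function, not from any skill — of the kind of logic that earns a test, versus the static markup around it, which does not:

```typescript
// Illustrative only: a computed value with branches and an edge case —
// exactly the kind of logic the rules above say to test. The markup
// that renders this string would NOT get its own test.
export function quotaLabel(used: number, limit: number): string {
  if (limit <= 0) return "No quota"; // edge case: zero/invalid limit
  const pct = Math.round((used / limit) * 100);
  return pct >= 100 ? "Quota exceeded" : `${pct}% used`;
}
```

Each branch here can break under refactoring without any intentional UI change, which is the test's job to catch; asserting on the surrounding `<span>` classes would only fail on intentional changes.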
## Coverage Requirements -### Target: 80%+ Coverage -- Unit tests + Integration tests + E2E tests combined +### Target: 80%+ Coverage of Logic Paths +- Coverage means branch/logic coverage, not line coverage of markup +- Pure presentational components (no conditionals, no computed values) do NOT need unit tests — they are validated visually +- Do not write tests solely to increase coverage numbers - All edge cases covered - Error scenarios tested - Boundary conditions verified @@ -413,10 +436,105 @@ Never fix bugs without a test. ## Testing Anti-Patterns -When adding mocks or test utilities, avoid these common pitfalls: -- Testing mock behavior instead of real behavior -- Adding test-only methods to production classes -- Mocking without understanding dependencies +Mocks are tools to isolate, not things to test. Before every mock, run these gates: + +### Don't test mock behavior + +``` +BEFORE asserting on any mock element: + Ask: "Am I testing real behavior or mock existence?" + IF mock existence → delete the assertion or unmock the component +``` + +```typescript +// ❌ BAD: Testing that the mock exists +test('renders sidebar', () => { + render(); + expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument(); +}); + +// ✅ GOOD: If sidebar must be mocked, assert on Page's behavior +test('page passes user data to sidebar', () => { + render(); + expect(Sidebar).toHaveBeenCalledWith( + expect.objectContaining({ user: testUser }), expect.anything() + ); +}); +``` + +### Don't add test-only methods to production classes + +``` +BEFORE adding any method to a production class: + Ask: "Is this only used by tests?" + IF yes → put it in test utilities instead +``` + +### Mock at the right level + +``` +BEFORE mocking any method: + 1. "What side effects does the real method have?" + 2. "Does this test depend on any of those side effects?" 
+ IF yes → mock lower (the actual slow/external operation), not the method the test depends on +``` + +```typescript +// ❌ BAD: Mock prevents config write that test depends on +test('detects duplicate server', async () => { + vi.mock('ToolCatalog', () => ({ + discoverAndCacheTools: vi.fn().mockResolvedValue(undefined) + })); + await addServer(config); + await addServer(config); // Should throw - but won't! +}); + +// ✅ GOOD: Mock the slow part, preserve behavior test needs +test('detects duplicate server', async () => { + vi.mock('MCPServerManager'); // Just mock slow server startup + await addServer(config); // Config written + await addServer(config); // Duplicate detected ✓ +}); +``` + +### Preserve real semantics in mocks + +``` +BEFORE creating any mock: + Ask: "What type does the real implementation return?" + Value vs. stream? Sync vs. async? Array vs. cursor? + IF mock changes the abstraction type → fix it +``` + +```typescript +// ❌ BAD: Mock returns array, real implementation returns async cursor +const mockDb = { + getUsers: vi.fn().mockResolvedValue([ + { id: '1', name: 'Alice' }, + { id: '2', name: 'Bob' }, + ]) +}; +// Test passes, but production breaks on 1000+ users when +// pagination and backpressure actually matter. 
+ +// ✅ GOOD: Mock preserves real cursor semantics +const mockDb = { + getUsers: vi.fn().mockReturnValue({ + async *[Symbol.asyncIterator]() { + yield { id: '1', name: 'Alice' }; + yield { id: '2', name: 'Bob' }; + } + }) +}; +``` + +### Red flags + +- Assertion checks for `*-mock` test IDs +- Methods only called in test files +- Mock setup is >50% of test code +- `mockResolvedValue()` when real code returns a stream +- Mocking "just to be safe" ## Final Rule diff --git a/skills/test-driven-development/testing-anti-patterns.md b/skills/test-driven-development/testing-anti-patterns.md deleted file mode 100644 index e77ab6b..0000000 --- a/skills/test-driven-development/testing-anti-patterns.md +++ /dev/null @@ -1,299 +0,0 @@ -# Testing Anti-Patterns - -**Load this reference when:** writing or changing tests, adding mocks, or tempted to add test-only methods to production code. - -## Overview - -Tests must verify real behavior, not mock behavior. Mocks are a means to isolate, not the thing being tested. - -**Core principle:** Test what the code does, not what the mocks do. - -**Following strict TDD prevents these anti-patterns.** - -## The Iron Laws - -``` -1. NEVER test mock behavior -2. NEVER add test-only methods to production classes -3. NEVER mock without understanding dependencies -``` - -## Anti-Pattern 1: Testing Mock Behavior - -**The violation:** -```typescript -// ❌ BAD: Testing that the mock exists -test('renders sidebar', () => { - render(); - expect(screen.getByTestId('sidebar-mock')).toBeInTheDocument(); -}); -``` - -**Why this is wrong:** -- You're verifying the mock works, not that the component works -- Test passes when mock is present, fails when it's not -- Tells you nothing about real behavior - -**your human partner's correction:** "Are we testing the behavior of a mock?" 
-
-**The fix:**
-```typescript
-// ✅ GOOD: Test real component or don't mock it
-test('renders sidebar', () => {
-  render(<Page />); // Don't mock sidebar
-  expect(screen.getByRole('navigation')).toBeInTheDocument();
-});
-
-// OR if sidebar must be mocked for isolation:
-// Don't assert on the mock - test Page's behavior with sidebar present
-```
-
-### Gate Function
-
-```
-BEFORE asserting on any mock element:
-  Ask: "Am I testing real component behavior or just mock existence?"
-
-  IF testing mock existence:
-    STOP - Delete the assertion or unmock the component
-    Test real behavior instead
-```
-
-## Anti-Pattern 2: Test-Only Methods in Production
-
-**The violation:**
-```typescript
-// ❌ BAD: destroy() only used in tests
-class Session {
-  async destroy() { // Looks like production API!
-    await this._workspaceManager?.destroyWorkspace(this.id);
-    // ... cleanup
-  }
-}
-
-// In tests
-afterEach(() => session.destroy());
-```
-
-**Why this is wrong:**
-- Production class polluted with test-only code
-- Dangerous if accidentally called in production
-- Violates YAGNI and separation of concerns
-- Confuses object lifecycle with entity lifecycle
-
-**The fix:**
-```typescript
-// ✅ GOOD: Test utilities handle test cleanup
-// Session has no destroy() - it's stateless in production
-
-// In test-utils/
-export async function cleanupSession(session: Session) {
-  const workspace = session.getWorkspaceInfo();
-  if (workspace) {
-    await workspaceManager.destroyWorkspace(workspace.id);
-  }
-}
-
-// In tests
-afterEach(() => cleanupSession(session));
-```
-
-### Gate Function
-
-```
-BEFORE adding any method to production class:
-  Ask: "Is this only used by tests?"
-
-  IF yes:
-    STOP - Don't add it
-    Put it in test utilities instead
-
-  Ask: "Does this class own this resource's lifecycle?"
- - IF no: - STOP - Wrong class for this method -``` - -## Anti-Pattern 3: Mocking Without Understanding - -**The violation:** -```typescript -// ❌ BAD: Mock breaks test logic -test('detects duplicate server', () => { - // Mock prevents config write that test depends on! - vi.mock('ToolCatalog', () => ({ - discoverAndCacheTools: vi.fn().mockResolvedValue(undefined) - })); - - await addServer(config); - await addServer(config); // Should throw - but won't! -}); -``` - -**Why this is wrong:** -- Mocked method had side effect test depended on (writing config) -- Over-mocking to "be safe" breaks actual behavior -- Test passes for wrong reason or fails mysteriously - -**The fix:** -```typescript -// ✅ GOOD: Mock at correct level -test('detects duplicate server', () => { - // Mock the slow part, preserve behavior test needs - vi.mock('MCPServerManager'); // Just mock slow server startup - - await addServer(config); // Config written - await addServer(config); // Duplicate detected ✓ -}); -``` - -### Gate Function - -``` -BEFORE mocking any method: - STOP - Don't mock yet - - 1. Ask: "What side effects does the real method have?" - 2. Ask: "Does this test depend on any of those side effects?" - 3. Ask: "Do I fully understand what this test needs?" 
- - IF depends on side effects: - Mock at lower level (the actual slow/external operation) - OR use test doubles that preserve necessary behavior - NOT the high-level method the test depends on - - IF unsure what test depends on: - Run test with real implementation FIRST - Observe what actually needs to happen - THEN add minimal mocking at the right level - - Red flags: - - "I'll mock this to be safe" - - "This might be slow, better mock it" - - Mocking without understanding the dependency chain -``` - -## Anti-Pattern 4: Incomplete Mocks - -**The violation:** -```typescript -// ❌ BAD: Partial mock - only fields you think you need -const mockResponse = { - status: 'success', - data: { userId: '123', name: 'Alice' } - // Missing: metadata that downstream code uses -}; - -// Later: breaks when code accesses response.metadata.requestId -``` - -**Why this is wrong:** -- **Partial mocks hide structural assumptions** - You only mocked fields you know about -- **Downstream code may depend on fields you didn't include** - Silent failures -- **Tests pass but integration fails** - Mock incomplete, real API complete -- **False confidence** - Test proves nothing about real behavior - -**The Iron Rule:** Mock the COMPLETE data structure as it exists in reality, not just fields your immediate test uses. - -**The fix:** -```typescript -// ✅ GOOD: Mirror real API completeness -const mockResponse = { - status: 'success', - data: { userId: '123', name: 'Alice' }, - metadata: { requestId: 'req-789', timestamp: 1234567890 } - // All fields real API returns -}; -``` - -### Gate Function - -``` -BEFORE creating mock responses: - Check: "What fields does the real API response contain?" - - Actions: - 1. Examine actual API response from docs/examples - 2. Include ALL fields system might consume downstream - 3. 
Verify mock matches real response schema completely - - Critical: - If you're creating a mock, you must understand the ENTIRE structure - Partial mocks fail silently when code depends on omitted fields - - If uncertain: Include all documented fields -``` - -## Anti-Pattern 5: Integration Tests as Afterthought - -**The violation:** -``` -✅ Implementation complete -❌ No tests written -"Ready for testing" -``` - -**Why this is wrong:** -- Testing is part of implementation, not optional follow-up -- TDD would have caught this -- Can't claim complete without tests - -**The fix:** -``` -TDD cycle: -1. Write failing test -2. Implement to pass -3. Refactor -4. THEN claim complete -``` - -## When Mocks Become Too Complex - -**Warning signs:** -- Mock setup longer than test logic -- Mocking everything to make test pass -- Mocks missing methods real components have -- Test breaks when mock changes - -**your human partner's question:** "Do we need to be using a mock here?" - -**Consider:** Integration tests with real components often simpler than complex mocks - -## TDD Prevents These Anti-Patterns - -**Why TDD helps:** -1. **Write test first** → Forces you to think about what you're actually testing -2. **Watch it fail** → Confirms test tests real behavior, not mocks -3. **Minimal implementation** → No test-only methods creep in -4. **Real dependencies** → You see what the test actually needs before mocking - -**If you're testing mock behavior, you violated TDD** - you added mocks without watching test fail against real code first. 
- -## Quick Reference - -| Anti-Pattern | Fix | -|--------------|-----| -| Assert on mock elements | Test real component or unmock it | -| Test-only methods in production | Move to test utilities | -| Mock without understanding | Understand dependencies first, mock minimally | -| Incomplete mocks | Mirror real API completely | -| Tests as afterthought | TDD - tests first | -| Over-complex mocks | Consider integration tests | - -## Red Flags - -- Assertion checks for `*-mock` test IDs -- Methods only called in test files -- Mock setup is >50% of test -- Test fails when you remove mock -- Can't explain why mock is needed -- Mocking "just to be safe" - -## The Bottom Line - -**Mocks are tools to isolate, not things to test.** - -If TDD reveals you're testing mock behavior, you've gone wrong. - -Fix: Test real behavior or question why you're mocking at all. diff --git a/skills/validation-loop/SKILL.md b/skills/validation-loop/SKILL.md index b2ad0cb..35dcc55 100644 --- a/skills/validation-loop/SKILL.md +++ b/skills/validation-loop/SKILL.md @@ -109,6 +109,9 @@ Each returns JSON: ``` 2. **Fix Each Finding** - Apply each non-minor recommendation + - **Behavioral fixes** (new logic, async changes, control flow, debouncing): write or update a failing test first, then apply the fix, then verify green — follow TDD + - **Cosmetic fixes** (class names, constants, formatting): apply directly, verify tests still pass + - **How to tell:** if the fix requires new/changed test assertions, it's behavioral - Track what was changed - Note which finding each fix addresses