diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index aa17469..6e4ea5a 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,4 +1,4 @@
-exclude: "/migrations/|Makefile*"
+exclude: "^docs/agents/instructions/|/migrations/|Makefile*"
default_stages: [ pre-commit ]
repos:
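The effect of the updated `exclude` value can be checked with a short sketch. It assumes pre-commit treats `exclude` as a Python regular expression searched against each repo-relative path; the sample paths are hypothetical and not taken from this repository.

```python
import re

# Pattern copied verbatim from the updated `exclude` key.
EXCLUDE = re.compile(r"^docs/agents/instructions/|/migrations/|Makefile*")


def is_excluded(path: str) -> bool:
    """Return True when the pattern matches anywhere in the path."""
    return EXCLUDE.search(path) is not None


# Hypothetical repo-relative paths, for illustration only.
SAMPLE_PATHS = [
    "docs/agents/instructions/python.md",  # matched by the new anchored alternative
    "docs/agents/agent_instructions.md",   # not matched, so hooks still run on it
    "src/app/migrations/0001_initial.py",  # matched by "/migrations/"
    "Makefile",                            # matched by "Makefile*"
    "notes/Makefil.txt",                   # also matched: "e*" allows zero "e" characters
]

# Map each sample path to whether the hooks would skip it.
RESULTS = {path: is_excluded(path) for path in SAMPLE_PATHS}
```

Because `Makefile*` is read as a regex rather than a glob, it matches any path containing `Makefil`. If only Makefile-prefixed names are intended, `Makefile.*` would likely be the stricter spelling.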
diff --git a/docs/agents/agent_instructions.md b/docs/agents/agent_instructions.md
index bbdfd91..d77264e 100644
--- a/docs/agents/agent_instructions.md
+++ b/docs/agents/agent_instructions.md
@@ -39,20 +39,20 @@ Your mission is twofold:
To assist with specific domains, specialized instruction files are available in `docs/agents/instructions`.
**Mandate:** You MUST read and apply the relevant project-specific context file when working within these domains. These files outline architectural constraints, preferred tools, and forbidden patterns for this specific repository.
-| Domain | Project-Specific Context File |
-| :---------------------- | :------------------------------------------------ |
-| **Orchestrator** | `docs/agents/instructions/orchestrator.md` |
-| **Planning** | `docs/agents/instructions/formal_planning.md` |
-| **Plan to tasks** | `docs/agents/instructions/plan_to_tasks.md` |
-| **ML / Geospatial** | `docs/agents/instructions/ml.md` |
-| **Python / QA** | `docs/agents/instructions/python.md` |
-| **System Design** | `docs/agents/instructions/systemdesign.md` |
-| **Infrastructure** | `docs/agents/instructions/infrastructure.md` |
-| **Root Cause Analysis** | `docs/agents/instructions/root_cause_analysis.md` |
-| **Analytics** | `docs/agents/instructions/analytics.md` |
-| **Security** | `docs/agents/instructions/security.md` |
-| **Spec-Driven Dev** | `docs/agents/instructions/specdrivendev.md` |
-| **Knowledge Base** | `docs/agents/instructions/KNOWLEDGE.md` |
+| Domain | Project-Specific Context File |
+| :---------------------- | :----------------------------------------------------------------------------------------- |
+| **Orchestrator** | [**docs/agents/instructions/orchestrator.md**](instructions/orchestrator.md) |
+| **Planning** | [**docs/agents/instructions/formal_planning.md**](instructions/formal_planning.md) |
+| **Plan to tasks** | [**docs/agents/instructions/plan_to_tasks.md**](instructions/plan_to_tasks.md) |
+| **ML / Geospatial** | [**docs/agents/instructions/ml.md**](instructions/ml.md) |
+| **Python / QA** | [**docs/agents/instructions/python.md**](instructions/python.md) |
+| **System Design** | [**docs/agents/instructions/systemdesign.md**](instructions/systemdesign.md) |
+| **Infrastructure** | [**docs/agents/instructions/infrastructure.md**](instructions/infrastructure.md) |
+| **Root Cause Analysis** | [**docs/agents/instructions/root_cause_analysis.md**](instructions/root_cause_analysis.md) |
+| **Analytics** | [**docs/agents/instructions/analytics.md**](instructions/analytics.md) |
+| **Security** | [**docs/agents/instructions/security.md**](instructions/security.md) |
+| **Spec-Driven Dev** | [**docs/agents/instructions/specdrivendev.md**](instructions/specdrivendev.md) |
+| **Knowledge Base** | [**docs/agents/instructions/KNOWLEDGE.md**](instructions/KNOWLEDGE.md) |
## 4. Agent Behaviors, Memory & Tactics
diff --git a/docs/agents/instructions/KNOWLEDGE.md b/docs/agents/instructions/KNOWLEDGE.md
index 01bd40b..7715f65 100644
--- a/docs/agents/instructions/KNOWLEDGE.md
+++ b/docs/agents/instructions/KNOWLEDGE.md
@@ -1,3 +1,7 @@
+---
+name: KNOWLEDGE
+description: Project Knowledge Base
+---
# Project Knowledge Base
## STAC Catalogs
diff --git a/docs/agents/instructions/analytics.md b/docs/agents/instructions/analytics.md
index 839f2c5..9e97cc4 100644
--- a/docs/agents/instructions/analytics.md
+++ b/docs/agents/instructions/analytics.md
@@ -1,15 +1,17 @@
+---
+name: analytics
+description: Analytics Skill Instructions
+---
# Analytics Skill Instructions
-\
+## Primary Directive
Your primary objective is to extract truth from experimental data without fooling yourself or others.
**MANDATE:** Apply the project-specific rules outlined below for all analytics and EDA tasks.
-\
-
+## Context
In a geospatial research setting, data analysis must account for spatial dimensions, non-standard projections, and highly skewed physical measurements (e.g., radar reflectivity, atmospheric depth).
-
-
+## Standards
You MUST adhere to the following project-specific standards when performing or reviewing data analysis:
### 1. Geospatial Exploratory Data Analysis (EDA)
@@ -33,11 +35,11 @@ You MUST adhere to the following project-specific standards when performing or r
- **Assumptions:** ALWAYS verify statistical assumptions (e.g., Normality) before applying tests (e.g., T-Test).
- **Reporting:** Report effect sizes (e.g., Cohen's d) alongside p-values. Statistical significance != Practical significance.
-
+
-\
+## Forbidden Patterns
- ❌ **Ignoring Nodata:** You MUST NOT silently calculate statistics over arrays containing raw nodata values (e.g., averaging `-9999` with valid data). Use `xarray.where()` or masked arrays.
- ❌ **"Magic" Outlier Removal:** You MUST NOT remove spatial data points just because they "look wrong" without explicit domain-specific justification.
- ❌ **Pie Charts & Dual Y-Axes:** Avoid these misleading visualization formats entirely.
- \
+
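The nodata and effect-size rules in `analytics.md` can be illustrated with a minimal standard-library sketch. The `-9999` sentinel comes from the rule text; the helper names `drop_nodata` and `cohens_d` are hypothetical, and a real pipeline would more likely use `xarray.where()` or masked arrays as the file recommends.

```python
from __future__ import annotations

import math
import statistics

NODATA = -9999.0  # sentinel from the rule text; real datasets declare their own value


def drop_nodata(values: list[float], nodata: float = NODATA) -> list[float]:
    """Return only the valid measurements, leaving the input untouched."""
    return [value for value in values if value != nodata]


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Compute Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Hypothetical measurements: one raw nodata value hides among valid readings.
treated = drop_nodata([2.0, 4.0, NODATA, 6.0])
control = [1.0, 3.0, 5.0]
effect_size = cohens_d(treated, control)
```

Averaging the raw list without masking would drag the mean far below zero, which is the silent failure the forbidden pattern describes.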
diff --git a/docs/agents/instructions/formal_planning.md b/docs/agents/instructions/formal_planning.md
index 5fe5b29..3187b0a 100644
--- a/docs/agents/instructions/formal_planning.md
+++ b/docs/agents/instructions/formal_planning.md
@@ -1,8 +1,11 @@
+---
+name: formal_planning
+description: Formal Planning Protocol
+---
# Formal Planning Protocol
-\
+## Primary Directive
**MANDATE:** Ensure any generated plan adheres to the structure below.
-\
When the user explicitly asks for a "plan," "architecture," "design," or "proposal"—or when embarking on a multi-step/multi-domain implementation—you must use the **Formal Design Document** structure below, saving it to `docs/agents/planning/_PLAN.md`.
diff --git a/docs/agents/instructions/infrastructure.md b/docs/agents/instructions/infrastructure.md
index 3c31a56..5258124 100644
--- a/docs/agents/instructions/infrastructure.md
+++ b/docs/agents/instructions/infrastructure.md
@@ -1,15 +1,17 @@
+---
+name: infrastructure
+description: Infrastructure Skill Instructions
+---
# Infrastructure Skill Instructions
-\
+## Primary Directive
Your objective is to ensure that all research environments, compute jobs, and pipelines are reproducible, resilient, and explicitly defined as code.
**MANDATE:** Apply the project-specific rules outlined below for all infrastructure and environment tasks.
-\
-
+## Context
This project uses modern Python packaging and infrastructure-as-code principles.
-
-
+## Standards
You MUST enforce the following project-specific infrastructure standards:
### 1. Environment Management
@@ -30,11 +32,11 @@ You MUST enforce the following project-specific infrastructure standards:
### 4. HPC & Automation
- **Explicit Resources:** If interacting with SLURM or cluster job scripts, ALWAYS request specific resources (`cpus-per-task`, memory, etc.).
-
+
-\
+## Forbidden Patterns
- ❌ **"ClickOps":** You MUST NOT recommend setting up environments or servers manually via a GUI.
- ❌ **Untracked Environments:** Do not add dependencies without ensuring they are reflected in `pyproject.toml` and `uv.lock`.
- ❌ **Hardcoded Secrets:** You MUST NEVER include API keys or tokens in scripts, Makefiles, or Dockerfiles.
- \
+
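One way to honor the "Hardcoded Secrets" rule in `infrastructure.md` is to read credentials from the environment and fail loudly when they are absent. This is a sketch only: `read_secret`, `MissingSecretError`, and any variable name passed to it are hypothetical, not names from this project.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def read_secret(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        # Fail early instead of falling back to a default baked into the code.
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value
```

Scripts, Makefiles, and Dockerfiles then reference only the variable name, never the token itself.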
diff --git a/docs/agents/instructions/ml.md b/docs/agents/instructions/ml.md
index 41b7b0f..c387703 100644
--- a/docs/agents/instructions/ml.md
+++ b/docs/agents/instructions/ml.md
@@ -1,15 +1,17 @@
+---
+name: ml
+description: Machine Learning & Geospatial Processing Instructions
+---
# Machine Learning & Geospatial Processing Instructions
-\
+## Primary Directive
Your goal is to help build state-of-the-art models and data pipelines that are reproducible, reliable, and well-documented.
**MANDATE:** Apply the project-specific rules outlined below for all ML and geospatial processing tasks.
-\
-
+## Context
This project deals heavily with geospatial datasets (Sentinel-2, Radar, etc.) which introduce unique memory and projection challenges compared to standard ML pipelines.
-
-
+## Standards
You MUST enforce the following project-specific standards:
### 1. Geospatial Data Handling
@@ -27,11 +29,11 @@ You MUST enforce the following project-specific standards:
- **Config-Driven:** Hyperparameters and dataset paths MUST be externalized to configuration files and loaded via Pydantic models.
-
+
-\
+## Forbidden Patterns
- ❌ **Silent OOMs:** You MUST NOT write data loaders that attempt to load massive raster datasets entirely into RAM.
- ❌ **Ignoring CRS:** You MUST NEVER perform spatial joins or distance calculations without first asserting both datasets share the exact same CRS.
- ❌ **Fitting on Test Data:** You MUST NEVER allow data transformations to be fitted on the validation or test sets.
- \
+
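The "Fitting on Test Data" rule in `ml.md` can be sketched with a tiny standardizer whose statistics come from the training split only. The `Standardizer` class is hypothetical and uses the standard library to stay self-contained; the same split discipline applies to whatever transformer the project actually uses.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Standardizer:
    """Mean and standard deviation learned from the training split only."""

    mean: float
    std: float

    @classmethod
    def fit(cls, train_values: list[float]) -> Standardizer:
        """Learn the statistics from training data."""
        std = statistics.pstdev(train_values)
        if std == 0:
            raise ValueError("Training values are constant; cannot standardize")
        return cls(mean=statistics.mean(train_values), std=std)

    def transform(self, values: list[float]) -> list[float]:
        """Apply the stored training statistics to any split."""
        return [(value - self.mean) / self.std for value in values]


# Fit once on training data, then reuse the same object for validation and test.
scaler = Standardizer.fit([0.0, 2.0])
test_scaled = scaler.transform([3.0])
```

Keeping `fit` and `transform` separate makes it hard to recompute statistics on validation or test data by accident.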
diff --git a/docs/agents/instructions/orchestrator.md b/docs/agents/instructions/orchestrator.md
index fb67d12..61a67ca 100644
--- a/docs/agents/instructions/orchestrator.md
+++ b/docs/agents/instructions/orchestrator.md
@@ -1,15 +1,17 @@
+---
+name: orchestrator
+description: Orchestrator Skill Instructions
+---
# Orchestrator Skill Instructions
-\
+## Primary Directive
Your primary responsibility is the horizontal integration of all research components.
**MANDATE:** Apply the project-specific rules outlined below for all orchestration and integration tasks.
-\
-
+## Context
In this repository, successful orchestration means tying together raw geospatial data fetching (STAC/Copernicus), pre-processing (Rasterio/Xarray), and output generation (COGs/Zarr).
-
-
+## Workflow
For any task requiring more than a minor fix, you MUST enforce the following framework:
### 1. The Written Plan (Mandatory)
@@ -25,11 +27,11 @@ Before writing implementation code, you MUST create or update a `
+
-\
+## Forbidden Patterns
- ❌ **Vertical Myopia:** You MUST NOT focus entirely on optimizing one specific file while ignoring how it breaks integration with the rest of the project (e.g., changing a config structure without updating `geospatial_tools_ini.yaml.example`).
- ❌ **Implied Contracts:** You MUST NOT build components that pass raw, untyped dictionaries to each other. Always enforce explicit data contracts (e.g., Pydantic Models, Dataclasses).
- ❌ **Skipping E2E Testing:** You MUST NOT declare a complex integration "complete" without verifying that the data flows from start to finish via `nox` testing sessions or test notebooks.
- \
+
diff --git a/docs/agents/instructions/plan_to_tasks.md b/docs/agents/instructions/plan_to_tasks.md
index 78b73c2..7264ac8 100644
--- a/docs/agents/instructions/plan_to_tasks.md
+++ b/docs/agents/instructions/plan_to_tasks.md
@@ -1,8 +1,11 @@
+---
+name: plan_to_tasks
+description: Plan To Tasks
+---
# Plan To Tasks
-\
+## Primary Directive
**MANDATE:** Decompose a plan into modular, atomic tasks, each documented in its own file (or a structured document) with all the context needed for implementation and verification.
-\
## Core Workflow
diff --git a/docs/agents/instructions/python.md b/docs/agents/instructions/python.md
index 48206d2..c48de72 100644
--- a/docs/agents/instructions/python.md
+++ b/docs/agents/instructions/python.md
@@ -1,19 +1,21 @@
+---
+name: python
+description: Python & QA Skill Instructions
+---
# Python & QA Skill Instructions
-\
+## Primary Directive
Your objective is to elevate research scripts into robust, maintainable, and type-safe software.
**MANDATE:** Apply the project-specific rules outlined below for all Python development and QA tasks.
-\
-
+## Context
This project relies heavily on modern Python tooling and strictly enforced quality assurance.
- **Pre-commit is Central:** All QA tasks (linting, formatting, type checking) are orchestrated via `pre-commit`.
- **Environment & Build:** We use `uv` for package management and `hatchling` as the build backend (defined in `pyproject.toml`).
- **Task Automation:** We use `nox` for isolated test environments and task execution.
- **Makefile:** We use a makefile to automate and orchestrate most things in this project. Use `make targets` to discover the available targets.
-
-
+## Standards
You MUST strictly adhere to the following project-specific Python standards:
### 1. QA & Tooling
@@ -27,19 +29,20 @@ You MUST strictly adhere to the following project-specific Python standards:
- **Strict Typing:** You MUST use type hints for ALL function arguments and return values (e.g., `def process(data: str | Any) -> pd.DataFrame`).
- **Filesystem Paths:** You MUST NEVER use `os.path`. ALWAYS use `pathlib.Path` for all file and directory manipulations.
- **Logging:** Use `structlog` for application flow. NEVER use `print()` for production code.
+- **Function Calls:** Prefer keyword arguments for complex function calls to enhance readability and maintainability.
- **Data Structures:** ALWAYS use `@dataclass` or `pydantic` models for complex structures instead of untyped dictionaries.
- **Type Hints Format:** Always prefer X | Y format over Union[X, Y].
-- **Docstrings:** Always add docstrings to your functions and classes. Use the Google standard for docstrings. Don't show types in docstrings.
+- **Docstrings:** Always add docstrings to your functions and classes. Use the Google standard for docstrings and follow the Diátaxis framework for documentation structure. Don't show types in docstrings.
### 3. Testing & Performance
- **Vectorization:** ALWAYS prefer vectorized operations (NumPy, Pandas, Polars, Xarray) over native Python `for` loops when processing geospatial data.
-
+
-\
+## Forbidden Patterns
- ❌ **Bypassing Pre-commit:** Do not commit code that fails `pre-commit` checks. Fix the underlying linting or typing issue.
- ❌ **Global Mutable State:** You MUST NEVER define or mutate global variables to pass state between functions.
- ❌ **Magic Numbers/Strings:** You MUST NOT hardcode numeric constants. Extract them to Pydantic settings or config classes.
- ❌ **Bare Except Blocks:** You MUST NEVER use `except: pass` or `except Exception: pass`.
- \
+
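Several of the `python.md` standards (type hints in `X | Y` form, `pathlib.Path`, a dataclass instead of an untyped dictionary, keyword arguments, Google-style docstrings without types) can be seen together in one short sketch. `TileRequest` and `build_output_path` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TileRequest:
    """Typed description of a tile to write, replacing an untyped dict."""

    tile_id: str
    output_dir: Path
    resolution_m: int | None = None  # `X | Y` form rather than Union/Optional


def build_output_path(*, request: TileRequest, suffix: str = ".tif") -> Path:
    """Build the output file path for a tile.

    Args:
        request: Tile description, including the output directory.
        suffix: File extension, including the leading dot.

    Returns:
        Path of the file the tile should be written to.
    """
    return request.output_dir / f"{request.tile_id}{suffix}"


# Keyword-only call: each argument is named at the call site.
example_path = build_output_path(
    request=TileRequest(tile_id="T31UFS", output_dir=Path("out")),
    suffix=".tif",
)
```

The bare `*` in the signature is one way to make keyword arguments mandatory; the standard itself only says to prefer them for complex calls.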
diff --git a/docs/agents/instructions/root_cause_analysis.md b/docs/agents/instructions/root_cause_analysis.md
index 008c3bd..23f37ab 100644
--- a/docs/agents/instructions/root_cause_analysis.md
+++ b/docs/agents/instructions/root_cause_analysis.md
@@ -1,15 +1,17 @@
+---
+name: root_cause_analysis
+description: Root Cause Analysis (RCA) Skill Instructions
+---
# Root Cause Analysis (RCA) Skill Instructions
-\
+## Primary Directive
Your objective is to systematically diagnose and permanently fix software failures.
**MANDATE:** Apply the project-specific rules outlined below for all debugging and root cause analysis tasks.
-\
-
+## Context
Geospatial errors are often opaque (e.g., `rasterio.errors.RasterioIOError`, mismatched CRSs, out-of-bounds bounding boxes). Slapping a `try/except` block over an error without understanding it creates brittle systems that fail silently later.
-
-
+## Workflow
When presented with a traceback or unexpected result, you MUST follow this workflow:
### Step 1: Evidence Gathering
@@ -29,11 +31,11 @@ When presented with a traceback or unexpected result, you MUST follow this workf
- Propose the smallest, most targeted code change required. Prove the fix works via `pytest`.
- Document the finding in `docs/agents/instructions/KNOWLEDGE.md` if it represents a systemic quirk (e.g., a specific STAC catalog behavior).
-
+
-\
+## Forbidden Patterns
- ❌ **Guesswork:** You MUST NOT propose random fixes (e.g., "try reprojecting it again") without a coherent hypothesis based on the traceback and data state.
- ❌ **Patching Symptoms:** You MUST NEVER suppress an error (e.g., using a bare `except: pass`) without fixing the foundational logic flaw that caused it.
- ❌ **Fixing Without Explaining:** You MUST NOT provide a corrected block of code without first explaining the root cause of the bug to the researcher.
- \
+
diff --git a/docs/agents/instructions/security.md b/docs/agents/instructions/security.md
index ccfea8f..75c73b7 100644
--- a/docs/agents/instructions/security.md
+++ b/docs/agents/instructions/security.md
@@ -1,15 +1,17 @@
+---
+name: security
+description: Security Skill Instructions
+---
# Security Skill Instructions
-\
+## Primary Directive
Your primary objective is to identify vulnerabilities, enforce defense-in-depth, and ensure absolute data privacy.
**MANDATE:** Apply the project-specific rules outlined below for all tasks involving security, authentication, or data privacy.
-\
-
+## Context
Geospatial research often involves massive downloads from third-party catalogs (STAC, Copernicus) requiring authentication tokens. Exposing these tokens compromises the lab's infrastructure limits.
-
-
+## Standards
You MUST actively enforce the following project-specific security standards:
### 1. Secret Management
@@ -21,10 +23,11 @@ You MUST actively enforce the following project-specific security standards:
- **Path Traversal:** When dynamically generating file paths based on STAC item IDs or user input, use `pathlib.Path.resolve()` to ensure paths do not traverse outside the intended output directory (`../`).
- **Deserialization:** Do not use `pickle` or `numpy.load(allow_pickle=True)` for data acquired from external STAC catalogs.
-
+
-\
+## Forbidden Patterns
- ❌ **Committing Secrets:** You MUST NEVER allow code containing hardcoded credentials or API tokens to be committed.
- ❌ **Disabling SSL Verification:** You MUST NEVER permit `verify=False` in `requests` or `aiohttp` calls to STAC catalogs or data endpoints.
- \
+- ❌ **Raw SQL:** You MUST NEVER construct raw SQL strings from user input. Always use an ORM or parameterized queries to prevent SQL injection.
+
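The path-traversal and SQL rules in `security.md` can be sketched as follows. Both helpers are hypothetical; the `items` table, its columns, and the sample item id are assumptions made up for the example, and `sqlite3` stands in for whatever database the project might use.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def safe_output_path(*, output_dir: Path, item_id: str, suffix: str = ".tif") -> Path:
    """Build an output path and reject anything that escapes output_dir."""
    base = output_dir.resolve()
    candidate = (base / f"{item_id}{suffix}").resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"Item id {item_id!r} resolves outside {base}")
    return candidate


def find_item_ids(*, conn: sqlite3.Connection, collection: str) -> list[str]:
    """Look up item ids with a parameterized query instead of string formatting."""
    rows = conn.execute(
        "SELECT id FROM items WHERE collection = ? ORDER BY id",
        (collection,),  # the driver binds the value; it is never spliced into the SQL
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, collection TEXT)")
    conn.execute("INSERT INTO items VALUES (?, ?)", ("S2A_0001", "sentinel-2-l2a"))
    return conn
```

An item id such as `../../etc/passwd` resolves outside the base directory and raises `ValueError`, while an injection-style string passed as `collection` is compared as a literal value and matches no rows.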
diff --git a/docs/agents/instructions/specdrivendev.md b/docs/agents/instructions/specdrivendev.md
index 797dc02..a78d655 100644
--- a/docs/agents/instructions/specdrivendev.md
+++ b/docs/agents/instructions/specdrivendev.md
@@ -1,16 +1,18 @@
+---
+name: specdrivendev
+description: "Skill: Lightweight Spec-Driven Development (SDD)"
+---
# Skill: Lightweight Spec-Driven Development (SDD)
-\
+## Primary Directive
You are an **Educational Architect** teaching a researcher how to use a lightweight version of Spec-Driven Development (SDD).
**MANDATE:** Apply the project-specific rules outlined below for defining new features or interfaces.
-\
-
+## Context
In geospatial research, jumping straight into implementation often leads to messy code, unclear boundaries (e.g., passing untyped numpy arrays without CRS metadata), and debugging nightmares.
By defining the "Specification" or "Contract" first we force the researcher to think precisely about inputs, spatial bounds, shapes, and edge cases.
-
-
+## Workflow
When starting a new feature, you MUST guide the researcher through these steps:
### Step 1: Define the Nouns (Dataclasses)
@@ -27,10 +29,10 @@ When starting a new feature, you MUST guide the researcher through these steps:
- Use `raise NotImplementedError()` for the function body.
- **STOP.** Present the stub to the researcher and ask for validation BEFORE generating the logic.
-
+
-\
+## Forbidden Patterns
- ❌ **The `Any` Escape Hatch:** You MUST NOT use `Any` in type hints unless absolutely unavoidable. Use `xarray.Dataset` or `geopandas.GeoDataFrame` specifically.
- ❌ **Logic Before Contract:** You MUST NOT write the function logic before the signature and docstring are established.
- \
+
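The contract-first workflow in `specdrivendev.md` (define the nouns, then a stub that raises `NotImplementedError`) might look like the sketch below. `BoundingBox` and `count_tiles` are hypothetical examples, not part of this repository.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent together with the CRS it is expressed in."""

    west: float
    south: float
    east: float
    north: float
    crs: str

    def __post_init__(self) -> None:
        # Encode the edge case in the contract instead of in downstream logic.
        if self.west >= self.east or self.south >= self.north:
            raise ValueError("Bounding box must have west < east and south < north")


def count_tiles(*, bbox: BoundingBox, tile_size: float) -> int:
    """Count the square tiles needed to cover a bounding box.

    Args:
        bbox: Extent to cover, in the units of its CRS.
        tile_size: Edge length of one tile, in the same units.

    Returns:
        Number of tiles required to cover the extent.

    Raises:
        ValueError: If tile_size is not positive.
    """
    raise NotImplementedError()
```

At this point the stub is shown to the researcher for validation; the body is written only after the signature and docstring are agreed.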
diff --git a/docs/agents/instructions/systemdesign.md b/docs/agents/instructions/systemdesign.md
index 2dca247..9f6df5a 100644
--- a/docs/agents/instructions/systemdesign.md
+++ b/docs/agents/instructions/systemdesign.md
@@ -1,15 +1,17 @@
+---
+name: systemdesign
+description: System Design & Architecture Skill Instructions
+---
# System Design & Architecture Skill Instructions
-\
+## Primary Directive
Your objective is to design systems that are maintainable, evolvable, and robust.
**MANDATE:** Apply the project-specific rules outlined below for all system design and architectural tasks.
-\
-
+## Context
Geospatial research codebases quickly become tangled if data fetching (STAC), processing (Rasterio), and analysis (Xarray) are all handled in the same script.
-
-
+## Standards
You MUST enforce the following project-specific architectural patterns:
### 1. Configuration-First Design
@@ -24,10 +26,10 @@ You MUST enforce the following project-specific architectural patterns:
### 3. Error Handling & Idempotency
- Design pipelines to resume gracefully. If a 100-tile download fails at tile 99, the pipeline must be able to restart and only fetch the missing tile.
-
+
-\
+## Forbidden Patterns
- ❌ **God Objects:** You MUST NOT design classes that handle STAC querying, raster clipping, and matplotlib plotting simultaneously.
- ❌ **Hardcoded Configurations:** You MUST NEVER bury STAC endpoints, chunk sizes, or file paths inside logic files.
- \
+
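The resume requirement in `systemdesign.md` (restart after a failure and fetch only the missing tile) can be sketched as an idempotent download loop. Everything here is an assumption for illustration: the `fetch` callable, the `.tif` naming, and the `.part` temporary suffix are hypothetical.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from pathlib import Path


def fetch_missing_tiles(
    *,
    tile_ids: Iterable[str],
    output_dir: Path,
    fetch: Callable[[str], bytes],
) -> list[str]:
    """Download only the tiles that are not already on disk.

    Args:
        tile_ids: Identifiers of every tile the pipeline needs.
        output_dir: Directory where finished tiles are stored.
        fetch: Function that returns the raw bytes for one tile id.

    Returns:
        Identifiers of the tiles fetched during this run.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    fetched: list[str] = []
    for tile_id in tile_ids:
        target = output_dir / f"{tile_id}.tif"
        if target.exists():
            continue  # finished in an earlier run, so skip it
        partial = target.with_suffix(".part")
        partial.write_bytes(fetch(tile_id))
        # Rename only after the write succeeds, so a crash never leaves
        # a half-written file under the final name.
        partial.replace(target)
        fetched.append(tile_id)
    return fetched
```

Running the function twice with the same arguments fetches nothing the second time, which is the property that lets a failed 100-tile job restart cheaply.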