Merged
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,4 @@
exclude: "/migrations/|Makefile*"
exclude: "^docs/agents/instructions/|/migrations/|Makefile*"
default_stages: [ pre-commit ]

repos:
28 changes: 14 additions & 14 deletions docs/agents/agent_instructions.md
@@ -39,20 +39,20 @@ Your mission is twofold:
To assist with specific domains, specialized instruction files are available in `docs/agents/instructions`.
**Mandate:** You MUST read and apply the relevant project-specific context file when working within these domains. These files outline architectural constraints, preferred tools, and forbidden patterns for this specific repository.

| Domain | Project-Specific Context File |
| :---------------------- | :------------------------------------------------ |
| **Orchestrator** | `docs/agents/instructions/orchestrator.md` |
| **Planning** | `docs/agents/instructions/formal_planning.md` |
| **Plan to tasks** | `docs/agents/instructions/plan_to_tasks.md` |
| **ML / Geospatial** | `docs/agents/instructions/ml.md` |
| **Python / QA** | `docs/agents/instructions/python.md` |
| **System Design** | `docs/agents/instructions/systemdesign.md` |
| **Infrastructure** | `docs/agents/instructions/infrastructure.md` |
| **Root Cause Analysis** | `docs/agents/instructions/root_cause_analysis.md` |
| **Analytics** | `docs/agents/instructions/analytics.md` |
| **Security** | `docs/agents/instructions/security.md` |
| **Spec-Driven Dev** | `docs/agents/instructions/specdrivendev.md` |
| **Knowledge Base** | `docs/agents/instructions/KNOWLEDGE.md` |
| Domain | Project-Specific Context File |
| :---------------------- | :----------------------------------------------------------------------------------------- |
| **Orchestrator** | [**docs/agents/instructions/orchestrator.md**](instructions/orchestrator.md) |
| **Planning** | [**docs/agents/instructions/formal_planning.md**](instructions/formal_planning.md) |
| **Plan to tasks** | [**docs/agents/instructions/plan_to_tasks.md**](instructions/plan_to_tasks.md) |
| **ML / Geospatial** | [**docs/agents/instructions/ml.md**](instructions/ml.md) |
| **Python / QA** | [**docs/agents/instructions/python.md**](instructions/python.md) |
| **System Design** | [**docs/agents/instructions/systemdesign.md**](instructions/systemdesign.md) |
| **Infrastructure** | [**docs/agents/instructions/infrastructure.md**](instructions/infrastructure.md) |
| **Root Cause Analysis** | [**docs/agents/instructions/root_cause_analysis.md**](instructions/root_cause_analysis.md) |
| **Analytics** | [**docs/agents/instructions/analytics.md**](instructions/analytics.md) |
| **Security** | [**docs/agents/instructions/security.md**](instructions/security.md) |
| **Spec-Driven Dev** | [**docs/agents/instructions/specdrivendev.md**](instructions/specdrivendev.md) |
| **Knowledge Base** | [**docs/agents/instructions/KNOWLEDGE.md**](instructions/KNOWLEDGE.md) |

## 4. Agent Behaviors, Memory & Tactics

4 changes: 4 additions & 0 deletions docs/agents/instructions/KNOWLEDGE.md
@@ -1,3 +1,7 @@
---
name: KNOWLEDGE
description: Project Knowledge Base
---
# Project Knowledge Base

## STAC Catalogs
18 changes: 10 additions & 8 deletions docs/agents/instructions/analytics.md
@@ -1,15 +1,17 @@
---
name: analytics
description: Analytics Skill Instructions
---
# Analytics Skill Instructions

\<primary_directive>
## Primary Directive
Your primary objective is to extract truth from experimental data without fooling yourself or others.
**MANDATE:** Apply the project-specific rules outlined below for all analytics and EDA tasks.
\</primary_directive>

<context>
## Context
In a geospatial research setting, data analysis must account for spatial dimensions, non-standard projections, and highly skewed physical measurements (e.g., radar reflectivity, atmospheric depth).
</context>

<standards>
## Standards
You MUST adhere to the following project-specific standards when performing or reviewing data analysis:

### 1. Geospatial Exploratory Data Analysis (EDA)
@@ -33,11 +35,11 @@ You MUST adhere to the following project-specific standards when performing or r

- **Assumptions:** ALWAYS verify statistical assumptions (e.g., Normality) before applying tests (e.g., T-Test).
- **Reporting:** Report effect sizes (e.g., Cohen's d) alongside p-values. Statistical significance != Practical significance.
</standards>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Ignoring Nodata:** You MUST NOT silently calculate statistics over arrays containing raw nodata values (e.g., averaging `-9999` with valid data). Use `xarray.where()` or masked arrays.
- ❌ **"Magic" Outlier Removal:** You MUST NOT remove spatial data points just because they "look wrong" without explicit domain-specific justification.
- ❌ **Pie Charts & Dual Y-Axes:** Avoid these misleading visualization formats entirely.
\</forbidden_patterns>
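
As a rough illustration of the nodata rule above, the sketch below masks a sentinel value before averaging. The `-9999` sentinel comes from the rule itself; the array values and the `NODATA` name are invented for the example, and NumPy masked arrays are used as one of the two options the rule mentions.

```python
import numpy as np

NODATA = -9999  # sentinel value taken from the rule above

# Hypothetical band values; two pixels carry the nodata sentinel.
band = np.array([10.0, 20.0, NODATA, 30.0, NODATA])

# Wrong: the sentinel is averaged in and drags the mean far below zero.
naive_mean = band.mean()

# Mask the sentinel so it is excluded from every subsequent statistic.
masked = np.ma.masked_equal(band, NODATA)
valid_mean = masked.mean()
valid_count = masked.count()  # number of unmasked (valid) pixels
```

Reporting `valid_count` next to the statistic makes it visible how much of the array actually contributed to it.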

7 changes: 5 additions & 2 deletions docs/agents/instructions/formal_planning.md
@@ -1,8 +1,11 @@
---
name: formal_planning
description: Formal Planning Protocol
---
# Formal Planning Protocol

\<primary_directive>
## Primary Directive
**MANDATE:** Ensure any generated plan adheres to the structure below.
\</primary_directive>

When the user explicitly asks for a "plan," "architecture," "design," or "proposal"—or when embarking on a multi-step/multi-domain implementation—you must use the **Formal Design Document** structure below, saving it to `docs/agents/planning/<TASK_DESCRIPTION>_PLAN.md`.

18 changes: 10 additions & 8 deletions docs/agents/instructions/infrastructure.md
@@ -1,15 +1,17 @@
---
name: infrastructure
description: Infrastructure Skill Instructions
---
# Infrastructure Skill Instructions

\<primary_directive>
## Primary Directive
Your objective is to ensure that all research environments, compute jobs, and pipelines are reproducible, resilient, and explicitly defined as code.
**MANDATE:** Apply the project-specific rules outlined below for all infrastructure and environment tasks.
\</primary_directive>

<context>
## Context
This project uses modern Python packaging and infrastructure-as-code principles.
</context>

<standards>
## Standards
You MUST enforce the following project-specific infrastructure standards:

### 1. Environment Management
@@ -30,11 +32,11 @@ You MUST enforce the following project-specific infrastructure standards:
### 4. HPC & Automation

- **Explicit Resources:** If interacting with SLURM or cluster job scripts, ALWAYS request specific resources (`cpus-per-task`, memory, etc.).
</standards>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **"ClickOps":** You MUST NOT recommend setting up environments or servers manually via a GUI.
- ❌ **Untracked Environments:** Do not add dependencies without ensuring they are reflected in `pyproject.toml` and `uv.lock`.
- ❌ **Hardcoded Secrets:** You MUST NEVER include API keys or tokens in scripts, Makefiles, or Dockerfiles.
\</forbidden_patterns>
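
One possible way to honor the hardcoded-secrets rule is to read tokens from the environment at runtime. This is only a sketch: the `load_api_token` helper and the `STAC_API_TOKEN` variable name are hypothetical and do not come from the repository.

```python
import os


def load_api_token(var_name: str = "STAC_API_TOKEN") -> str:
    """Read a token from the environment and fail loudly if it is missing."""
    # `STAC_API_TOKEN` is a hypothetical variable name used for illustration.
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token
```

Failing immediately on a missing variable avoids a later, harder-to-diagnose authentication error deep inside a download job.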

18 changes: 10 additions & 8 deletions docs/agents/instructions/ml.md
@@ -1,15 +1,17 @@
---
name: ml
description: Machine Learning & Geospatial Processing Instructions
---
# Machine Learning & Geospatial Processing Instructions

\<primary_directive>
## Primary Directive
Your goal is to help build state-of-the-art models and data pipelines that are reproducible, reliable, and well-documented.
**MANDATE:** Apply the project-specific rules outlined below for all ML and geospatial processing tasks.
\</primary_directive>

<context>
## Context
This project deals heavily with geospatial datasets (Sentinel-2, Radar, etc.) which introduce unique memory and projection challenges compared to standard ML pipelines.
</context>

<standards>
## Standards
You MUST enforce the following project-specific standards:

### 1. Geospatial Data Handling
@@ -27,11 +29,11 @@ You MUST enforce the following project-specific standards:

- **Config-Driven:** Hyperparameters and dataset paths MUST be externalized to configuration files and loaded via Pydantic models.

</standards>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Silent OOMs:** You MUST NOT write data loaders that attempt to load massive raster datasets entirely into RAM.
- ❌ **Ignoring CRS:** You MUST NEVER perform spatial joins or distance calculations without first asserting both datasets share the exact same CRS.
- ❌ **Fitting on Test Data:** You MUST NEVER allow data transformations to be fitted on the validation or test sets.
\</forbidden_patterns>
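
The "Fitting on Test Data" rule can be sketched with a tiny standard-library scaler. The `Standardizer` class and the sample values are hypothetical; a real pipeline would more likely use a library transformer, but the fit-on-train, transform-everywhere pattern is the same.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class Standardizer:
    """Scaling parameters learned from the training split only."""

    mean: float
    std: float

    @classmethod
    def fit(cls, train_values: list[float]) -> "Standardizer":
        # Statistics are computed from the training data and nothing else.
        return cls(mean=fmean(train_values), std=pstdev(train_values))

    def transform(self, values: list[float]) -> list[float]:
        # Assumes a non-constant training split, so `std` is not zero.
        return [(value - self.mean) / self.std for value in values]


train = [1.0, 2.0, 3.0]
test = [4.0]

scaler = Standardizer.fit(train_values=train)
train_scaled = scaler.transform(values=train)
test_scaled = scaler.transform(values=test)  # reuses train statistics, never refits
```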

18 changes: 10 additions & 8 deletions docs/agents/instructions/orchestrator.md
@@ -1,15 +1,17 @@
---
name: orchestrator
description: Orchestrator Skill Instructions
---
# Orchestrator Skill Instructions

\<primary_directive>
## Primary Directive
Your primary responsibility is the horizontal integration of all research components.
**MANDATE:** Apply the project-specific rules outlined below for all orchestration and integration tasks.
\</primary_directive>

<context>
## Context
In this repository, successful orchestration means tying together raw geospatial data fetching (STAC/Copernicus), pre-processing (Rasterio/Xarray), and output generation (COGs/Zarr).
</context>

<workflow>
## Workflow
For any task requiring more than a minor fix, you MUST enforce the following framework:

### 1. The Written Plan (Mandatory)
@@ -25,11 +27,11 @@ Before writing implementation code, you MUST create or update a `<TASK_DESCRIPTI

- Implement exactly ONE step from the plan at a time.
- After completing a step, you MUST STOP and ask the user to validate the output before moving to the next step.
</workflow>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Vertical Myopia:** You MUST NOT focus entirely on optimizing one specific file while ignoring how it breaks integration with the rest of the project (e.g., changing a config structure without updating `geospatial_tools_ini.yaml.example`).
- ❌ **Implied Contracts:** You MUST NOT build components that pass raw, untyped dictionaries to each other. Always enforce explicit data contracts (e.g., Pydantic Models, Dataclasses).
- ❌ **Skipping E2E Testing:** You MUST NOT declare a complex integration "complete" without verifying that the data flows from start to finish via `nox` testing sessions or test notebooks.
\</forbidden_patterns>

7 changes: 5 additions & 2 deletions docs/agents/instructions/plan_to_tasks.md
@@ -1,8 +1,11 @@
---
name: plan_to_tasks
description: Plan To Tasks
---
# Plan To Tasks

\<primary_directive>
## Primary Directive
**MANDATE:** Decompose a plan into modular, atomic tasks, each documented in its own file (or a structured document) with all the context needed for implementation and verification.
\</primary_directive>

## Core Workflow

21 changes: 12 additions & 9 deletions docs/agents/instructions/python.md
@@ -1,19 +1,21 @@
---
name: python
description: Python & QA Skill Instructions
---
# Python & QA Skill Instructions

\<primary_directive>
## Primary Directive
Your objective is to elevate research scripts into robust, maintainable, and type-safe software.
**MANDATE:** Apply the project-specific rules outlined below for all Python development and QA tasks.
\</primary_directive>

<context>
## Context
This project relies heavily on modern Python tooling and strictly enforced quality assurance.
- **Pre-commit is Central:** All QA tasks (linting, formatting, type checking) are orchestrated via `pre-commit`.
- **Environment & Build:** We use `uv` for package management and `hatchling` as the build backend (defined in `pyproject.toml`).
- **Task Automation:** We use `nox` for isolated test environments and task execution.
- **Makefile:** We use a Makefile to automate and orchestrate most project tasks. Run `make targets` to list the available targets.
</context>

<standards>
## Standards
You MUST strictly adhere to the following project-specific Python standards:

### 1. QA & Tooling
@@ -27,19 +29,20 @@ You MUST strictly adhere to the following project-specific Python standards:
- **Strict Typing:** You MUST use type hints for ALL function arguments and return values (e.g., `def process(data: str | Path) -> pd.DataFrame`).
- **Filesystem Paths:** You MUST NEVER use `os.path`. ALWAYS use `pathlib.Path` for all file and directory manipulations.
- **Logging:** Use `structlog` for application flow. NEVER use `print()` for production code.
- **Function Calls:** Prefer keyword arguments for complex function calls to enhance readability and maintainability.
- **Data Structures:** ALWAYS use `@dataclass` or `pydantic` models for complex structures instead of untyped dictionaries.
- **Type Hints Format:** Always prefer the `X | Y` format over `Union[X, Y]`.
- **Docstrings:** Always add docstrings to your functions and classes. Use the Google standard for docstrings. Don't show types in docstrings.
- **Docstrings:** Always add docstrings to your functions and classes. Use the Google standard for docstrings and follow the Diátaxis framework for documentation structure. Don't show types in docstrings.

### 3. Testing & Performance

- **Vectorization:** ALWAYS prefer vectorized operations (NumPy, Pandas, Polars, Xarray) over native Python `for` loops when processing geospatial data.
</standards>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Bypassing Pre-commit:** Do not commit code that fails `pre-commit` checks. Fix the underlying linting or typing issue.
- ❌ **Global Mutable State:** You MUST NEVER define or mutate global variables to pass state between functions.
- ❌ **Magic Numbers/Strings:** You MUST NOT hardcode numeric constants. Extract them to Pydantic settings or config classes.
- ❌ **Bare Except Blocks:** You MUST NEVER use `except: pass` or `except Exception: pass`.
\</forbidden_patterns>

18 changes: 10 additions & 8 deletions docs/agents/instructions/root_cause_analysis.md
@@ -1,15 +1,17 @@
---
name: root_cause_analysis
description: Root Cause Analysis (RCA) Skill Instructions
---
# Root Cause Analysis (RCA) Skill Instructions

\<primary_directive>
## Primary Directive
Your objective is to systematically diagnose and permanently fix software failures.
**MANDATE:** Apply the project-specific rules outlined below for all debugging and root cause analysis tasks.
\</primary_directive>

<context>
## Context
Geospatial errors are often opaque (e.g., `rasterio.errors.RasterioIOError`, mismatched CRSs, out-of-bounds bounding boxes). Slapping a `try/except` block over an error without understanding it creates brittle systems that fail silently later.
</context>

<workflow>
## Workflow
When presented with a traceback or unexpected result, you MUST follow this workflow:

### Step 1: Evidence Gathering
@@ -29,11 +31,11 @@ When presented with a traceback or unexpected result, you MUST follow this workf

- Propose the smallest, most targeted code change required. Prove the fix works via `pytest`.
- Document the finding in `docs/agents/instructions/KNOWLEDGE.md` if it represents a systemic quirk (e.g., a specific STAC catalog behavior).
</workflow>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Guesswork:** You MUST NOT propose random fixes (e.g., "try reprojecting it again") without a coherent hypothesis based on the traceback and data state.
- ❌ **Patching Symptoms:** You MUST NEVER suppress an error (e.g., using a bare `except: pass`) without fixing the foundational logic flaw that caused it.
- ❌ **Fixing Without Explaining:** You MUST NOT provide a corrected block of code without first explaining the root cause of the bug to the researcher.
\</forbidden_patterns>

19 changes: 11 additions & 8 deletions docs/agents/instructions/security.md
@@ -1,15 +1,17 @@
---
name: security
description: Security Skill Instructions
---
# Security Skill Instructions

\<primary_directive>
## Primary Directive
Your primary objective is to identify vulnerabilities, enforce defense-in-depth, and ensure absolute data privacy.
**MANDATE:** Apply the project-specific rules outlined below for all tasks involving security, authentication, or data privacy.
\</primary_directive>

<context>
## Context
Geospatial research often involves massive downloads from third-party catalogs (STAC, Copernicus) requiring authentication tokens. Exposing these tokens puts the lab's accounts and download quotas at risk.
</context>

<standards>
## Standards
You MUST actively enforce the following project-specific security standards:

### 1. Secret Management
@@ -21,10 +23,11 @@ You MUST actively enforce the following project-specific security standards:

- **Path Traversal:** When dynamically generating file paths based on STAC item IDs or user input, use `pathlib.Path.resolve()` to ensure paths do not traverse outside the intended output directory (`../`).
- **Deserialization:** Do not use `pickle` or `numpy.load(allow_pickle=True)` for data acquired from external STAC catalogs.
</standards>
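
The path traversal rule above could be implemented along these lines. This is a minimal sketch: `safe_output_path` is a hypothetical helper, and it assumes Python 3.9 or newer for `Path.is_relative_to`.

```python
from pathlib import Path


def safe_output_path(output_dir: Path, item_id: str) -> Path:
    """Resolve an item path and reject anything outside the output directory."""
    root = output_dir.resolve()
    # Resolving collapses any `..` segments before the containment check.
    candidate = (root / item_id).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"Item ID escapes the output directory: {item_id!r}")
    return candidate
```

An absolute `item_id` replaces `root` entirely when joined, so it fails the same containment check.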


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **Committing Secrets:** You MUST NEVER allow code containing hardcoded credentials or API tokens to be committed.
- ❌ **Disabling SSL Verification:** You MUST NEVER permit `verify=False` in `requests` or `aiohttp` calls to STAC catalogs or data endpoints.
\</forbidden_patterns>
- ❌ **Raw SQL:** Always use ORMs or parameterized queries to prevent SQL injection. You MUST NEVER construct raw SQL strings with user input.
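
A parameterized query, as the Raw SQL rule asks for, might look like the following. The sketch uses the standard-library `sqlite3` module purely for illustration; the `items` table and its contents are invented, and the project may use a different database or an ORM.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id TEXT PRIMARY KEY, collection TEXT)")
connection.execute(
    "INSERT INTO items (id, collection) VALUES (?, ?)",
    ("item-1", "sentinel-2"),
)

# Hostile input that would alter the query if concatenated into the SQL string.
user_input = "sentinel-2' OR '1'='1"

# The `?` placeholder passes the value separately from the SQL text,
# so the input is compared as a literal string and matches nothing.
hostile_rows = connection.execute(
    "SELECT id FROM items WHERE collection = ?",
    (user_input,),
).fetchall()

legit_rows = connection.execute(
    "SELECT id FROM items WHERE collection = ?",
    ("sentinel-2",),
).fetchall()
```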

18 changes: 10 additions & 8 deletions docs/agents/instructions/specdrivendev.md
@@ -1,16 +1,18 @@
---
name: specdrivendev
description: "Skill: Lightweight Spec-Driven Development (SDD)"
---
# Skill: Lightweight Spec-Driven Development (SDD)

\<primary_directive>
## Primary Directive
You are an **Educational Architect** teaching a researcher how to use a lightweight version of Spec-Driven Development (SDD).
**MANDATE:** Apply the project-specific rules outlined below for defining new features or interfaces.
\</primary_directive>

<context>
## Context
In geospatial research, jumping straight into implementation often leads to messy code, unclear boundaries (e.g., passing untyped numpy arrays without CRS metadata), and debugging nightmares.
By defining the "Specification" or "Contract" first, we force the researcher to think precisely about inputs, spatial bounds, shapes, and edge cases.
</context>

<workflow>
## Workflow
When starting a new feature, you MUST guide the researcher through these steps:

### Step 1: Define the Nouns (Dataclasses)
@@ -27,10 +29,10 @@ When starting a new feature, you MUST guide the researcher through these steps:

- Use `raise NotImplementedError()` for the function body.
- **STOP.** Present the stub to the researcher and ask for validation BEFORE generating the logic.
</workflow>


\<forbidden_patterns>
## Forbidden Patterns

- ❌ **The `Any` Escape Hatch:** You MUST NOT use `Any` in type hints unless absolutely unavoidable. Use `xarray.Dataset` or `geopandas.GeoDataFrame` specifically.
- ❌ **Logic Before Contract:** You MUST NOT write the function logic before the signature and docstring are established.
\</forbidden_patterns>
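
A contract-first stub in the spirit of the workflow above might look like this. The `BoundingBox` dataclass and the `clip_to_bbox` function are hypothetical examples, not code from the repository; the point is that the nouns, the signature, and the docstring exist before any logic does.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BoundingBox:
    """Spatial extent with an explicit CRS (hypothetical example type)."""

    min_x: float
    min_y: float
    max_x: float
    max_y: float
    crs: str


def clip_to_bbox(tile_path: Path, bbox: BoundingBox) -> Path:
    """Clip a raster tile to a bounding box.

    Args:
        tile_path: Location of the input raster.
        bbox: Extent to clip to, expressed in the CRS named by the box.

    Returns:
        Location of the clipped raster.

    Raises:
        ValueError: If the box has zero or negative area.
    """
    # Contract only: the logic is written after the researcher validates the stub.
    raise NotImplementedError()
```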
