diff --git a/.github/skills/document-retrieval/SKILL.md b/.github/skills/document-retrieval/SKILL.md
new file mode 100644
index 00000000..9c077424
--- /dev/null
+++ b/.github/skills/document-retrieval/SKILL.md
@@ -0,0 +1,173 @@
---
name: document-retrieval
description: Build and tune retrieval configs that search, rank, and collect ordinance documents in COMPASS. Use whenever a user asks to improve retrieval precision/recall, tune search queries/keywords, or debug acquisition quality before extraction tuning.
---

# Document Retrieval Skill

Use this skill to improve retrieval precision/recall before extraction tuning.
Applies to both one-shot (schema-driven) and legacy decision-tree extraction
pipelines.

## When to use

- Download step returns noisy sources (one-shot extraction).
- Ordinance recall is weak across jurisdictions (one-shot extraction).
- LLM filtering is compensating for poor search quality.

## Do not use

- Schema feature definition or value extraction logic design.
- Post-extraction feature/value debugging when retrieval is already correct.

## Expected assistant output

When using this skill, return:

1. The retrieval axis changed (queries, keyword weights, or heuristics).
2. Evidence from artifacts/logs showing why the change was needed.
3. The next run command against the same jurisdiction sample.

## Canonical reference

Consult example plugin configurations in `examples/`:
- `examples/one_shot_schema_extraction/plugin_config.yaml` — standard one-shot config
- `examples/water_rights_demo/one-shot/plugin_config.yaml` — multi-document edge cases

When creating new tech configs, use `<tech>_plugin_config.yaml` as a recommended
naming convention (e.g. `geothermal_plugin_config.yaml`).

## Scope

- Query-template strategy.
- URL ranking and filtering patterns.
- Heuristic phrase controls before LLM validation.

## Two retrieval phases

COMPASS runs two sequential acquisition passes per jurisdiction:

1. 
**Search-engine phase** — queries `SerpAPIGoogleSearch` (or configured + engine) using `query_templates`. This phase is the primary source of + ordinance documents. +2. **Website crawl phase** — crawls the jurisdiction's official website, + ranking pages using `website_keywords`. This phase is a secondary pass + and runs only if the search-engine phase did not yield an ordinance + context. + +Key behaviors: +- Playwright browser errors during the website crawl phase are **non-fatal**. + COMPASS logs the error and continues. +- `Found 0 potential documents` at the end of the crawl phase is **expected** + for jurisdictions without relevant online ordinances. +- Disable the crawl phase with `perform_website_search: false` in run config + when you want faster smoke tests or Playwright is unavailable. + +## Key management + +For SerpAPI-backed search, keep `api_key` out of committed config and provide +`SERPAPI_KEY` via environment (for example through `.env` loaded in shell). + +Recommended shell setup: + +```bash +set -a +source .env +set +a +``` + +Avoid spaces around `=` in `.env` assignments. + +## Retrieval design pattern + +1. Create 3-7 jurisdiction queries with `{jurisdiction}`. +2. Weight legal document indicators in URL keywords. +3. Apply exclusions for templates/reports/slides. +4. Add focused negative tech terms to reduce false positives. +5. Start with dynamic search, then switch to deterministic known URLs when + search infrastructure is unstable. + +When using `heuristic_keywords`, use these four lists to guide pre-LLM filtering: +- `GOOD_TECH_KEYWORDS` — strong indicators of the target technology + (e.g., facility types, deployment modes). Documents matching even a + few keywords are marked as candidates. +- `GOOD_TECH_PHRASES` — multi-word phrases that signal relevant + ordinance content. Keep specific to avoid false positives. +- `GOOD_TECH_ACRONYMS` — industry-standard abbreviations for the + technology. 
Narrow list; include only widely recognized acronyms. +- `NOT_TECH_WORDS` — pre-heuristic filter that rejects documents + before keyword matching. Use to exclude adjacent technologies and + irrelevant domains (e.g., residential HVAC, unrelated industries). + Runs first; prevents wasted keyword evaluation on clearly-wrong + documents. + +If any required list is missing or empty, COMPASS raises a plugin +configuration error and extraction quality should be treated as failed. + +For first-pass reliability, test retrieval with deterministic known URLs +before using live web search. + +## Technology-specific retrieval controls (template) + +- Include target-technology facility/deployment terms. +- Exclude adjacent and non-target terms (residential/HVAC/PV/etc as needed). +- Favor jurisdictional legal-code signals like `land use code`, + `code of ordinances`, `use table`, and `special use permit`. + +## Deterministic smoke-test mode +For this smoke test, at least one of the following documentation sources must be provided: + +- **`known_doc_urls`**: A list of URLs pointing to external documentation that the scraper can access and parse +- **`known_local_docs`**: A collection of local documentation files available in the repository or system + +Use run-config controls to bypass flaky search while tuning: + +- supply `known_doc_urls` or `known_local_docs`, +- set `perform_se_search: false`, +- set `perform_website_search: false`. + +Then validate: + +- download artifacts exist, +- cleaned text exists, +- ordinance DB rows are non-empty. + +## Tuning loop + +1. Run SE-search phase on small sample. +2. Inspect kept vs discarded PDFs (`ordinance_files/`). +3. Run heuristic filter and review false rejects/accepts (`cleaned_text/`). +4. Check website crawl phase independently if needed (enable, run, inspect logs). +5. 
Update one axis only: + - query templates (affects SE phase), + - URL weights (affects both phases), + - include/exclude heuristic patterns (pre-LLM filter), + - `NOT_TECH_WORDS` (upstream document rejection). +6. Re-run same sample and compare. + +## Cross-tech onboarding + +When reusing this workflow for any technology: + +- keep legal retrieval tokens (`ordinance`, `zoning`, `code`), +- replace all technology terms in `query_templates`, `website_keywords`, + and `heuristic_keywords`, +- seed `known_doc_urls` with authoritative regulatory documents for smoke + testing, +- avoid copying negatives from previous technologies into the new tech config, +- verify `NOT_TECH_WORDS` excludes adjacent technologies for your domain. + +## Phase gates + +- **3 jurisdictions**: ensure major source classes are found. +- **10 jurisdictions**: verify stability across regions. + + +## Guardrails + +- Keep feature extraction logic out of retrieval config. +- Do not overfit to one county's document style. +- Preserve auditable rationale for each retrieval change. +- Keep one canonical retrieval config per active technology. +- Ensure each run uses a unique `out_dir` to avoid COMPASS aborting early. + diff --git a/.github/skills/extraction-run/SKILL.md b/.github/skills/extraction-run/SKILL.md new file mode 100644 index 00000000..c2fafa8b --- /dev/null +++ b/.github/skills/extraction-run/SKILL.md @@ -0,0 +1,273 @@ +--- +name: extraction-run +description: Execute one-shot extraction with COMPASS and iterate quickly with low cost. Use whenever a user asks to run, smoke-test, validate, debug, or scale one-shot schema extraction for any technology. +--- + +# Extraction Run Skill + +**ONE-SHOT EXTRACTION ONLY.** This skill applies only to schema-driven extraction. +For decision-tree extraction (solar, wind, small wind), consult COMPASS +architecture docs. 
Use this skill to run one-shot extraction in a repeatable, low-risk way,
then iterate quickly until you have stable structured outputs.

## When to use

- Schema exists and plugin config points to it.
- You need a reliable smoke-test workflow before scaling.
- You are NOT using decision-tree extraction.

## Do not use

- Decision-tree extraction feature engineering.
- Python parser implementation in `compass/extraction/<tech>/parse.py`.
- Non-extraction tasks (for example docs-only updates).

## Expected assistant output

When using this skill, return:

1. The exact `pixi run compass process ...` command used.
2. A pass/fail decision against extraction-quality gates.
3. The smallest next config/schema change and why.

## Canonical reference

- `examples/one_shot_schema_extraction/` — complete working examples
- `examples/one_shot_schema_extraction/README.rst` — general one-shot overview
- `examples/water_rights_demo/one-shot/` — multi-doc extraction example

## Two-pipeline modes

COMPASS supports two distinct extraction pipelines. Choose one and do not mix
them for the same technology:

| Mode | Where code lives | Good for |
|---|---|---|
| **One-shot (schema-based)** | `examples/` → `compass/extraction/<tech>/` | New techs, no Python changes |
| **Decision-tree** | Python code in `compass/extraction/<tech>/` | Existing solar, wind, small wind |

One-shot is the correct path for all new technology onboarding. It requires
only a schema JSON, a plugin YAML, and a run config — no Python source changes.

## Tech promotion lifecycle

New technology assets start in `examples/` and finish in `compass/extraction/`:

1. **Develop** — place all assets in `examples/one_shot_schema_extraction/`
2. **Stabilize** — iterate schema/plugin until smoke and robustness gates pass
3. 
**Promote** — copy the three finalized files into `compass/extraction/<tech>/`:
   - `<tech>_schema.json`
   - `<tech>_plugin_config.yaml`
   - `__init__.py` — registers the plugin via `create_schema_based_one_shot_extraction_plugin`

   After creating the package, add an import in `compass/extraction/__init__.py`
   to register the plugin at startup. See `compass/extraction/ghp/__init__.py`
   for a reference implementation.

## Required inputs

- Run config for `compass process`.
- Plugin config containing `schema`.
- API keys in environment (never hardcode in configs).
- A jurisdiction set sized to the current phase.

## Preflight checks (must pass before run)

- Jurisdiction CSV has headers `County,State` or `County,State,Subdivision,Jurisdiction Type`.
- `out_dir` is unique for this run.
- At least one acquisition step is enabled:
  `perform_se_search: true`, `perform_website_search: true`,
  `known_doc_urls`, or `known_local_docs`.
- If `heuristic_keywords` exists, all four required lists are present and
  non-empty.

## Naming convention

Use tech-first names for all one-shot assets:

- `<tech>_plugin_config.yaml`
- `<tech>_schema.json`
- `<tech>_jurisdictions*.csv`

The `tech` value in the run config must be a string that becomes the plugin
registry identifier. It must be unique, lowercase, and underscore-separated
(for example `concentrating_solar`, `geothermal_electricity`). COMPASS will
raise `Unknown tech input` if this key does not match any registered plugin.

## Canonical development pattern

For early development, start with the proven dynamic baseline, then fall back
to deterministic mode only when search infrastructure is unstable:

1. Use one small jurisdiction file (1-3 rows).
2. Use your preferred configured search engine.
3. Load `.env` into shell (`set -a && source .env && set +a`).
4. Run with verbose logs:
   - `pixi run compass process -c config.json5 -p plugin.yaml -v`
5. Confirm output artifacts exist before tuning schema semantics.
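Step 5 of the pattern above can be automated with a small helper. This is a sketch: the directory and file names follow the expected-output-artifacts table in this skill, and the helper name itself is illustrative, not part of COMPASS.

```shell
# Sketch: fail fast when a run's out_dir lacks the expected artifacts
check_artifacts() {
  local out_dir="$1"
  [ -d "$out_dir/ordinance_files" ] || { echo "missing ordinance_files"; return 1; }
  [ -d "$out_dir/cleaned_text" ] || { echo "missing cleaned_text"; return 1; }
  [ -s "$out_dir/quantitative_ordinances.csv" ] || { echo "empty quantitative CSV"; return 1; }
  echo "artifacts ok"
}
```

Run it against the run's `out_dir` (e.g. `check_artifacts ./outputs_smoke`) before spending time on schema semantics.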
+ +Fallback mode when needed: + +- Add `known_doc_urls` (or `known_local_docs`) in run config. +- Set `perform_se_search: false` and `perform_website_search: false`. + +## Adaptation rule + +When adapting this workflow for a new technology, keep the run structure +unchanged and swap only technology-specific inputs: + +- `tech` in run config, +- schema file, +- plugin descriptor (`data_type_short_desc`), +- retrieval query/keyword vocabulary, +- known document URL set. + +Change one axis per run unless debugging infrastructure failures. + +## Environment setup + +Load secrets from `.env` before running. Never commit key values in config files. + +```bash +set -a && source .env && set +a # no spaces around = in .env assignments +``` + +## Core command + +```bash +pixi run compass process -c config.json5 -p path/to/plugin_config.yaml -v +``` + +## Phase-gated workflow + +1. **Smoke test (1 jurisdiction)** + - Goal: verify wiring and output contract. +2. **Robustness (5 jurisdictions)** + - Goal: verify feature stability and edge-case handling. 
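The phase gates above can be checked mechanically. This sketch applies the pass condition used later in this skill (at least one per-jurisdiction CSV with more than a header row); the function name and layout assumptions are illustrative.

```shell
# Sketch: pass the smoke gate only if some jurisdiction CSV has data rows
smoke_gate() {
  local out_dir="$1"
  for f in "$out_dir"/jurisdiction_dbs/*.csv; do
    [ -f "$f" ] || continue
    if [ "$(wc -l < "$f")" -gt 1 ]; then
      echo "pass"
      return 0
    fi
  done
  echo "fail"
  return 1
}
```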
+ +## Validation checklist + +Evaluate each run on: + +- document relevance (exclude off-domain content), +- feature coverage vs expected ordinance topics, +- section/summary traceability, +- unit consistency, +- null discipline, + +## Expected output artifacts + +A successful run produces these files under `out_dir`: + +| Artifact | Meaning | +|---|---| +| `ordinance_files/*.pdf` | Downloaded source documents | +| `cleaned_text/*.txt` | Heuristic-filtered extracted text | +| `jurisdiction_dbs/*.csv` | Per-jurisdiction raw extraction rows | +| `quantitative_ordinances.csv` | Final compiled numeric features | +| `qualitative_ordinances.csv` | Final compiled qualitative features | +| `usage.json` | Per-jurisdiction LLM token and request counts | +| `meta.json` | Run metadata (cost, timing, version) | + +Final CSV columns: `county`, `state`, `subdivision`, `jurisdiction_type`, +`FIPS`, `feature`, `value`, `units`, `adder`, `min_dist`, `max_dist`, +`summary`, `year`, `section`, `source`. + +## Interpreting output status correctly + +`cleaned_text` files can exist while `Number of documents found` is `0`. + +This means acquisition/text collection worked, but no final structured ordinance +rows were emitted into consolidated DB outputs. + +Check in order: + +1. `outputs/*/cleaned_text/*.txt` (text extraction present) +2. `outputs/*/jurisdiction_dbs/*.csv` (per-jurisdiction parsed rows) +3. 
`outputs/*/quantitative_ordinances.csv` and + `outputs/*/qualitative_ordinances.csv` (final compiled results) + +Treat the run as **failed for extraction quality** when either is true: +- `Number of jurisdictions with extracted data: 0` +- any configuration exception appears in logs (even if process exits 0) + +Only treat a run as passing when both are true: +- at least one jurisdiction has extracted data +- at least one jurisdiction CSV in `jurisdiction_dbs/` has more than header row + +## Root-cause triage + +- **Wrong or noisy documents** + - Tune query templates, URL keywords, and exclusions. + - Prefer `known_doc_urls` while stabilizing. +- **Right documents, wrong fields** + - Tune schema descriptions/examples and ambiguity rules. + - Check `extraction_system_prompt` in plugin YAML — it is the primary + guard against scope bleed from generic legal documents. +- **Correct values, unstable formatting** + - Tighten enums, unit vocabulary, and null behavior. +- **Nothing downloaded / unstable search** + - Disable live search and use deterministic known URLs/local docs. +- **0 documents found for a jurisdiction during website crawl** + - Expected for jurisdictions with few online ordinances. The website + crawl is a second acquisition pass after search-engine retrieval; + 0 results there is not a pipeline failure. + +## Acceptance gates + +Do not advance phases until all are true: + +- Output rows conform to required contract. +- High share of rows include useful `section` and `summary`. +- Feature names are stable and machine-consistent. +- Repeated runs on same sample show minimal drift. + +## Cost and speed controls + +- Keep sample size minimal while tuning. +- Change one variable per run. +- Archive run command, input set, and output path for each iteration. 
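The last control can be as simple as a log file. A minimal sketch, assuming a `runs.log` in the working directory (the log name and output path are arbitrary choices, not COMPASS conventions):

```shell
# Append one line per iteration: timestamp, exact command, output path
run_cmd='pixi run compass process -c config.json5 -p plugin.yaml -v'
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) | $run_cmd | ./outputs_smoke" >> runs.log
```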
+ +## Workspace hygiene (important) + +Keep one canonical working set per technology in `examples/`: + +- one run config, +- one plugin config, +- one schema, +- one jurisdiction file, +- one known docs file. + +Delete stale `_migrated`, `_smoke`, and duplicate output folders to avoid +configuration drift and debugging confusion. + +## Known infrastructure issues + +### Playwright timeouts + +Web search via `rebrowser_playwright` may fail with 60s timeouts on +`Page.wait_for_selector`. Symptoms: +- `TimeoutError: Page.wait_for_selector: Timeout 60000ms exceeded` +- All search queries fail consistently +- Browser session crashes with `ProtocolError: Internal server error, session closed` + +These errors during the **website crawl phase** (second acquisition pass) are +**non-fatal**. COMPASS logs them and continues. They do not block the +search-engine phase or extraction. + +If search itself is failing, verify provider credentials are loaded and fall +back to deterministic mode. + +**Workaround**: Use `known_local_docs` or `known_doc_urls` and disable +search/website steps while validating extraction logic. + +### known_local_docs loading failures + +`known_local_docs` may fail silently with `ERROR: Failed to read file` in +jurisdiction logs due to external loader behavior. + +**Workaround**: Prefer `known_doc_urls` for deterministic smoke tests and +pre-validate local docs before pipeline runs. + diff --git a/.github/skills/plugin-config-setup/SKILL.md b/.github/skills/plugin-config-setup/SKILL.md new file mode 100644 index 00000000..0c83b5f9 --- /dev/null +++ b/.github/skills/plugin-config-setup/SKILL.md @@ -0,0 +1,279 @@ +--- +name: plugin-config-setup +description: Author and tune one-shot plugin YAML for COMPASS document discovery, filtering, and text collection. Use whenever a user asks to create, clean up, standardize, or troubleshoot one-shot plugin YAML for technology onboarding. 
---

# YAML Setup Skill

**ONE-SHOT EXTRACTION ONLY.** This skill applies only to schema-driven extraction.
For legacy decision-tree extraction, consult COMPASS architecture docs.

Use this skill to create or tune one-shot plugin YAML that controls retrieval,
filtering, and text collection behavior.

## When to use

- New technology onboarding in one-shot extraction (NOT decision-tree extraction).
- Schema exists but source relevance is weak.
- You need reproducible config handoff across teams.

## Do not use

- Legacy decision-tree parser implementation changes.
- Schema feature semantics work that belongs in `<tech>_schema.json`.
- Run-result diagnosis after outputs are generated (use iteration loop skill).

## Expected assistant output

When using this skill, return:

1. The finalized plugin YAML content or exact diff.
2. Any required paired run-config changes.
3. A validation command and pass/fail checks for the edited YAML.

## Canonical reference

Consult the working examples in `examples/`:
- `examples/one_shot_schema_extraction/plugin_config.yaml` — standard working example
- `examples/water_rights_demo/one-shot/plugin_config.yaml` — multi-doc edge case

When creating new tech configs, `<tech>_plugin_config.yaml` is the recommended
naming convention (e.g. `geothermal_plugin_config.yaml`). The existing
`plugin_config.yaml` examples use a generic name; new tech-specific assets
should use the tech-first naming pattern.

Refer to any complete example in `examples/` that matches your retrieval goals.

## Naming convention

Use tech-first file names when creating new one-shot assets:
`<tech>_config*.json5`, `<tech>_plugin_config.yaml`,
`<tech>_schema.json`, `<tech>_jurisdictions*.csv`.

## Secret handling

Keep API keys in environment variables (for example `SERPAPI_KEY`,
`AZURE_OPENAI_API_KEY`) rather than in plugin or run config files.
Load them per shell session with `set -a && source .env && set +a`.
+Avoid spaces around `=` in `.env` assignments. + +## Required minimum + +```yaml +schema: ./my_schema.json +``` + +## Non-negotiable runtime constraints + +- Jurisdiction CSV headers are case-sensitive: use `County,State`. +- If `heuristic_keywords` is present, it must include all four lists and + none may be empty. +- A run is not considered passing if logs show config errors or if + extracted jurisdiction count is zero. + +## Key plugin YAML fields + +| Field | Type | Code Reference | +|---|---|---| +| `schema` | string (path) | [base.py#L124–L131](../../../compass/plugin/one_shot/base.py) | +| `data_type_short_desc` | string | [base.py#L483](../../../compass/plugin/one_shot/base.py#L483) | +| `query_templates` | list | [base.py#L217–L240](../../../compass/plugin/one_shot/base.py#L217) | +| `website_keywords` | dict | [base.py#L281–L338](../../../compass/plugin/one_shot/base.py#L281) | +| `heuristic_keywords` | dict or `true` | [base.py#L340–L390](../../../compass/plugin/one_shot/base.py#L340); [base.py#L512](../../../compass/plugin/one_shot/base.py#L512) | +| `collection_prompts` | list or `true` | [base.py#L413–L436](../../../compass/plugin/one_shot/base.py#L413) | +| `text_extraction_prompts` | list or `true` | [base.py#L438–L468](../../../compass/plugin/one_shot/base.py#L438) | +| `extraction_system_prompt` | string | [base.py#L476–L488](../../../compass/plugin/one_shot/base.py#L476) | +| `cache_llm_generated_content` | bool | [base.py#L107–L117](../../../compass/plugin/one_shot/base.py#L107) | + +**For the complete list of all configuration options (including `allow_multi_doc_extraction` and any future additions), consult the docstring of [`create_schema_based_one_shot_extraction_plugin()`](../../../compass/plugin/one_shot/base.py#L51).** + +## Required `heuristic_keywords` shape + +When using `heuristic_keywords`, use these four lists to guide pre-LLM filtering: +- `GOOD_TECH_KEYWORDS` — strong indicators of the target technology + (e.g., facility types, 
deployment modes). Documents matching even a + few keywords are marked as candidates. +- `GOOD_TECH_PHRASES` — multi-word phrases that signal relevant + ordinance content. Keep specific to avoid false positives. +- `GOOD_TECH_ACRONYMS` — industry-standard abbreviations for the + technology. Narrow list; include only widely recognized acronyms. +- `NOT_TECH_WORDS` — pre-heuristic filter that rejects documents + before keyword matching. Use to exclude adjacent technologies and + irrelevant domains (e.g., residential HVAC, unrelated industries). + Runs first; prevents wasted keyword evaluation on clearly-wrong + documents. + +Use this exact structure when defining `heuristic_keywords`: + +```yaml +heuristic_keywords: + GOOD_TECH_KEYWORDS: + - "" + GOOD_TECH_PHRASES: + - "" + GOOD_TECH_ACRONYMS: + - "" + NOT_TECH_WORDS: + - "" +``` + +Notes: +- Keys are normalized, but using canonical key names reduces mistakes. +- All four lists are required and must be non-empty. + +### `collection_prompts: true` and `text_extraction_prompts: true` + +Setting either flag to `true` (not a list) instructs COMPASS to use the LLM +to auto-generate the prompts from the schema content. This is the recommended +shortcut during development — do not write manual prompt lists until +auto-generated ones prove insufficient. + +### `extraction_system_prompt` + +This is the primary control for preventing scope bleed from generic land-use +code documents. Write it as a multi-line YAML literal block: + +```yaml +extraction_system_prompt: |- + You are a legal scholar extracting structured data from + utility-scale ordinances. + + Extract only enacted requirements for utility-scale facilities. + Exclude adjacent technologies and non-target use cases. + Prefer explicit values. Use null for qualitative obligations. +``` + +See `compass/extraction/ghp/plugin_config.yaml` for a complete example. + +## Progressive config path + +1. **Minimal** + - Confirm schema path and extraction invocation work. +2. 
**Simple**
   - Add `query_templates`, `heuristic_keywords`, and `cache_llm_generated_content`.
3. **Full**
   - Add `extraction_system_prompt` if scope bleed or off-domain extraction
     is observed.
   - Set `collection_prompts: true` and `text_extraction_prompts: true` to
     let the LLM auto-generate prompts from the schema.
   - Replace `heuristic_keywords: true` with an explicit list if precision
     is insufficient.

Use the same progression for any technology.

## Baseline YAML pattern

```yaml
schema: ./my_schema.json
data_type_short_desc: utility-scale ordinance
cache_llm_generated_content: true
query_templates:
  - "filetype:pdf {jurisdiction} ordinance"
  - "{jurisdiction} zoning ordinance"
  - "{jurisdiction} permitting requirements"
website_keywords:
  pdf: 92160
  <tech>: 46080
  ordinance: 23040
  zoning: 2880
  permit: 1440
heuristic_keywords:
  GOOD_TECH_KEYWORDS:
    - ""
    - ""
  GOOD_TECH_ACRONYMS:
    - ""
  GOOD_TECH_PHRASES:
    - ""
    - ""
  NOT_TECH_WORDS:
    - ""
    - ""
```

Swap vocabulary for any technology while keeping the same structure.

## Stable development mode

Use run-config controls for deterministic smoke tests while iterating schema:

- `known_doc_urls` or `known_local_docs` — bypass live search
- `perform_se_search: false` — disable search-engine phase
- `perform_website_search: false` — disable website crawl phase

Re-enable search only after extraction quality is stable on known documents.

Recommended baseline: use dynamic search first, then use deterministic mode
if search infrastructure fails.
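As a sketch, the deterministic-mode bullets above map onto these run-config keys (the URL is a placeholder, not a real document source):

```json5
{
  known_doc_urls: ["https://example.gov/zoning/ordinance.pdf"],  // placeholder URL
  perform_se_search: false,      // skip search-engine phase
  perform_website_search: false, // skip website crawl phase
}
```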
## Minimal run-config contract (to pair with plugin YAML)

Use this pattern and require users to provide their own model and client
values:

```json5
{
  out_dir: "./outputs_<tech>_<phase>",
  tech: "<tech>",
  jurisdiction_fp: "./<tech>_jurisdictions.csv",
  perform_se_search: true,
  perform_website_search: false,
  model: [
    {
      name: "<model-deployment-name>",
      llm_call_kwargs: { temperature: 0, timeout: 600 },
      client_kwargs: {
        api_version: "<api-version>",
        azure_endpoint: "<azure-endpoint>"
      }
    }
  ]
}
```

## Acquisition phases

COMPASS acquisition runs in two sequential phases per jurisdiction:

1. **Search-engine phase** — uses `SerpAPIGoogleSearch` or similar; driven by
   `query_templates`.
2. **Website crawl phase** — crawls the jurisdiction's main website using
   `website_keywords` for ranking. Playwright browser errors during this
   phase are **non-fatal**; COMPASS logs them and moves on.

`perform_website_search: false` skips phase 2. Use it during smoke tests to
keep run time short and avoid Playwright dependency issues.

## Validation checklist

- Schema path resolves from runtime working directory.
- Query templates include `{jurisdiction}` consistently.
- URL weights favor legal and government documents.
- Heuristic exclusions are precise and not over-broad.
- Prompt overrides are only added when default behavior fails.

## Cross-tech adaptation checklist

When adapting to another technology:

- replace vocabulary in `query_templates` and `website_keywords`,
- keep legal-code terms (`ordinance`, `zoning`, `code of ordinances`),
- keep non-target exclusions explicit in `NOT_TECH_WORDS`,
- do not carry terms from a previous technology into new tech configs,
- write a technology-specific `extraction_system_prompt`.

## Run command

```bash
pixi run compass process -c config.json5 -p path/to/plugin_config.yaml -v
```

If running outside the tech folder, use absolute paths for `-c` and `-p`.

## Guardrails

- Retrieval behavior belongs in plugin YAML.
+- Feature logic belongs in schema. +- Adjust one tuning axis per run for clean attribution. +- Keep one canonical plugin file per technology in the active example folder. + diff --git a/.github/skills/schema-creation/SKILL.md b/.github/skills/schema-creation/SKILL.md new file mode 100644 index 00000000..08981132 --- /dev/null +++ b/.github/skills/schema-creation/SKILL.md @@ -0,0 +1,178 @@ +--- +name: schema-creation +description: Author and iterate one-shot extraction schemas for native COMPASS. Use whenever a user asks to create, expand, or debug schema feature definitions, value/unit rules, or extraction instructions. +--- + +# Schema Creation Skill + +**ONE-SHOT EXTRACTION ONLY.** This skill applies only to schema-driven extraction +(new technology onboarding with JSON schema + plugin YAML). For legacy decision-tree +extraction (existing solar/wind/small-wind in `compass/extraction//`), +consult COMPASS architecture docs. + +Use this skill to define what the LLM extracts and how it formats results. +The schema is the single most important config file for output quality. + +## When to use + +- Starting a new one-shot technology extraction (NOT decision-tree legacy extraction). +- Fixing inconsistent or incorrect extracted values in one-shot extraction. +- Adding new features to an existing one-shot extraction. + +## Do not use + +- Retrieval tuning tasks that belong in plugin YAML. +- Legacy decision-tree extraction parser implementation. + +## Expected assistant output + +When using this skill, return: + +1. The proposed schema diff (or full schema block) for the targeted features. +2. The rationale for VALUE, UNITS, and IGNORE wording. +3. A smoke-test check plan for validating the schema change. 
## Canonical reference

For complete examples, see the `examples/` directory:
- `examples/one_shot_schema_extraction/wind_schema.json`
- `examples/water_rights_demo/one-shot/water_rights_schema.json5`

Each follows the pattern: `<tech>_schema.json` or `<tech>_schema.json5`.

## Required output contract

Every schema must define `outputs` as an array. Each item must require
exactly these five fields and set `additionalProperties: false`:

```json
{
  "type": "object",
  "required": ["outputs"],
  "additionalProperties": false,
  "properties": {
    "outputs": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["feature", "value", "units", "section", "summary"],
        "additionalProperties": false,
        "properties": {
          "feature": { "type": "string", "enum": ["..."] },
          "value": { "anyOf": [{"type": "number"}, {"type": "string"}, {"type": "boolean"}, {"type": "array", "items": {"type": "string"}}, {"type": "null"}] },
          "units": { "type": ["string", "null"] },
          "section": { "type": ["string", "null"] },
          "summary": { "type": ["string", "null"] }
        }
      }
    }
  }
}
```

These five fields map directly to the output CSV columns. COMPASS adds
`county`, `state`, `FIPS`, and other metadata columns automatically.

## Build sequence

1. **Define the feature enum** — one stable lowercase ID per siting-relevant
   requirement. Keep naming consistent across iterations and group IDs by
   family (setbacks, noise, zoning, permitting).
2. **Define `value` and `units` rules per feature family** — in each
   feature's `description`, state the expected value type and accepted unit
   vocabulary explicitly.
3. **Add `$definitions`** — group related feature descriptions here to keep
   the `feature` enum block clean.
4. **Add `$instructions`** — encode global extraction policy (scope, null
   handling, one-row-per-feature contract, verbatim quote preference).
5. 
**Smoke-test on one jurisdiction** — validate all enum items appear in + output and null rows are correctly populated for missing features. + +## Feature definition template + +Every feature description must answer four questions: + +1. **What is this?** One sentence identifying the regulatory concept. +2. **VALUE rule:** What type is the value and what specific values/ranges are + valid? +3. **UNITS rule:** What unit string is accepted, or `null` if not applicable? +4. **IGNORE / CLARIFICATION:** What near-miss concepts must NOT match this + feature? + +Example (abbreviated): + +```json +"structure setback": { + "description": "Minimum distance from the generator to an occupied building. VALUE: numerical distance. UNITS: 'feet' or 'meters'. IGNORE: setbacks from property lines or roads — those are separate features." +} +``` + +## Feature family taxonomy + +Organize `$definitions` by these families: + +| Family | Example features | +|---|---| +| Setbacks | `structure setback`, `property line setback`, `road setback` | +| Noise/Emissions | `noise limit`, `emissions standard`, `vibration limit` | +| Operational | `hours of operation` | +| Physical design | `screening requirement`, `enclosure requirement`, `exhaust stack height` | +| Zoning | `primary use districts`, `conditional use districts`, `prohibited use districts` | +| Permitting | `permit requirement`, `capacity threshold` | +| Compliance | `decommissioning` | + +## `$instructions` block + +Always include a `$instructions` object at the top level with these keys: + +```json +"$instructions": { + "scope": "Describe exactly what to extract and what to ignore.", + "null_handling": "Output every enum feature. Use null value and null summary when a feature is not found in the document. Do not omit features.", + "verbatim_quotes": "In summary fields, prefer verbatim quotes from the source. Enclose in double quotation marks.", + "units_discipline": "Do not convert units. 
Record them exactly as they appear in the document."
}
```

## Scope bleed control

When COMPASS retrieves a large land-use code instead of a tech-specific
ordinance, the LLM may extract off-domain provisions.

Fix order (most powerful first):
1. `extraction_system_prompt` in plugin YAML — state explicitly what is in
   scope and what is excluded.
2. `$instructions.scope` in schema — reinforce with exclusion language.
3. `heuristic_keywords.NOT_TECH_WORDS` — reject documents upstream.

Do not expand the feature enum to absorb scope bleed. Narrow the prompt.

## Cross-technology adaptation checklist

When cloning a schema for a new technology:

- [ ] Replace all feature IDs with technology-specific names.
- [ ] Replace value/units rules in every feature description.
- [ ] Replace exclusion terms in `$instructions.scope` and feature IGNORE
  clauses.
- [ ] Replace `$definitions` group names to match new feature families.
- [ ] Smoke-test before widening to 10+ jurisdictions.

## Quality checklist

- [ ] Feature enum uses stable, consistent IDs across all runs.
- [ ] Every feature description contains VALUE, UNITS, and IGNORE clauses.
- [ ] `$instructions` block is present with all four keys.
- [ ] `additionalProperties: false` is set on the top-level object and on
  each item in the `outputs` array.
- [ ] Schema validates cleanly against a JSON Schema validator.
- [ ] A smoke run using this schema produces extracted rows (not just
  successful process exit logs).

## Anti-patterns to avoid

- Feature IDs that change names between iterations.
- Implicit unit assumptions not stated in description text.
- Missing IGNORE clauses for common near-miss features.
- Examples in descriptions that contradict field rules.
- Widening the enum to absorb scope bleed instead of tightening the prompt.
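A couple of the structural checklist items can be pre-checked cheaply before reaching for a full validator. This is a grep-based sketch (the helper name is illustrative, and it is not a substitute for running a real JSON Schema validator):

```shell
# Sketch: cheap textual sanity checks on a schema file before a smoke run
schema_sanity() {
  local schema="$1"
  grep -q '"additionalProperties": false' "$schema" &&
    grep -q '"\$instructions"' "$schema" &&
    echo "schema sanity ok"
}
```

Pair it with a proper JSON Schema validator for the full contract check.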