From d4d580b06bed34f4bc7c88b608ff837ad14d97be Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 17 Mar 2026 22:16:59 +0000
Subject: [PATCH 1/2] Initial plan


From 38f7a724637c5bec2aacaf9277128b028f492a04 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Tue, 17 Mar 2026 22:24:19 +0000
Subject: [PATCH 2/2] Address all review comments in skills documentation

Co-authored-by: bpulluta <115118857+bpulluta@users.noreply.github.com>
---
 .github/skills/extraction-run/SKILL.md | 44 ++++++++++++++------------
 .github/skills/web-scraper/SKILL.md    | 20 +++++++-----
 .github/skills/yaml-setup/SKILL.md     | 14 +++++---
 3 files changed, 44 insertions(+), 34 deletions(-)
diff --git a/.github/skills/extraction-run/SKILL.md b/.github/skills/extraction-run/SKILL.md
index 00be8f92..a356eea2 100644
--- a/.github/skills/extraction-run/SKILL.md
+++ b/.github/skills/extraction-run/SKILL.md
@@ -56,15 +56,17 @@ only a schema JSON, a plugin YAML, and a run config — no Python source changes
 
 New technology assets start in `examples/` and finish in `compass/extraction/`:
 
-1. **Develop** — place all assets in `examples/one_shot_schema_extraction_<tech>/`
+1. **Develop** — place all assets in `examples/one_shot_schema_extraction/`
 2. **Stabilize** — iterate schema/plugin until smoke and robustness gates pass
 3. **Promote** — copy the three finalized files into `compass/extraction/<tech>/`:
    - `<tech>_schema.json`
    - `<tech>_plugin_config.yaml`
    - `<tech>_config.json5` (optional; useful as a reference run config)
+   - `__init__.py` — registers the plugin via `create_schema_based_one_shot_extraction_plugin`
 
-The promoted extraction folder contains only config files — no Python code is
-needed for one-shot techs.
+   After creating the package, add an import in `compass/extraction/__init__.py`
+   to register the plugin at startup. See `compass/extraction/ghp/__init__.py`
+   for a reference implementation.
 
 ## Required inputs
 
@@ -78,10 +80,10 @@ needed for one-shot techs.
 - Jurisdiction CSV has headers `County,State`.
 - `out_dir` is unique for this run.
 - At least one acquisition step is enabled:
-	`perform_se_search: true`, `perform_website_search: true`,
-	`known_doc_urls`, or `known_local_docs`.
+  `perform_se_search: true`, `perform_website_search: true`,
+  `known_doc_urls`, or `known_local_docs`.
 - If `heuristic_keywords` exists, all four required lists are present and
-	non-empty.
+  non-empty.
 
 ## Naming convention
 
@@ -106,7 +108,7 @@ to deterministic mode only when search infrastructure is unstable:
 2. Use your preferred configured search engine.
 3. Load `.env` into shell (`set -a && source .env && set +a`).
 4. Run with verbose logs:
-	 - `pixi run compass process -c config.json5 -p plugin.yaml -v`
+   - `pixi run compass process -c config.json5 -p plugin.yaml -v`
 5. Confirm output artifacts exist before tuning schema semantics.
 
 Fallback mode when needed:
@@ -144,11 +146,11 @@ pixi run compass process -c config.json5 -p path/to/plugin_config.yaml -v
 ## Phase-gated workflow
 
 1. **Smoke test (1 jurisdiction)**
-	 - Goal: verify wiring and output contract.
+   - Goal: verify wiring and output contract.
 2. **Robustness (5 jurisdictions)**
-	 - Goal: verify feature stability and edge-case handling.
+   - Goal: verify feature stability and edge-case handling.
 3. **Scale (full set)**
-	 - Goal: only after earlier phases pass acceptance gates.
+   - Goal: only after earlier phases pass acceptance gates.
 
 ## Validation checklist
 
@@ -194,7 +196,7 @@ Check in order:
 1. `outputs/*/cleaned_text/*.txt` (text extraction present)
 2. `outputs/*/jurisdiction_dbs/*.csv` (per-jurisdiction parsed rows)
 3. `outputs/*/quantitative_ordinances.csv` and
-	 `outputs/*/qualitative_ordinances.csv` (final compiled results)
+   `outputs/*/qualitative_ordinances.csv` (final compiled results)
 
 Treat the run as **failed for extraction quality** when either is true:
 - `Number of jurisdictions with extracted data: 0`
@@ -207,20 +209,20 @@ Only treat a run as passing when both are true:
 ## Root-cause triage
 
 - **Wrong or noisy documents**
-	- Tune query templates, URL keywords, and exclusions.
-	- Prefer `known_doc_urls` while stabilizing.
+  - Tune query templates, URL keywords, and exclusions.
+  - Prefer `known_doc_urls` while stabilizing.
 - **Right documents, wrong fields**
-	- Tune schema descriptions/examples and ambiguity rules.
-	- Check `extraction_system_prompt` in plugin YAML — it is the primary
-	  guard against scope bleed from generic legal documents.
+  - Tune schema descriptions/examples and ambiguity rules.
+  - Check `extraction_system_prompt` in plugin YAML — it is the primary
+    guard against scope bleed from generic legal documents.
 - **Correct values, unstable formatting**
-	- Tighten enums, unit vocabulary, and null behavior.
+  - Tighten enums, unit vocabulary, and null behavior.
 - **Nothing downloaded / unstable search**
-	- Disable live search and use deterministic known URLs/local docs.
+  - Disable live search and use deterministic known URLs/local docs.
 - **0 documents found for a jurisdiction during website crawl**
-	- Expected for jurisdictions with few online ordinances. The website
-	  crawl is a second acquisition pass after search-engine retrieval;
-	  0 results there is not a pipeline failure.
+  - Expected for jurisdictions with few online ordinances. The website
+    crawl is a second acquisition pass after search-engine retrieval;
+    0 results there is not a pipeline failure.
 
 ## Acceptance gates
 
diff --git a/.github/skills/web-scraper/SKILL.md b/.github/skills/web-scraper/SKILL.md
index 27a3fa37..05a078f0 100644
--- a/.github/skills/web-scraper/SKILL.md
+++ b/.github/skills/web-scraper/SKILL.md
@@ -30,9 +30,12 @@ When using this skill, return:
 
 ## Canonical reference
 
-Consult example plugin configurations in `examples/` following the tech-first naming pattern:
-- `<tech>_plugin_config.yaml` — standard one-shot config
-- See `examples/water_rights_demo/one-shot/plugin_config.yaml` for multi-document edge cases
+Consult example plugin configurations in `examples/`:
+- `examples/one_shot_schema_extraction/plugin_config.yaml` — standard one-shot config
+- `examples/water_rights_demo/one-shot/plugin_config.yaml` — multi-document edge cases
+
+When creating a new tech config, name the file `<tech>_plugin_config.yaml`
+(e.g. `geothermal_plugin_config.yaml`); this is the recommended convention.
 
 ## Scope
 
@@ -49,7 +52,8 @@ COMPASS runs two sequential acquisition passes per jurisdiction:
    ordinance documents.
 2. **Website crawl phase** — crawls the jurisdiction's official website,
    ranking pages using `website_keywords`. This phase is a secondary pass
-   and runs even if the SE phase found documents.
+   and runs only if the search-engine phase did not yield an ordinance
+   context.
 
 Key behaviors:
 - Playwright browser errors during the website crawl phase are **non-fatal**.
@@ -123,10 +127,10 @@ Then validate:
 3. Run heuristic filter and review false rejects/accepts (`cleaned_text/`).
 4. Check website crawl phase independently if needed (enable, run, inspect logs).
 5. Update one axis only:
-	- query templates (affects SE phase),
-	- URL weights (affects both phases),
-	- include/exclude heuristic patterns (pre-LLM filter),
-  - `NOT_TECH_WORDS` (upstream document rejection).
+   - query templates (affects SE phase),
+   - URL weights (affects both phases),
+   - include/exclude heuristic patterns (pre-LLM filter),
+   - `NOT_TECH_WORDS` (upstream document rejection).
 6. Re-run same sample and compare.
 
 ## Cross-tech onboarding
diff --git a/.github/skills/yaml-setup/SKILL.md b/.github/skills/yaml-setup/SKILL.md
index af2a82e5..1502085c 100644
--- a/.github/skills/yaml-setup/SKILL.md
+++ b/.github/skills/yaml-setup/SKILL.md
@@ -33,10 +33,15 @@ When using this skill, return:
 
 ## Canonical reference
 
-With tech-first naming, configuration examples follow this pattern:
-- `examples/one_shot_schema_extraction/<tech>_plugin_config.yaml` — standard working example
+Consult the working examples in `examples/`:
+- `examples/one_shot_schema_extraction/plugin_config.yaml` — standard working example
 - `examples/water_rights_demo/one-shot/plugin_config.yaml` — multi-doc edge case
 
+When creating a new tech config, name the file `<tech>_plugin_config.yaml`
+(e.g. `geothermal_plugin_config.yaml`). This tech-first pattern is the
+recommended convention for new tech-specific assets; the existing
+`plugin_config.yaml` examples use a generic name.
+
 Refer to any complete example in `examples/` that matches your retrieval goals.
 
 ## Naming convention
@@ -78,7 +83,7 @@ schema: ./my_schema.json
 | `collection_prompts` | list or `true` | Text collection prompt(s). If **`true`**, LLM auto-generates from schema. |
 | `text_extraction_prompts` | list or `true` | Text consolidation prompt(s). If **`true`**, LLM auto-generates from schema. |
 | `extraction_system_prompt` | string | Overrides default LLM system prompt for the extraction step. Use this to scope extraction tightly to the target technology. |
-| `cache_llm_generated_content` | bool | Cache LLM-generated `query_templates` and `website_keywords`. Set to `false` when iterating schema to see live changes. |
+| `cache_llm_generated_content` | bool | Cache LLM-generated `query_templates`, `website_keywords`, and `heuristic_keywords`. Set to `false` when iterating schema to see live changes. |
 
 ## Required `heuristic_keywords` shape
 
@@ -122,8 +127,7 @@ extraction_system_prompt: |-
   Prefer explicit values. Use null for qualitative obligations.
 ```
 
-See `compass/extraction/geothermal_electricity/geothermal_plugin_config.yaml`
-for a complete example.
+See `compass/extraction/ghp/plugin_config.yaml` for a complete example.
 
 ## Progressive config path