Add One-Shot Skills Reliability and Guardrails #397
Merged
7 commits:

- d9e868d: Add COMPASS workflow skills (bpulluta)
- 54b8d29: Added one-shot skills (bpulluta)
- a71447f: update one-shot SKILL.md structure and trigger contracts (bpulluta)
- 74495a6: Initial plan (#398) (Copilot)
- 81fcbff: Fix skills documentation: correct paths, caching behavior, and tab fo… (Copilot)
- 1b8571f: renamed skills and fixed minor comments (bpulluta)
- 3288e26: udpated skills Paul review march 26 (bpulluta)

---
name: document-retrieval
description: Build and tune retrieval configs that search, rank, and collect ordinance documents in COMPASS. Use whenever a user asks to improve retrieval precision/recall, tune search queries/keywords, or debug acquisition quality before extraction tuning.
---

# Web Scraper Skill

Use this skill to improve retrieval precision and recall before tuning extraction.
It applies to both the one-shot (schema-driven) and the legacy decision-tree
extraction pipelines.

## When to use

- The download step returns noisy sources (one-shot extraction).
- Ordinance recall is weak across jurisdictions (one-shot extraction).
- LLM filtering is compensating for poor search quality.

## Do not use

- Designing schema feature definitions or value-extraction logic.
- Debugging extracted features or values when retrieval is already correct.

## Expected assistant output

When using this skill, return:

1. The retrieval axis that was changed (queries, keyword weights, or heuristics).
2. Evidence from artifacts or logs showing why the change was needed.
3. The command for the next run against the same jurisdiction sample.

## Canonical reference

Consult the example plugin configurations in `examples/`:
- `examples/one_shot_schema_extraction/plugin_config.yaml`: standard one-shot config
- `examples/water_rights_demo/one-shot/plugin_config.yaml`: multi-document edge cases

When creating a config for a new technology, use the recommended naming
convention `<tech>_plugin_config.yaml` (e.g., `geothermal_plugin_config.yaml`).

## Scope

- Query-template strategy.
- URL ranking and filtering patterns.
- Heuristic phrase controls before LLM validation.

## Two retrieval phases

COMPASS runs two sequential acquisition passes per jurisdiction:

1. **Search-engine phase**: queries `SerpAPIGoogleSearch` (or the configured
   engine) using `query_templates`. This phase is the primary source of
   ordinance documents.
2. **Website crawl phase**: crawls the jurisdiction's official website and
   ranks pages using `website_keywords`. This is a secondary pass that runs
   only if the search-engine phase did not yield ordinance context.

Key behaviors:
- Playwright browser errors during the website crawl phase are **non-fatal**;
  COMPASS logs the error and continues.
- `Found 0 potential documents` at the end of the crawl phase is **expected**
  for jurisdictions without relevant online ordinances.
- Disable the crawl phase with `perform_website_search: false` in the run
  config for faster smoke tests or when Playwright is unavailable.

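Both crawl-phase signals above are non-fatal, so they are easy to miss in a long run. A minimal sketch for surfacing them, assuming the run writes a plain-text log file (the log path and the helper name `summarize_crawl_log` are hypothetical, not COMPASS features):

```bash
# Hypothetical helper: count the two non-fatal crawl-phase signals in a log
# file. The log path and the case-insensitive "playwright" match are
# assumptions; adapt them to what your run actually writes.
summarize_crawl_log() {
  local log="$1" zero errors
  if [ ! -r "$log" ]; then
    echo "error: cannot read $log" >&2
    return 1
  fi
  # grep -c prints 0 (and exits 1) when nothing matches, so ignore its status.
  zero=$(grep -c "Found 0 potential documents" "$log" || true)
  errors=$(grep -ci "playwright" "$log" || true)
  echo "zero-document crawls: $zero"
  echo "playwright-related lines: $errors"
}
```

For example, `summarize_crawl_log path/to/run.log`. A nonzero zero-document count is not an error on its own.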
## Key management

For SerpAPI-backed search, keep `api_key` out of committed config and provide
`SERPAPI_KEY` through the environment (for example, from a `.env` file loaded
into the shell).

Recommended shell setup:

```bash
set -a          # auto-export every variable assigned from here on
source .env     # e.g. SERPAPI_KEY=your-key (no spaces around =)
set +a          # stop auto-exporting
```

Avoid spaces around `=` in `.env` assignments; otherwise the shell treats the
line as a command instead of a variable assignment.

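To catch both problems before a search-engine run, a sketch like the following can help. `check_env_file` and `require_serpapi_key` are hypothetical helpers, not part of COMPASS; they only inspect the `.env` file and the `SERPAPI_KEY` variable named above:

```bash
# Hypothetical helpers (not part of COMPASS).
# check_env_file: print .env lines with whitespace around '=' and fail if any.
check_env_file() {
  local env_file="${1:-.env}"
  if [ ! -r "$env_file" ]; then
    echo "error: cannot read $env_file" >&2
    return 1
  fi
  # Matches NAME = value, NAME =value, and NAME= value. Comment lines are
  # skipped because a match must start with a variable name.
  if grep -nE '^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*([[:space:]]+=|=[[:space:]])' "$env_file"; then
    echo "error: remove spaces around '=' in $env_file" >&2
    return 1
  fi
}

# require_serpapi_key: fail fast if the key is not set in the environment.
require_serpapi_key() {
  if [ -z "${SERPAPI_KEY:-}" ]; then
    echo "error: SERPAPI_KEY is not set; load .env first" >&2
    return 1
  fi
}
```

Run `check_env_file .env` before sourcing the file and `require_serpapi_key` after it.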
## Retrieval design pattern

1. Write 3 to 7 query templates that use the `{jurisdiction}` placeholder.
2. Give legal-document indicators high weights in the URL keywords.
3. Add exclusions for templates, reports, and slide decks.
4. Add focused negative technology terms to reduce false positives.
5. Start with dynamic search, then switch to deterministic known URLs when
   the search infrastructure is unstable.

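Before a run, it can help to preview the exact search strings your templates produce. The sketch below assumes templates use a literal `{jurisdiction}` placeholder as described above; `expand_queries` is a hypothetical helper and does not call COMPASS:

```bash
# Hypothetical helper (not part of COMPASS): preview the search strings that
# a set of query templates produces for one jurisdiction.
expand_queries() {
  local jurisdiction="$1"; shift
  local placeholder='{jurisdiction}' template out
  for template in "$@"; do
    out=''
    # Literal (non-glob) replacement of every placeholder occurrence.
    while [[ "$template" == *"$placeholder"* ]]; do
      out+="${template%%"$placeholder"*}${jurisdiction}"
      template="${template#*"$placeholder"}"
    done
    printf '%s\n' "$out$template"
  done
}
```

Pass your 3 to 7 templates as arguments, for example `expand_queries "Box Elder County, Utah" "{jurisdiction} zoning ordinance"`, and check that each line reads like a query a person would type.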
When using `heuristic_keywords`, populate these four lists to guide pre-LLM filtering:
- `GOOD_TECH_KEYWORDS`: strong indicators of the target technology
  (e.g., facility types, deployment modes). Documents that match even a
  few keywords are marked as candidates.
- `GOOD_TECH_PHRASES`: multi-word phrases that signal relevant
  ordinance content. Keep them specific to avoid false positives.
- `GOOD_TECH_ACRONYMS`: industry-standard abbreviations for the
  technology. Keep this list narrow and include only widely recognized acronyms.
- `NOT_TECH_WORDS`: a pre-heuristic filter that rejects documents
  before keyword matching. Use it to exclude adjacent technologies and
  irrelevant domains (e.g., residential HVAC, unrelated industries).
  Because it runs first, it avoids wasting keyword evaluation on clearly
  wrong documents.

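The filter order described above can be sketched in shell for quick experiments on cleaned text files. This is an illustration under stated assumptions, not the COMPASS implementation: the example terms, the `MIN_HITS` threshold, and the rule that a single `NOT_TECH_WORDS` match rejects a document are all hypothetical:

```bash
# Sketch only, NOT the COMPASS implementation: it mirrors the order described
# above (NOT_TECH_WORDS first, then positive signals). The example terms and
# the MIN_HITS threshold are hypothetical placeholders.
NOT_TECH_WORDS=("residential hvac" "swimming pool")
GOOD_TECH_KEYWORDS=("geothermal" "power plant" "well field")
GOOD_TECH_PHRASES=("geothermal resource" "geothermal power facility")
GOOD_TECH_ACRONYMS=("EGS")
MIN_HITS="${MIN_HITS:-2}"

heuristic_check() {
  local file="$1" term hits=0
  # 1) Reject before any keyword matching if an exclusion term appears.
  for term in "${NOT_TECH_WORDS[@]}"; do
    if grep -qiF -- "$term" "$file"; then
      echo "reject: matched NOT_TECH_WORDS term '$term'"
      return 1
    fi
  done
  # 2) Count positive signals (keywords and phrases, case-insensitive).
  for term in "${GOOD_TECH_KEYWORDS[@]}" "${GOOD_TECH_PHRASES[@]}"; do
    if grep -qiF -- "$term" "$file"; then hits=$((hits + 1)); fi
  done
  # Acronyms: case-sensitive, whole-word matches only.
  for term in "${GOOD_TECH_ACRONYMS[@]}"; do
    if grep -qwF -- "$term" "$file"; then hits=$((hits + 1)); fi
  done
  if [ "$hits" -ge "$MIN_HITS" ]; then
    echo "candidate: $hits hits"
    return 0
  fi
  echo "no match: $hits hits"
  return 1
}
```

Running `heuristic_check` over a handful of known good and known bad cleaned text files is one way to spot false rejects and false accepts before editing the real lists.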
If any required list is missing or empty, COMPASS raises a plugin
configuration error, and the run's extraction quality should be treated as failed.

For first-pass reliability, test retrieval with deterministic known URLs
before using live web search.

## Technology-specific retrieval controls (template)

- Include target-technology facility and deployment terms.
- Exclude adjacent and non-target terms (e.g., residential, HVAC, PV) as needed.
- Favor jurisdictional legal-code signals such as `land use code`,
  `code of ordinances`, `use table`, and `special use permit`.

## Deterministic smoke-test mode
This mode requires at least one of the following document sources:

- **`known_doc_urls`**: a list of URLs for known ordinance documents that COMPASS can download and parse directly.
- **`known_local_docs`**: local ordinance document files available in the repository or on the system.

Use these run-config controls to bypass flaky search while tuning:

- supply `known_doc_urls` or `known_local_docs`,
- set `perform_se_search: false`,
- set `perform_website_search: false`.

Then validate that:

- download artifacts exist,
- cleaned text exists,
- ordinance DB rows are non-empty.

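The three checks can be scripted. The sketch below assumes downloads land in `<out_dir>/ordinance_files/` and cleaned text in `<out_dir>/cleaned_text/` (the directories named in the tuning loop), and that the ordinance DB has been exported as a CSV with one header row; that CSV export and the helper name `validate_smoke_run` are assumptions:

```bash
# Hypothetical smoke-test check. Assumes:
#   - downloads land in "<out_dir>/ordinance_files/"
#   - cleaned text lands in "<out_dir>/cleaned_text/"
#   - the ordinance DB is exported as a CSV with one header row (path passed in)
validate_smoke_run() {
  local out_dir="$1" db_csv="$2" dir status=0
  for dir in ordinance_files cleaned_text; do
    if [ -z "$(ls -A "$out_dir/$dir" 2>/dev/null)" ]; then
      echo "FAIL: $out_dir/$dir is missing or empty"
      status=1
    fi
  done
  # Succeeds only if at least one non-blank line follows the header row.
  if ! awk 'NR > 1 && NF { n++ } END { exit !(n > 0) }' "$db_csv" 2>/dev/null; then
    echo "FAIL: no ordinance rows in $db_csv"
    status=1
  fi
  if [ "$status" -eq 0 ]; then echo "PASS: smoke-test artifacts look complete"; fi
  return "$status"
}
```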
## Tuning loop

1. Run the search-engine (SE) phase on a small sample of jurisdictions.
2. Inspect kept vs. discarded PDFs (`ordinance_files/`).
3. Run the heuristic filter and review false rejects and false accepts (`cleaned_text/`).
4. If needed, check the website crawl phase independently (enable it, run, and inspect the logs).
5. Change only one axis per iteration:
   - query templates (affect the SE phase),
   - URL weights (affect both phases),
   - include/exclude heuristic patterns (pre-LLM filter),
   - `NOT_TECH_WORDS` (upstream document rejection).
6. Re-run the same sample and compare the results.

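For step 6, a simple way to compare two runs over the same sample is to diff the sets of kept files. This sketch assumes each run's downloads are in `<out_dir>/ordinance_files/` and that file names are comparable across runs; `compare_kept_docs` is a hypothetical helper:

```bash
# Hypothetical helper: list files kept by only one of two runs.
# Arguments are the two runs' out_dir paths (before, after).
compare_kept_docs() {
  local before="$1/ordinance_files" after="$2/ordinance_files"
  echo "Only in $1:"
  comm -23 <(ls -1 "$before" | sort) <(ls -1 "$after" | sort)
  echo "Only in $2:"
  comm -13 <(ls -1 "$before" | sort) <(ls -1 "$after" | sort)
}
```

Files that appear only in the newer run are the ones to inspect first when judging whether the single changed axis helped.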
## Cross-tech onboarding

When reusing this workflow for any technology:

- keep the legal retrieval tokens (`ordinance`, `zoning`, `code`),
- replace all technology terms in `query_templates`, `website_keywords`,
  and `heuristic_keywords`,
- seed `known_doc_urls` with authoritative regulatory documents for smoke
  testing,
- do not copy negative terms from previous technologies into the new config,
- verify that `NOT_TECH_WORDS` excludes the technologies adjacent to your domain.

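To check that terms from a previous technology did not leak into a new config, a grep-based sketch is often enough. `find_stale_terms` is hypothetical; review each hit, because some matches may be intentional `NOT_TECH_WORDS` exclusions:

```bash
# Hypothetical helper: print lines of a config that still mention any of the
# given terms. Returns non-zero if at least one term is found.
find_stale_terms() {
  local config="$1"; shift
  local term found=0
  for term in "$@"; do
    # -n: show line numbers, -i: ignore case, -F: match the term literally.
    if grep -niF -- "$term" "$config"; then
      found=1
    fi
  done
  return "$found"
}
```

For example: `find_stale_terms geothermal_plugin_config.yaml solar photovoltaic "wind turbine"`.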
## Phase gates

- **3 jurisdictions**: confirm that the major source classes are found.
- **10 jurisdictions**: verify stability across regions.

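For the 10-jurisdiction gate, a sample that spans several states tests regional stability better than ten neighboring counties. A minimal sketch, assuming a simple jurisdiction CSV with a header row and the state in the second column (the file layout and the helper name `sample_jurisdictions` are assumptions):

```bash
# Hypothetical helper: pick a phase-gate sample that spreads across states by
# keeping at most one jurisdiction per state until N rows are collected.
# Assumes a simple CSV (no quoted commas) with a header row and the state in
# column 2, e.g. "County,State". Returns fewer rows if there are fewer states.
sample_jurisdictions() {
  local csv="$1" n="$2"
  awk -F',' -v n="$n" '
    BEGIN { kept = 0 }
    NR == 1 { print; next }                   # keep the header row
    kept < n + 0 && !seen[$2]++ { print; kept++ }
  ' "$csv"
}
```

Usage: `sample_jurisdictions jurisdictions.csv 10 > gate_10.csv`, then point the run at the sampled file.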
## Guardrails

- Keep feature-extraction logic out of the retrieval config.
- Do not overfit to one county's document style.
- Record an auditable rationale for each retrieval change.
- Keep one canonical retrieval config per active technology.
- Use a unique `out_dir` for each run; reusing one can cause COMPASS to abort early.

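One way to guarantee a fresh `out_dir` per run is to derive it from a timestamp. This sketch only prints a path; `unique_out_dir` and the default `outputs` base directory are hypothetical, and you still need to set `out_dir` in your run config to the printed value:

```bash
# Hypothetical helper: print a fresh, timestamped out_dir path under a base
# directory so that a rerun never reuses an earlier output directory.
unique_out_dir() {
  local base="${1:-outputs}" tag="${2:-run}"
  local stem candidate n=1
  stem="$base/${tag}_$(date +%Y%m%d_%H%M%S)"
  candidate="$stem"
  # Add a numeric suffix if a run in the same second already claimed the name.
  while [ -e "$candidate" ]; do
    candidate="${stem}_$n"
    n=$((n + 1))
  done
  printf '%s\n' "$candidate"
}
```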