This repository was archived by the owner on Feb 21, 2026. It is now read-only.

Fix PR#14: Align sweep configs and docs with actual tool output schemas #15

Draft

Copilot wants to merge 4 commits into `main` from `copilot/fix-pr14-issues`


Copilot AI commented Feb 9, 2026

PR#14 introduced sweep configurations with incorrect field paths and documentation that didn't match actual tool outputs. This would cause null extractions and user confusion.

Changes

Config schema alignment

  • sweep_threshold_example.json: Fixed metrics paths from ["map50", "mar_100"] to ["metrics.map50", "metrics.ar100"] (eval_coco.py nests under metrics.*)
  • sweep_gate_weights_example.json: Fixed paths from ["best_weights.det", "best_score"] to ["metrics.tuning.best.det", "metrics.tuning.best.map50_95"] (tune_gate_weights.py structure)
  • sweep_gate_weights_example.json: Added grid_det to param_order to prevent run_id collisions
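Why a missing `param_order` entry can cause collisions: if run IDs are built only from the parameters listed in `param_order`, two runs that differ only in an unlisted parameter get the same ID. The sketch below assumes exactly that; the `make_run_id` format and the second grid parameter are hypothetical and may not match how `hpo_sweep.py` actually builds IDs.

```python
import itertools
from typing import Dict, List, Sequence


# Hypothetical run_id scheme: only parameters listed in param_order contribute.
def make_run_id(params: Dict[str, object], param_order: Sequence[str]) -> str:
    """Join name=value pairs for the parameters named in param_order."""
    return "_".join(f"{name}={params[name]}" for name in param_order)


def expand_grid(grid: Dict[str, List[object]]) -> List[Dict[str, object]]:
    """Full Cartesian product of the grid, one parameter dict per run."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]


# Hypothetical grid: grid_det is swept alongside an illustrative second parameter.
grid = {"grid_det": [0.5, 1.0], "other_param": [1, 2]}
runs = expand_grid(grid)

ids_without = [make_run_id(r, ["other_param"]) for r in runs]  # grid_det omitted
ids_with = [make_run_id(r, ["grid_det", "other_param"]) for r in runs]

print(len(runs), len(set(ids_without)), len(set(ids_with)))  # 4 2 4
```

With `grid_det` left out of the ordering, four runs collapse onto two IDs, so results keyed by run ID could be overwritten or merged.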

Documentation corrections

  • sweep_examples.md: Updated threshold and gate weight sections to reflect actual JSON output structures
  • sweep_examples.md: Fixed Markdown table syntax (rows started with `||` instead of a single `|`)
  • SECURITY_SUMMARY_P2.md: Corrected environment handling note (hpo_sweep.py inherits caller's env and overlays config, not isolated)
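The corrected note describes inheritance plus overlay, not isolation. A minimal sketch of that pattern, assuming the sweep config supplies a flat mapping of environment overrides (the `build_child_env` name and the `SWEEP_RUN_ID` variable are hypothetical):

```python
import os
import subprocess
import sys
from typing import Mapping


def build_child_env(overrides: Mapping[str, object]) -> dict:
    """Copy the caller's environment, then overlay config-provided values.

    Everything the caller has set (PATH, credentials, etc.) remains visible
    to the child process; the config only adds or replaces keys.
    """
    env = os.environ.copy()
    # str() guards against non-string JSON values such as numbers.
    env.update({key: str(value) for key, value in overrides.items()})
    return env


if __name__ == "__main__":
    child_env = build_child_env({"SWEEP_RUN_ID": "demo"})
    # The child sees both the inherited variables and the overlay.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['SWEEP_RUN_ID'])"],
        env=child_env,
        check=True,
    )
```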

Test infrastructure

  • test_sweep_configs.py: Converted to proper unittest.TestCase so CI can discover/run tests via python -m unittest
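For reference, a minimal sketch of the `unittest.TestCase` shape that unittest discovery picks up. The inline config and the checks are assumptions modeled on the fields this PR mentions (`grid`, `param_order`, `metrics.keys`); this is not a copy of `tests/test_sweep_configs.py`, which reads the example files under `docs/`.

```python
import json
import unittest

# Hypothetical inline config for illustration; values and "other_param" are made up.
EXAMPLE_CONFIG = json.loads(
    """
    {
      "grid": {"grid_det": [0.5, 1.0], "other_param": [1, 2, 3, 4]},
      "param_order": ["grid_det", "other_param"],
      "metrics": {"keys": ["metrics.tuning.best.det", "metrics.tuning.best.map50_95"]}
    }
    """
)


class TestSweepConfigShape(unittest.TestCase):
    def test_grid_values_are_nonempty_lists(self):
        for name, values in EXAMPLE_CONFIG["grid"].items():
            with self.subTest(param=name):
                self.assertIsInstance(values, list)
                self.assertGreater(len(values), 0)

    def test_param_order_covers_grid(self):
        # Every swept parameter appears in param_order, so run IDs stay unique.
        self.assertEqual(set(EXAMPLE_CONFIG["param_order"]), set(EXAMPLE_CONFIG["grid"]))

    def test_metric_keys_are_nested_paths(self):
        for key in EXAMPLE_CONFIG["metrics"]["keys"]:
            with self.subTest(key=key):
                self.assertTrue(key.startswith("metrics."))
```

Saved under `tests/`, a class like this can be run with `python -m unittest discover -s tests`.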

All paths verified against actual tool implementations (eval_coco.py, tune_gate_weights.py).
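To show why flat keys such as `map50` extracted nothing, here is a minimal sketch of dotted-path lookup. It assumes the sweep runner splits each `metrics.keys` entry on `.` and walks nested dicts; the helper name `extract_metric` and the sample report values are hypothetical, not taken from `eval_coco.py` or `tune_gate_weights.py`.

```python
from typing import Any, Optional


def extract_metric(report: dict, dotted_key: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any segment is missing."""
    node: Any = report
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None  # the "null extraction" case
        node = node[part]
    return node


# Hypothetical reports shaped like the nesting described in this PR; values are made up.
coco_report = {"metrics": {"map50": 0.61, "ar100": 0.72}}
gate_report = {"metrics": {"tuning": {"best": {"det": 1.0, "map50_95": 0.41}}}}

print(extract_metric(coco_report, "map50"))                         # None
print(extract_metric(coco_report, "metrics.map50"))                 # 0.61
print(extract_metric(gate_report, "metrics.tuning.best.map50_95"))  # 0.41
```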

Original prompt

fix PR#14 and make new PR



Copilot AI and others added 3 commits February 9, 2026 23:53
- Fix SECURITY_SUMMARY_P2.md env security note
- Fix sweep_threshold_example.json metrics.keys paths
- Fix sweep_gate_weights_example.json param_order and metrics.keys
- Fix sweep_examples.md docs to match actual output schemas
- Convert test_sweep_configs.py to proper unittest.TestCase

Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Replace non-existent 'score' field with 'map50_95' to match actual tune_gate_weights.py output

Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Simplify parenthetical to focus on key distinction

Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
thinksyncs requested a review from Copilot February 9, 2026 23:57
Copilot AI changed the title from "[WIP] Fix issues in PR#14 and create new pull request" to "Fix PR#14: Align sweep configs and docs with actual tool output schemas" Feb 9, 2026
Copilot AI requested a review from thinksyncs February 9, 2026 23:58

Copilot AI left a comment


Pull request overview

Adds/updates sweep example configs and documentation, plus a CI-friendly unittest to validate the example JSON sweep schemas (addressing prior PR#14 review feedback).

Changes:

  • Added a unittest-based validator for the sweep example JSON configs.
  • Added JSON sweep configuration examples for TTT, threshold sweeps, and gate-weight tuning.
  • Added/updated sweep documentation (including metric key paths) and a P2 security summary write-up.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| `tests/test_sweep_configs.py` | New unittest to validate sweep config example JSON structure. |
| `docs/sweep_ttt_example.json` | New example sweep config for TTT parameter exploration. |
| `docs/sweep_threshold_example.json` | New example sweep config for threshold tuning + COCO eval metric extraction. |
| `docs/sweep_gate_weights_example.json` | New example sweep config for gate-weight tuning report metric extraction. |
| `docs/sweep_examples.md` | New/updated documentation explaining sweep configs and usage patterns. |
| `SECURITY_SUMMARY_P2.md` | New security summary documenting checks and security considerations for the sweep expansion. |

`tests/test_sweep_configs.py`
```python
Validates JSON structure and parameter combinations.
"""
import json
import sys
```

Copilot AI Feb 10, 2026


`sys` is imported but never used, which triggers Ruff F401 and will fail CI (the Ruff config selects `F`). Remove the unused import (or use it if needed).

Suggested change:
```diff
-import sys
```
`tests/test_sweep_configs.py`
```python
# Calculate total runs
total_runs = 1
for values in grid.values():
    total_runs *= len(values)
```

Copilot AI Feb 10, 2026

`total_runs` is computed but never used, which triggers Ruff F841 and will fail CI. Either remove this variable or assert against an expected run count so that the value is actually used.

Suggested change:
```diff
-    total_runs *= len(values)
+    total_runs *= len(values)
+self.assertGreaterEqual(
+    total_runs,
+    1,
+    f"Total runs must be at least 1 in {config_path.name}",
+)
```
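Because `total_runs` is the product of the grid sizes, it can also be written with `math.prod` (Python 3.8+). A parameter with an empty value list makes the product 0, which is exactly what the suggested `>= 1` assertion catches. The grids below are hypothetical, chosen only to reproduce the "Typical Runs" figures from the summary table (64 and 8); the actual example configs may split those counts differently.

```python
import math
from typing import Dict, List


def count_runs(grid: Dict[str, List[object]]) -> int:
    """Number of runs in a full grid sweep: the product of each parameter's value count."""
    return math.prod(len(values) for values in grid.values())


# Hypothetical grids that happen to give 64 and 8 runs.
ttt_like = {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "c": [1, 2, 3, 4]}
threshold_like = {"score_thresh": [0.05 * i for i in range(1, 9)]}

print(count_runs(ttt_like), count_runs(threshold_like))  # 64 8
```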

`docs/sweep_examples.md`
Comment on lines +242 to +246
```markdown
| Sweep Type | Config File | Typical Runs | Outputs | Use Case |
|------------|-------------|--------------|---------|----------|
| TTT | `sweep_ttt_example.json` | 64 | `sweep_ttt.{jsonl,csv,md}` | Find best TTT hyperparams |
| Threshold | `sweep_threshold_example.json` | 8 | `sweep_threshold.{jsonl,csv,md}` | Find optimal score cutoff |
| Gate Weights | `sweep_gate_weights_example.json` | 8 | `sweep_gate_weights.{jsonl,csv,md}` | Tune inference-time score fusion |
```

Copilot AI Feb 10, 2026


The summary table rows start with `||`, which breaks standard Markdown table formatting (most renderers produce an extra empty column). Use a single leading `|` for each row and keep the column counts of the header and separator rows consistent.
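A check along these lines could catch the regression in CI. It is a minimal sketch, assuming plain pipe tables and treating any row that begins with `||` as a mistake (a table whose first cell is deliberately empty would be a false positive); the function name is hypothetical.

```python
import re
from typing import List, Tuple

# A pipe-table row that begins with two pipes, optionally after indentation.
DOUBLE_PIPE_ROW = re.compile(r"^\s*\|\|")


def find_double_pipe_rows(markdown: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every row starting with '||'."""
    return [
        (lineno, line)
        for lineno, line in enumerate(markdown.splitlines(), start=1)
        if DOUBLE_PIPE_ROW.match(line)
    ]


sample = "\n".join([
    "| Sweep Type | Config File |",
    "|------------|-------------|",
    "|| TTT | `sweep_ttt_example.json` |",
])
print(find_double_pipe_rows(sample))  # [(3, '|| TTT | `sweep_ttt_example.json` |')]
```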

`SECURITY_SUMMARY_P2.md`
```markdown
The test validation script:
- Only reads JSON files from known locations
- Uses safe JSON parsing (`json.loads`)
- No file write operations except when run with output flags
```

Copilot AI Feb 10, 2026


This section says the test script has "output flags" and may write files, but `tests/test_sweep_configs.py` only reads JSON files and doesn't implement any output or write flags. Please update this wording to reflect the actual behavior so that the security summary doesn't contain inaccurate claims.

Suggested change:
```diff
-- No file write operations except when run with output flags
+- No file write operations (read-only JSON validation)
```
3 participants