Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Fix ruff linting errors in CI
# Conflicts: # tests/test_cuda_smoke_rtdetr_pose.py
# Conflicts: # tests/test_export_predictions_lora_cli.py # tests/test_refine_predictions_hessian_cli.py # tools/export_predictions.py # yolozu/adapter.py
# Conflicts: # yolozu/adapter.py
Pull request overview
This PR expands YOLOZU with continual-learning helpers, richer prediction/dataset validation utilities, and additional CLI + tooling for evaluation/export workflows (incl. segmentation/keypoints and TRT parity), backed by a large set of new unit tests and schemas.
Changes:
- Added new normalization/parsing utilities (keypoints, intrinsics, instance-seg predictions) and dataset validator helpers.
- Extended tooling/CLIs for training/testing, baseline reporting, dataset preparation, and export/parity workflows.
- Added extensive test coverage plus JSON schemas and documentation updates for the new contracts.
Reviewed changes
Copilot reviewed 169 out of 202 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| yolozu/keypoints.py | Adds keypoints normalization/inference + pixel conversion helpers. |
| yolozu/intrinsics.py | Adds robust intrinsics parsing from multiple OpenCV-friendly representations. |
| yolozu/instance_segmentation_predictions.py | Adds instance-seg predictions normalization + validation + loader helpers. |
| yolozu/inference_utils.py | Adds a convenience wrapper to apply constraints + template verification in one step. |
| yolozu/gates.py | Adds low-FP gate function alongside existing template gate. |
| yolozu/datasets/ade20k.py | Adds ADE20K dataset path resolution, sample iteration, and optional class-name loading. |
| yolozu/dataset_validator.py | Adds dataset record validation with strict/warn modes and optional image checks. |
| yolozu/continual_metrics.py | Adds continual-learning summary metrics computation for task×time matrices. |
| yolozu/cli.py | Adds python -m yolozu CLI with train/test subcommands reading YAML/JSON configs. |
| yolozu/calibration/__init__.py | Exposes calibration/refinement API from calibration subpackage. |
| yolozu/adapter.py | Improves preprocessing: NumPy-based tensor conversion + preprocess metadata + intrinsics scaling. |
| yolozu/__main__.py | Adds python -m yolozu entrypoint. |
| train_setting.yaml | Adds default training config template for the new CLI. |
| tools/yolozu | Adds shell wrapper to run module CLI. |
| tools/validate_segmentation_predictions.py | Adds CLI tool to validate segmentation predictions shapes/contracts. |
| tools/validate_predictions.py | Updates validator tool to normalize wrapped payloads and validate meta. |
| tools/validate_instance_segmentation_predictions.py | Adds CLI tool to validate instance-seg predictions shapes/contracts. |
| tools/validate_dataset.py | Adds CLI tool to validate YOLO dataset manifests/records. |
| tools/run_scenario_suite.py | Adds CLI tool to generate a scenario-suite report JSON. |
| tools/run_baseline.py | Expands baseline report schema (meta/speed/hash) and adds optional COCO mAP and scenario disabling. |
| tools/prepare_voc_seg.py | Adds Pascal VOC segmentation dataset preparation tool (manifest/symlink/copy). |
| tools/prepare_ade20k_seg.py | Adds ADE20K segmentation dataset preparation tool (manifest/symlink/copy). |
| tools/plot_metrics.py | Adds simple JSONL metrics plotting utility. |
| tools/fetch_coco128_official.py | Adds optional --insecure download mode and propagates it through fetch helpers. |
| tools/fetch_coco128.sh | Adds env/CI-driven --insecure toggle when fetching coco128. |
| tools/export_predictions_trt.py | Refactors TRT output resolution helper; adds stronger typing for predictions. |
| tools/export_predictions.py | Switches to shared TTT preset application and improves rollback behavior for norm buffers. |
| tools/calibrate_predictions.py | Adds CLI tool to calibrate predictions via L-BFGS and optionally wrap outputs with meta. |
| tools/build_trt_engine.py | Adds helper to resolve TRT input name for compatibility/tests. |
| tools/benchmark_latency.py | Adds FPS/latency targets and non-zero exit for failed target thresholds. |
| todo_yolo26_competition.md | Updates checklist items to completed where features landed. |
| todo_pytorch_trt.md | Updates checklist items to completed where features landed. |
| tests/test_yolozu_cli.py | Adds CLI help assertions + instance-seg demo eval regression test. |
| tests/test_validate_segmentation_predictions_tool.py | Adds tests for segmentation-predictions validator tool inputs. |
| tests/test_validate_instance_segmentation_predictions_tool.py | Adds tests for instance-seg validator tool inputs. |
| tests/test_ttt_presets.py | Adds unit tests for TTT preset auto-application and safety guard filling. |
| tests/test_ttt_integration.py | Adds regression test ensuring rollback restores BN buffers when enabled. |
| tests/test_tta.py | Adds TTA schema/determinism tests and norm-only update behavior verification. |
| tests/test_template_verification.py | Adds minimal test ensuring top-k template verification annotates only selected detections. |
| tests/test_scenario_suite.py | Strengthens scenario-suite report schema expectations. |
| tests/test_run_trt_workflow.py | Adds tests for TRT workflow command construction. |
| tests/test_run_rtdetr_pose_backend_suite_cli.py | Adds smoke test for backend suite dry-run artifact generation. |
| tests/test_run_baseline_report.py | Adds smoke test for unified baseline report schema. |
| tests/test_rtdetr_pose_adapter.py | Adds preprocess determinism/range tests and intrinsics scaling test. |
| tests/test_report_dependency_licenses_tool.py | Adds smoke test for dependency license report tool. |
| tests/test_replay_buffer.py | Adds reservoir replay-buffer behavior tests. |
| tests/test_refine_predictions_hessian_cli.py | Updates refine CLI tests; adds bbox fields in test payload. |
| tests/test_prepare_voc_seg_tool.py | Adds tests for VOC seg prep tool output structure. |
| tests/test_prepare_ade20k_seg_tool.py | Adds tests for ADE20K seg prep tool output structure. |
| tests/test_predictions.py | Adds tests for wrapped payload loading + meta validation contract. |
| tests/test_pose_eval.py | Adds pose evaluation denominator behavior regression test. |
| tests/test_parity_trt_tool.py | Adds tests for parity tool CLI arg building helper. |
| tests/test_onnx_parity_rtdetr_pose.py | Adds PyTorch↔ONNXRuntime parity test for RTDETRPose outputs. |
| tests/test_make_subset_dataset.py | Adds determinism test for subset dataset tool and hash output. |
| tests/test_intrinsics.py | Adds tests for intrinsics parsing supported formats. |
| tests/test_inference_utils.py | Adds test ensuring constraints + template verification integrate correctly. |
| tests/test_inference_constraints.py | Adds tests for constraints inference including missing intrinsics path. |
| tests/test_gates_constraints.py | Adds tests for low-FP gate behavior. |
| tests/test_export_predictions_lora_cli.py | Minor cleanup in LoRA export test file (whitespace). |
| tests/test_eval_segmentation_tool.py | Adds segmentation eval tool smoke test with ignore-index + HTML overlays. |
| tests/test_eval_keypoints_tool.py | Adds keypoints eval tool smoke test for PCK output. |
| tests/test_eval_instance_segmentation_tool.py | Adds instance-seg eval tool smoke test with HTML overlays. |
| tests/test_dataset_validator.py | Adds tests for strict vs warn dataset validation behavior. |
| tests/test_dataset_keypoints.py | Adds tests for parsing YOLO pose keypoints formats from labels. |
| tests/test_continual_metrics.py | Adds tests for continual-metrics summary computation. |
| tests/test_coco_keypoints_eval.py | Adds tests for keypoints COCO GT conversion + OKS eval dependency behavior. |
| tests/test_check_keypoints_parity_tool.py | Adds tests for keypoints parity tool success/failure thresholds. |
| tests/test_calibrate_predictions_lbfgs.py | Adds test ensuring LBFGS calibration recovers depth scale factor. |
| tests/test_build_trt_engine.py | Adds tests for TRT input-name resolution helper. |
| tests/test_benchmark_keypoints_eval_tool.py | Adds benchmark tool smoke test for keypoints evaluation. |
| test_setting.yaml | Adds default test config template for the new CLI. |
| schemas/predictions.schema.json | Adds JSON Schema for predictions contract (array/wrapper/mapping shapes). |
| rtdetr_pose/tests/test_train_minimal_stage_schedule.py | Adds tests for stage-weight schedule in training. |
| rtdetr_pose/tests/test_train_minimal_mim_teacher.py | Adds tests for MIM teacher fields in collate. |
| rtdetr_pose/tests/test_train_minimal_mim_schedule.py | Adds tests for MIM schedule interpolation behavior. |
| rtdetr_pose/tests/test_train_minimal_mim_mask.py | Adds tests that MIM mask can fully mask patches. |
| rtdetr_pose/tests/test_train_minimal_mask_labels.py | Adds tests deriving labels from instance masks. |
| rtdetr_pose/tests/test_train_minimal_integration.py | Adds integration tests for grad-accum, AMP behavior, combined features. |
| rtdetr_pose/tests/test_train_minimal_grad_accum_amp.py | Adds parser tests for grad-accum and AMP flags. |
| rtdetr_pose/tests/test_train_minimal_denoise.py | Adds test ensuring denoise targets append correctly. |
| rtdetr_pose/tests/test_train_minimal_cost_schedule.py | Adds tests for staged matcher-cost schedule. |
| rtdetr_pose/tests/test_model_backbone_sppf.py | Adds tests for SPPF toggle in backbone. |
| rtdetr_pose/tests/test_losses_masking.py | Adds test for z-loss masking based on depth/mask availability. |
| rtdetr_pose/tests/test_lora.py | Adds LoRA application/freezing/coverage tests. |
| rtdetr_pose/tests/test_hybrid_encoder_level_embed.py | Adds tests for encoder level embedding toggle and forward shapes. |
| rtdetr_pose/tests/test_dataset_extract_pose_intrinsics.py | Adds tests for OpenCV camera_matrix dict parsing in dataset targets. |
| rtdetr_pose/tests/test_dataset.py | Adds skip when coco128 is missing to avoid hard failure. |
| rtdetr_pose/rtdetr_pose/validator.py | Extends array loader to support .png masks as grayscale arrays. |
| rtdetr_pose/rtdetr_pose/losses.py | Adds MIM reconstruction loss + entropy loss and integrates into total loss. |
| rtdetr_pose/rtdetr_pose/config.py | Adds model config flags for SPPF and level embeddings. |
| rtdetr_pose/configs/base.json | Enables new model config flags in default config JSON. |
| rtdetr_pose/README.md | Documents continual runner and LoRA usage plus mask-only label notes. |
| requirements-test.txt | Adds ONNXRuntime and onnxscript for parity tests. |
| pyproject.toml | Adds Ruff config and per-file ignores for E402 in tools/tests. |
| examples/instance_seg_demo/predictions/instance_seg_predictions_rgbmask.json | Adds demo predictions including RGB mask case for evaluator option. |
| examples/instance_seg_demo/predictions/instance_seg_predictions_noisy.json | Adds demo predictions with low-score FPs for threshold demo. |
| examples/instance_seg_demo/predictions/instance_seg_predictions.json | Adds base instance-seg demo predictions. |
| examples/instance_seg_demo/dataset/labels/val2017/demo_001.json | Adds sidecar GT mask paths + classes for demo sample. |
| examples/instance_seg_demo/dataset/labels/val2017/demo_000.json | Adds sidecar GT mask paths + classes for demo sample. |
| examples/instance_seg_demo/classes.txt | Adds class list for instance-seg demo. |
| examples/instance_seg_demo/README.md | Adds runnable documentation for instance-seg demo + evaluator options. |
| docs/yolozu_spec.md | Adds feature summary spec doc for repo capabilities and contracts. |
| docs/yolo26_baseline_repro.md | Documents TRT engine build steps for reproducible baseline flow. |
| docs/tools_index.md | Extends tool index with keypoints/instance-seg eval and policy helpers. |
| docs/tensorrt_pipeline.md | Adds backend suite command to the TRT pipeline docs. |
| docs/schemas/segmentation_predictions.schema.json | Adds schema for segmentation predictions contract (wrapper/list/mapping). |
| docs/schemas/seg_eval_report.schema.json | Adds schema for segmentation eval report output. |
| docs/schemas/seg_dataset.schema.json | Adds schema for segmentation dataset descriptor outputs. |
| docs/schemas/instance_segmentation_predictions.schema.json | Adds schema for instance-seg predictions contract. |
| docs/schemas/instance_seg_eval_report.schema.json | Adds schema for instance-seg eval report output. |
| docs/real_model_interface.md | Updates training command examples and adds units/intrinsics guidance. |
| docs/predictions_schema.md | Adds v1 predictions schema doc including units/intrinsics rules. |
| docs/onnx_export_parity.md | Extends parity docs with TRT parity/workflow helper usage. |
| docs/license_policy.md | Adds dependency license reporting guidance. |
| docs/continual_learning.md | Adds continual learning guide (memoryless/replay/LoRA) and evaluation flow. |
| docs/adapter_contract.md | Adds adapter contract doc including optional TTT hooks. |
| deploy/runpod/run_compliance.sh | Adds script for generating doctor + dependency licenses artifacts in container. |
| deploy/runpod/README.md | Updates container tag naming and documents compliance artifact generation. |
| configs/continual/rtdetr_pose_domain_inc_example.yaml | Adds example continual config (domain incremental) with replay+distill options. |
| SECURITY_SUMMARY.md | Adds security scan summary document for consolidation PR. |
| PR12_FIXES_SUMMARY.md | Adds summary of fixed issues from PR#12 review. |
| NOTICE | Adds NOTICE file for project attribution. |
| LINT_FIXES_NEEDED.md | Adds lint-fix tracking doc for open PRs. |
| .github/workflows/ci.yml | Adds scenario smoke run step to CI. |
| .beads/config.yaml | Updates beads config to include multi-repo paths. |
| .beads/.gitignore | Updates beads ignore patterns and internal staging ignores. |
| return data or {} | ||
| except Exception: | ||
| return simple_yaml_load(text) | ||
| if path.suffix.lower() == ".json": | ||
| return json.loads(text) | ||
| try: | ||
| return json.loads(text) | ||
| except Exception: | ||
| return simple_yaml_load(text) | ||
_load_config() is annotated/used as returning a dict, but yaml.safe_load() / json.loads() can legally return non-dict top-level values (e.g., a list). That will break _build_args_from_config(cfg) which assumes cfg.items(). Fix by validating data is a dict (else raise a clear SystemExit/ValueError), and ensure all branches return a dict.
| return data or {} | |
| except Exception: | |
| return simple_yaml_load(text) | |
| if path.suffix.lower() == ".json": | |
| return json.loads(text) | |
| try: | |
| return json.loads(text) | |
| except Exception: | |
| return simple_yaml_load(text) | |
| except Exception: | |
| data = simple_yaml_load(text) | |
| elif path.suffix.lower() == ".json": | |
| data = json.loads(text) | |
| else: | |
| try: | |
| data = json.loads(text) | |
| except Exception: | |
| data = simple_yaml_load(text) | |
| if data is None: | |
| return {} | |
| if not isinstance(data, dict): | |
| raise SystemExit( | |
| f"Top-level configuration in {path} must be a mapping/object, " | |
| f"got {type(data).__name__} instead." | |
| ) | |
| return data |
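The validation step in the suggestion above can be factored into one helper that every parsing branch passes through. This is a minimal sketch, not the PR's code: `ensure_mapping` and `parse_json_config` are hypothetical names, and only the JSON branch is shown because it needs nothing beyond the standard library.

```python
import json
from typing import Any


def ensure_mapping(data: Any, source: str) -> dict:
    """Return data as a dict; treat None as empty and reject every other type."""
    if data is None:
        return {}
    if not isinstance(data, dict):
        # Same message shape as the suggested change above.
        raise SystemExit(
            f"Top-level configuration in {source} must be a mapping/object, "
            f"got {type(data).__name__} instead."
        )
    return data


def parse_json_config(text: str, source: str) -> dict:
    """Hypothetical JSON-only loader: parse, then validate the top-level type."""
    return ensure_mapping(json.loads(text), source)
```

With this shape, `parse_json_config("[1, 2]", "cfg.json")` exits with a clear message instead of failing later on `cfg.items()`, and a YAML branch could reuse `ensure_mapping` unchanged.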
| if len(nums) % 3 == 0: | ||
| out = [] | ||
| for i in range(0, len(nums), 3): | ||
| out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]}) | ||
| return out | ||
| if len(nums) % 2 == 0: | ||
| out = [] | ||
| for i in range(0, len(nums), 2): |
There was a problem hiding this comment.
Flat-list parsing is ambiguous when len(nums) is divisible by both 2 and 3 (e.g., 6, 12): the function will always choose the % 3 path and interpret values as (x,y,v) triples even if the caller intended (x,y) pairs. Consider disambiguating by (a) preferring % 2 unless a has_visibility/stride parameter is explicitly provided, or (b) using a heuristic (e.g., all v values are in an expected range like {0,1,2}) before selecting the triple interpretation.
| if len(nums) % 3 == 0: | |
| out = [] | |
| for i in range(0, len(nums), 3): | |
| out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]}) | |
| return out | |
| if len(nums) % 2 == 0: | |
| out = [] | |
| for i in range(0, len(nums), 2): | |
| def _looks_like_visibility_triplets(nums_list: list[float]) -> bool: | |
| """Heuristic: treat as (x, y, v) only if all v are typical visibility flags.""" | |
| if len(nums_list) < 3: | |
| return False | |
| vs = nums_list[2::3] | |
| # Common visibility flags are 0, 1, or 2 (COCO/YOLO-style). | |
| allowed = {0.0, 1.0, 2.0} | |
| return all(v in allowed for v in vs) | |
| n = len(nums) | |
| # Prefer triples when length is only divisible by 3, or when ambiguous but v-values look like visibilities. | |
| if n % 3 == 0 and (n % 2 != 0 or _looks_like_visibility_triplets(nums)): | |
| out = [] | |
| for i in range(0, n, 3): | |
| out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]}) | |
| return out | |
| if n % 2 == 0: | |
| out = [] | |
| for i in range(0, n, 2): |
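Option (a) from the comment, an explicit stride that overrides the default, might look like the sketch below. The function name `parse_flat_keypoints` and the `stride` parameter are assumptions made for illustration; the real `normalize_keypoints()` may be organised differently.

```python
from typing import Optional, Sequence


def parse_flat_keypoints(nums: Sequence[float], stride: Optional[int] = None) -> list[dict]:
    """Parse a flat list into keypoint dicts.

    stride=2 means (x, y) pairs and stride=3 means (x, y, v) triples.
    With stride=None, pairs are preferred; triples are used only when
    the length rules out pairs.
    """
    n = len(nums)
    if stride is None:
        if n % 2 == 0:
            stride = 2
        elif n % 3 == 0:
            stride = 3
        else:
            raise ValueError(f"cannot infer keypoint layout from length {n}")
    if stride not in (2, 3) or n % stride != 0:
        raise ValueError(f"length {n} is not compatible with stride {stride}")

    out = []
    for i in range(0, n, stride):
        kp = {"x": float(nums[i]), "y": float(nums[i + 1])}
        if stride == 3:
            kp["v"] = nums[i + 2]  # visibility flag is passed through unchanged
        out.append(kp)
    return out
```

A six-element list is then read as three pairs by default, and as two triples only when the caller passes `stride=3`. This avoids guessing from the values, at the cost of requiring callers with visibility data to say so.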
| def _is_number(value: Any) -> bool: | ||
| return isinstance(value, (int, float)) and not isinstance(value, bool) |
_is_number() rejects common numeric scalar types like numpy.float32 / numpy.int64 (and other numbers.Real implementers). That makes normalize_keypoints() unnecessarily brittle for data coming from NumPy/Torch conversions. Consider switching to isinstance(value, numbers.Real) (still excluding bool) to accept non-builtin numeric scalars without adding a NumPy dependency.
| def _is_number(value: Any) -> bool: | |
| return isinstance(value, (int, float)) and not isinstance(value, bool) | |
| import numbers | |
| def _is_number(value: Any) -> bool: | |
| return isinstance(value, numbers.Real) and not isinstance(value, bool) |
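A short standard-library check of what the `numbers.Real` test accepts and rejects. The function is written as `is_number` (no leading underscore) to mark it as a standalone sketch rather than the module's private helper.

```python
import numbers
from decimal import Decimal
from fractions import Fraction
from typing import Any


def is_number(value: Any) -> bool:
    """Accept any real-valued scalar registered with numbers.Real, except bool."""
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


# Built-in ints and floats still pass, as do other Real implementers.
accepted = [1, 1.5, Fraction(1, 2)]
# bool is an int subclass, so it must be excluded explicitly;
# Decimal is registered only as numbers.Number, and complex is not Real.
rejected = [True, "1", None, Decimal("1.5"), 1 + 2j]
```

NumPy registers its integer and floating scalar types with the `numbers` ABCs, so values such as `numpy.float32` should pass this check without the module importing NumPy. Note that `Decimal` is rejected, which may or may not matter for this codebase.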
| if mask is not None: | ||
| # Apply mask: compute loss only on masked locations | ||
| mask_expanded = mask.unsqueeze(0).unsqueeze(0) if mask.ndim == 2 else mask.unsqueeze(1) | ||
| mask_expanded = mask_expanded.expand_as(diff).to(dtype=torch.bool) | ||
|
|
||
| if not mask_expanded.any(): | ||
| return diff.sum() * 0.0 | ||
|
|
||
| return diff[mask_expanded].mean() |
When mask is on a different device than diff (e.g., CPU mask with CUDA features), diff[mask_expanded] will raise a device mismatch error. Fix by moving the expanded mask to diff.device (and converting to bool) before indexing.
| "bbox": {"cx": 0.5, "cy": 0.5, "w": 0.2, "h": 0.2}, | ||
| # Target a depth value to force offsets refinement along +x (depth(u)=u). | ||
| "log_z": math.log(4.0), | ||
| "rot6d": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], | ||
| "offsets": [0.0, 0.0], |
This test hunk appears to introduce duplicated keys in the same detection dict (log_z, rot6d, offsets were already present earlier in the same literal). In Python, later duplicates silently override earlier ones, which can hide test intent and make the payload inconsistent. Remove the duplicated entries (keep a single definition of each key) so the fixture is unambiguous.
| "bbox": {"cx": 0.5, "cy": 0.5, "w": 0.2, "h": 0.2}, | |
| # Target a depth value to force offsets refinement along +x (depth(u)=u). | |
| "log_z": math.log(4.0), | |
| "rot6d": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0], | |
| "offsets": [0.0, 0.0], |
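Because Python accepts repeated keys in a dict literal without any runtime error, this kind of duplication is easy to miss in review. One way to catch it mechanically is a small `ast` walk; the sketch below is illustrative and `find_duplicate_dict_keys` is a hypothetical helper, not something in the repository.

```python
import ast


def find_duplicate_dict_keys(source: str) -> list[tuple[int, object]]:
    """Return (line, key) for each constant key repeated within one dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for ``**mapping`` unpacking; non-constant keys are skipped.
            if not isinstance(key, ast.Constant):
                continue
            if key.value in seen:
                duplicates.append((key.lineno, key.value))
            seen.add(key.value)
    return duplicates


# The later value silently wins, which is the behaviour the comment warns about.
fixture = {"log_z": 1.0, "offsets": [0.0, 0.0], "log_z": 2.0}
```

Each nested dict literal is checked on its own, so a key that appears once in an outer dict and once in an inner dict is not reported.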
| INSECURE_FLAG="" | ||
| if [[ "${YOLOZU_INSECURE_SSL:-}" == "1" || "${CI:-}" == "true" ]]; then | ||
| INSECURE_FLAG="--insecure" | ||
| fi | ||
|
|
||
| python3 "$REPO_ROOT/tools/fetch_coco128_official.py" --out "$OUT_DIR" $INSECURE_FLAG |
This enables --insecure (disables SSL verification) automatically when CI=true. That weakens transport security by default in CI and could mask real certificate/URL issues. Consider requiring an explicit opt-in (only YOLOZU_INSECURE_SSL=1) and/or printing a prominent warning when insecure mode is enabled.
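The explicit opt-in with a warning could be expressed as below. This is a Python rendering of the decision logic only, as a sketch; the wrapper in the PR is a shell script, and `insecure_args` is a hypothetical name. The `YOLOZU_INSECURE_SSL` variable and the `--insecure` flag are the ones shown in the hunk above.

```python
import sys
from typing import Mapping


def insecure_args(env: Mapping[str, str]) -> list[str]:
    """Return ["--insecure"] only on explicit opt-in; CI=true alone is ignored."""
    if env.get("YOLOZU_INSECURE_SSL", "") != "1":
        return []
    # Make the weakened transport security visible in logs.
    print(
        "WARNING: SSL certificate verification is disabled "
        "(YOLOZU_INSECURE_SSL=1).",
        file=sys.stderr,
    )
    return ["--insecure"]
```

Called as `insecure_args(os.environ)`, a CI job would keep certificate verification on unless the variable is set deliberately in that job's environment.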
| run: | | ||
| python -m unittest | ||
|
|
||
| - name: Scenario smoke run |
CI now depends on a network download of coco128 at runtime, which can introduce flakiness, slowdowns, and failures in restricted environments. Consider caching the dataset artifact, vendoring a tiny deterministic test dataset, or making this step conditional (e.g., scheduled/nightly or continue-on-error) so core CI remains reliable.
| - name: Scenario smoke run | |
| - name: Scenario smoke run | |
| continue-on-error: true |
| if not enabled: | ||
| return True | ||
| return score_tmp_sym >= tau | ||
|
|
||
|
|
||
| def passes_low_fp_gate(score_tmp_sym, enabled, tau): | ||
| if not enabled: | ||
| return True | ||
| return score_tmp_sym >= tau |
passes_low_fp_gate() is currently identical to passes_template_gate(), which duplicates logic and increases maintenance cost. Consider implementing it as an alias/wrapper (or a single shared gate helper) and add a short docstring clarifying the semantic difference (template-vs-low-FP) if both names are kept for readability.
| if not enabled: | |
| return True | |
| return score_tmp_sym >= tau | |
| def passes_low_fp_gate(score_tmp_sym, enabled, tau): | |
| if not enabled: | |
| return True | |
| return score_tmp_sym >= tau | |
| """Return True if the template-based score passes the configured threshold gate.""" | |
| if not enabled: | |
| return True | |
| return score_tmp_sym >= tau | |
| def passes_low_fp_gate(score_tmp_sym, enabled, tau): | |
| """Return True if the low-false-positive gate is passed. | |
| Currently uses the same logic as passes_template_gate, but kept as a separate | |
| entry point to distinguish template-based gating from low-FP gating. | |
| """ | |
| return passes_template_gate(score_tmp_sym, enabled, tau) |
| def _build_args_from_config(cfg: dict) -> list[str]: | ||
| args: list[str] = [] | ||
| for key, value in cfg.items(): |
The new module CLI config path (_load_config + _build_args_from_config) introduces non-trivial behavior (YAML/JSON fallback, bool/list handling, key normalization). There doesn’t appear to be direct unit coverage for these helpers; consider adding focused tests (e.g., bool flags, list args, invalid top-level config type) to prevent regressions.
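Focused tests along these lines might look like the following. The hunk shows only the signature of `_build_args_from_config`, so `build_args_from_config` below is a hypothetical stand-in whose rules (underscore-to-dash keys, bare flags for true booleans, expanded lists) are assumptions; the assertions would need adjusting to the real helper's behaviour.

```python
import unittest


def build_args_from_config(cfg: dict) -> list[str]:
    """Hypothetical stand-in for the CLI helper; the real rules may differ."""
    args: list[str] = []
    for key, value in cfg.items():
        flag = "--" + str(key).replace("_", "-")
        if isinstance(value, bool):
            if value:  # assumed: False omits the flag entirely
                args.append(flag)
        elif isinstance(value, (list, tuple)):
            args.append(flag)
            args.extend(str(v) for v in value)
        elif value is not None:
            args.extend([flag, str(value)])
    return args


class BuildArgsFromConfigTest(unittest.TestCase):
    def test_true_bool_becomes_bare_flag(self):
        self.assertEqual(build_args_from_config({"use_amp": True}), ["--use-amp"])

    def test_false_bool_is_omitted(self):
        self.assertEqual(build_args_from_config({"use_amp": False}), [])

    def test_list_values_are_expanded(self):
        self.assertEqual(
            build_args_from_config({"image_size": [320, 320]}),
            ["--image-size", "320", "320"],
        )

    def test_scalar_is_stringified(self):
        self.assertEqual(build_args_from_config({"epochs": 3}), ["--epochs", "3"])


if __name__ == "__main__":
    unittest.main()
```

A companion case for `_load_config()` would assert that a top-level list in the config file produces a clear error rather than an `AttributeError` on `cfg.items()`.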