Codex/continual learning v2 #10

Open
thinksyncs wants to merge 160 commits into main from codex/continual-learning-v2

Conversation

@thinksyncs
Contributor

No description provided.

akira added 28 commits February 8, 2026 17:37
# Conflicts:
#	tests/test_cuda_smoke_rtdetr_pose.py
# Conflicts:
#	tests/test_export_predictions_lora_cli.py
#	tests/test_refine_predictions_hessian_cli.py
#	tools/export_predictions.py
#	yolozu/adapter.py
# Conflicts:
#	yolozu/adapter.py
Copilot AI review requested due to automatic review settings February 9, 2026 14:01
Copilot AI left a comment

Pull request overview

This PR expands YOLOZU with continual-learning helpers, richer prediction/dataset validation utilities, and additional CLI + tooling for evaluation/export workflows (incl. segmentation/keypoints and TRT parity), backed by a large set of new unit tests and schemas.

Changes:

  • Added new normalization/parsing utilities (keypoints, intrinsics, instance-seg predictions) and dataset validator helpers.
  • Extended tooling/CLIs for training/testing, baseline reporting, dataset preparation, and export/parity workflows.
  • Added extensive test coverage plus JSON schemas and documentation updates for the new contracts.

Reviewed changes

Copilot reviewed 169 out of 202 changed files in this pull request and generated 9 comments.

File Description
yolozu/keypoints.py Adds keypoints normalization/inference + pixel conversion helpers.
yolozu/intrinsics.py Adds robust intrinsics parsing from multiple OpenCV-friendly representations.
yolozu/instance_segmentation_predictions.py Adds instance-seg predictions normalization + validation + loader helpers.
yolozu/inference_utils.py Adds a convenience wrapper to apply constraints + template verification in one step.
yolozu/gates.py Adds low-FP gate function alongside existing template gate.
yolozu/datasets/ade20k.py Adds ADE20K dataset path resolution, sample iteration, and optional class-name loading.
yolozu/dataset_validator.py Adds dataset record validation with strict/warn modes and optional image checks.
yolozu/continual_metrics.py Adds continual-learning summary metrics computation for task×time matrices.
yolozu/cli.py Adds python -m yolozu CLI with train/test subcommands reading YAML/JSON configs.
yolozu/calibration/__init__.py Exposes calibration/refinement API from calibration subpackage.
yolozu/adapter.py Improves preprocessing: NumPy-based tensor conversion + preprocess metadata + intrinsics scaling.
yolozu/__main__.py Adds python -m yolozu entrypoint.
train_setting.yaml Adds default training config template for the new CLI.
tools/yolozu Adds shell wrapper to run module CLI.
tools/validate_segmentation_predictions.py Adds CLI tool to validate segmentation predictions shapes/contracts.
tools/validate_predictions.py Updates validator tool to normalize wrapped payloads and validate meta.
tools/validate_instance_segmentation_predictions.py Adds CLI tool to validate instance-seg predictions shapes/contracts.
tools/validate_dataset.py Adds CLI tool to validate YOLO dataset manifests/records.
tools/run_scenario_suite.py Adds CLI tool to generate a scenario-suite report JSON.
tools/run_baseline.py Expands baseline report schema (meta/speed/hash) and adds optional COCO mAP and scenario disabling.
tools/prepare_voc_seg.py Adds Pascal VOC segmentation dataset preparation tool (manifest/symlink/copy).
tools/prepare_ade20k_seg.py Adds ADE20K segmentation dataset preparation tool (manifest/symlink/copy).
tools/plot_metrics.py Adds simple JSONL metrics plotting utility.
tools/fetch_coco128_official.py Adds optional --insecure download mode and propagates it through fetch helpers.
tools/fetch_coco128.sh Adds env/CI-driven --insecure toggle when fetching coco128.
tools/export_predictions_trt.py Refactors TRT output resolution helper; adds stronger typing for predictions.
tools/export_predictions.py Switches to shared TTT preset application and improves rollback behavior for norm buffers.
tools/calibrate_predictions.py Adds CLI tool to calibrate predictions via L-BFGS and optionally wrap outputs with meta.
tools/build_trt_engine.py Adds helper to resolve TRT input name for compatibility/tests.
tools/benchmark_latency.py Adds FPS/latency targets and non-zero exit for failed target thresholds.
todo_yolo26_competition.md Updates checklist items to completed where features landed.
todo_pytorch_trt.md Updates checklist items to completed where features landed.
tests/test_yolozu_cli.py Adds CLI help assertions + instance-seg demo eval regression test.
tests/test_validate_segmentation_predictions_tool.py Adds tests for segmentation-predictions validator tool inputs.
tests/test_validate_instance_segmentation_predictions_tool.py Adds tests for instance-seg validator tool inputs.
tests/test_ttt_presets.py Adds unit tests for TTT preset auto-application and safety guard filling.
tests/test_ttt_integration.py Adds regression test ensuring rollback restores BN buffers when enabled.
tests/test_tta.py Adds TTA schema/determinism tests and norm-only update behavior verification.
tests/test_template_verification.py Adds minimal test ensuring top-k template verification annotates only selected detections.
tests/test_scenario_suite.py Strengthens scenario-suite report schema expectations.
tests/test_run_trt_workflow.py Adds tests for TRT workflow command construction.
tests/test_run_rtdetr_pose_backend_suite_cli.py Adds smoke test for backend suite dry-run artifact generation.
tests/test_run_baseline_report.py Adds smoke test for unified baseline report schema.
tests/test_rtdetr_pose_adapter.py Adds preprocess determinism/range tests and intrinsics scaling test.
tests/test_report_dependency_licenses_tool.py Adds smoke test for dependency license report tool.
tests/test_replay_buffer.py Adds reservoir replay-buffer behavior tests.
tests/test_refine_predictions_hessian_cli.py Updates refine CLI tests; adds bbox fields in test payload.
tests/test_prepare_voc_seg_tool.py Adds tests for VOC seg prep tool output structure.
tests/test_prepare_ade20k_seg_tool.py Adds tests for ADE20K seg prep tool output structure.
tests/test_predictions.py Adds tests for wrapped payload loading + meta validation contract.
tests/test_pose_eval.py Adds pose evaluation denominator behavior regression test.
tests/test_parity_trt_tool.py Adds tests for parity tool CLI arg building helper.
tests/test_onnx_parity_rtdetr_pose.py Adds PyTorch↔ONNXRuntime parity test for RTDETRPose outputs.
tests/test_make_subset_dataset.py Adds determinism test for subset dataset tool and hash output.
tests/test_intrinsics.py Adds tests for intrinsics parsing supported formats.
tests/test_inference_utils.py Adds test ensuring constraints + template verification integrate correctly.
tests/test_inference_constraints.py Adds tests for constraints inference including missing intrinsics path.
tests/test_gates_constraints.py Adds tests for low-FP gate behavior.
tests/test_export_predictions_lora_cli.py Minor cleanup in LoRA export test file (whitespace).
tests/test_eval_segmentation_tool.py Adds segmentation eval tool smoke test with ignore-index + HTML overlays.
tests/test_eval_keypoints_tool.py Adds keypoints eval tool smoke test for PCK output.
tests/test_eval_instance_segmentation_tool.py Adds instance-seg eval tool smoke test with HTML overlays.
tests/test_dataset_validator.py Adds tests for strict vs warn dataset validation behavior.
tests/test_dataset_keypoints.py Adds tests for parsing YOLO pose keypoints formats from labels.
tests/test_continual_metrics.py Adds tests for continual-metrics summary computation.
tests/test_coco_keypoints_eval.py Adds tests for keypoints COCO GT conversion + OKS eval dependency behavior.
tests/test_check_keypoints_parity_tool.py Adds tests for keypoints parity tool success/failure thresholds.
tests/test_calibrate_predictions_lbfgs.py Adds test ensuring LBFGS calibration recovers depth scale factor.
tests/test_build_trt_engine.py Adds tests for TRT input-name resolution helper.
tests/test_benchmark_keypoints_eval_tool.py Adds benchmark tool smoke test for keypoints evaluation.
test_setting.yaml Adds default test config template for the new CLI.
schemas/predictions.schema.json Adds JSON Schema for predictions contract (array/wrapper/mapping shapes).
rtdetr_pose/tests/test_train_minimal_stage_schedule.py Adds tests for stage-weight schedule in training.
rtdetr_pose/tests/test_train_minimal_mim_teacher.py Adds tests for MIM teacher fields in collate.
rtdetr_pose/tests/test_train_minimal_mim_schedule.py Adds tests for MIM schedule interpolation behavior.
rtdetr_pose/tests/test_train_minimal_mim_mask.py Adds tests that MIM mask can fully mask patches.
rtdetr_pose/tests/test_train_minimal_mask_labels.py Adds tests deriving labels from instance masks.
rtdetr_pose/tests/test_train_minimal_integration.py Adds integration tests for grad-accum, AMP behavior, combined features.
rtdetr_pose/tests/test_train_minimal_grad_accum_amp.py Adds parser tests for grad-accum and AMP flags.
rtdetr_pose/tests/test_train_minimal_denoise.py Adds test ensuring denoise targets append correctly.
rtdetr_pose/tests/test_train_minimal_cost_schedule.py Adds tests for staged matcher-cost schedule.
rtdetr_pose/tests/test_model_backbone_sppf.py Adds tests for SPPF toggle in backbone.
rtdetr_pose/tests/test_losses_masking.py Adds test for z-loss masking based on depth/mask availability.
rtdetr_pose/tests/test_lora.py Adds LoRA application/freezing/coverage tests.
rtdetr_pose/tests/test_hybrid_encoder_level_embed.py Adds tests for encoder level embedding toggle and forward shapes.
rtdetr_pose/tests/test_dataset_extract_pose_intrinsics.py Adds tests for OpenCV camera_matrix dict parsing in dataset targets.
rtdetr_pose/tests/test_dataset.py Adds skip when coco128 is missing to avoid hard failure.
rtdetr_pose/rtdetr_pose/validator.py Extends array loader to support .png masks as grayscale arrays.
rtdetr_pose/rtdetr_pose/losses.py Adds MIM reconstruction loss + entropy loss and integrates into total loss.
rtdetr_pose/rtdetr_pose/config.py Adds model config flags for SPPF and level embeddings.
rtdetr_pose/configs/base.json Enables new model config flags in default config JSON.
rtdetr_pose/README.md Documents continual runner and LoRA usage plus mask-only label notes.
requirements-test.txt Adds ONNXRuntime and onnxscript for parity tests.
pyproject.toml Adds Ruff config and per-file ignores for E402 in tools/tests.
examples/instance_seg_demo/predictions/instance_seg_predictions_rgbmask.json Adds demo predictions including RGB mask case for evaluator option.
examples/instance_seg_demo/predictions/instance_seg_predictions_noisy.json Adds demo predictions with low-score FPs for threshold demo.
examples/instance_seg_demo/predictions/instance_seg_predictions.json Adds base instance-seg demo predictions.
examples/instance_seg_demo/dataset/labels/val2017/demo_001.json Adds sidecar GT mask paths + classes for demo sample.
examples/instance_seg_demo/dataset/labels/val2017/demo_000.json Adds sidecar GT mask paths + classes for demo sample.
examples/instance_seg_demo/classes.txt Adds class list for instance-seg demo.
examples/instance_seg_demo/README.md Adds runnable documentation for instance-seg demo + evaluator options.
docs/yolozu_spec.md Adds feature summary spec doc for repo capabilities and contracts.
docs/yolo26_baseline_repro.md Documents TRT engine build steps for reproducible baseline flow.
docs/tools_index.md Extends tool index with keypoints/instance-seg eval and policy helpers.
docs/tensorrt_pipeline.md Adds backend suite command to the TRT pipeline docs.
docs/schemas/segmentation_predictions.schema.json Adds schema for segmentation predictions contract (wrapper/list/mapping).
docs/schemas/seg_eval_report.schema.json Adds schema for segmentation eval report output.
docs/schemas/seg_dataset.schema.json Adds schema for segmentation dataset descriptor outputs.
docs/schemas/instance_segmentation_predictions.schema.json Adds schema for instance-seg predictions contract.
docs/schemas/instance_seg_eval_report.schema.json Adds schema for instance-seg eval report output.
docs/real_model_interface.md Updates training command examples and adds units/intrinsics guidance.
docs/predictions_schema.md Adds v1 predictions schema doc including units/intrinsics rules.
docs/onnx_export_parity.md Extends parity docs with TRT parity/workflow helper usage.
docs/license_policy.md Adds dependency license reporting guidance.
docs/continual_learning.md Adds continual learning guide (memoryless/replay/LoRA) and evaluation flow.
docs/adapter_contract.md Adds adapter contract doc including optional TTT hooks.
deploy/runpod/run_compliance.sh Adds script for generating doctor + dependency licenses artifacts in container.
deploy/runpod/README.md Updates container tag naming and documents compliance artifact generation.
configs/continual/rtdetr_pose_domain_inc_example.yaml Adds example continual config (domain incremental) with replay+distill options.
SECURITY_SUMMARY.md Adds security scan summary document for consolidation PR.
PR12_FIXES_SUMMARY.md Adds summary of fixed issues from PR#12 review.
NOTICE Adds NOTICE file for project attribution.
LINT_FIXES_NEEDED.md Adds lint-fix tracking doc for open PRs.
.github/workflows/ci.yml Adds scenario smoke run step to CI.
.beads/config.yaml Updates beads config to include multi-repo paths.
.beads/.gitignore Updates beads ignore patterns and internal staging ignores.

Comment on lines +17 to +27
            return data or {}
        except Exception:
            return simple_yaml_load(text)
    if path.suffix.lower() == ".json":
        return json.loads(text)
    try:
        return json.loads(text)
    except Exception:
        return simple_yaml_load(text)


Copilot AI Feb 9, 2026

_load_config() is annotated/used as returning a dict, but yaml.safe_load() / json.loads() can legally return non-dict top-level values (e.g., a list). That will break _build_args_from_config(cfg) which assumes cfg.items(). Fix by validating data is a dict (else raise a clear SystemExit/ValueError), and ensure all branches return a dict.

Suggested change
-            return data or {}
-        except Exception:
-            return simple_yaml_load(text)
-    if path.suffix.lower() == ".json":
-        return json.loads(text)
-    try:
-        return json.loads(text)
-    except Exception:
-        return simple_yaml_load(text)
+        except Exception:
+            data = simple_yaml_load(text)
+    elif path.suffix.lower() == ".json":
+        data = json.loads(text)
+    else:
+        try:
+            data = json.loads(text)
+        except Exception:
+            data = simple_yaml_load(text)
+    if data is None:
+        return {}
+    if not isinstance(data, dict):
+        raise SystemExit(
+            f"Top-level configuration in {path} must be a mapping/object, "
+            f"got {type(data).__name__} instead."
+        )
+    return data

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +58
    if len(nums) % 3 == 0:
        out = []
        for i in range(0, len(nums), 3):
            out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]})
        return out
    if len(nums) % 2 == 0:
        out = []
        for i in range(0, len(nums), 2):

Copilot AI Feb 9, 2026

Flat-list parsing is ambiguous when len(nums) is divisible by both 2 and 3 (e.g., 6, 12): the function will always choose the % 3 path and interpret values as (x,y,v) triples even if the caller intended (x,y) pairs. Consider disambiguating by (a) preferring % 2 unless a has_visibility/stride parameter is explicitly provided, or (b) using a heuristic (e.g., all v values are in an expected range like {0,1,2}) before selecting the triple interpretation.

Suggested change
-    if len(nums) % 3 == 0:
-        out = []
-        for i in range(0, len(nums), 3):
-            out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]})
-        return out
-    if len(nums) % 2 == 0:
-        out = []
-        for i in range(0, len(nums), 2):
+    def _looks_like_visibility_triplets(nums_list: list[float]) -> bool:
+        """Heuristic: treat as (x, y, v) only if all v are typical visibility flags."""
+        if len(nums_list) < 3:
+            return False
+        vs = nums_list[2::3]
+        # Common visibility flags are 0, 1, or 2 (COCO/YOLO-style).
+        allowed = {0.0, 1.0, 2.0}
+        return all(v in allowed for v in vs)
+    n = len(nums)
+    # Prefer triples when length is only divisible by 3, or when ambiguous but v-values look like visibilities.
+    if n % 3 == 0 and (n % 2 != 0 or _looks_like_visibility_triplets(nums)):
+        out = []
+        for i in range(0, n, 3):
+            out.append({"x": float(nums[i]), "y": float(nums[i + 1]), "v": nums[i + 2]})
+        return out
+    if n % 2 == 0:
+        out = []
+        for i in range(0, n, 2):
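
Option (a) from the comment, an explicit stride, could look roughly like the sketch below. The function name `parse_flat_keypoints` and the `stride` parameter are hypothetical and not part of the PR; the output dict shape follows the quoted hunk.

```python
from typing import Optional, Sequence


def parse_flat_keypoints(nums: Sequence[float], stride: Optional[int] = None) -> list[dict]:
    """Parse a flat keypoint list as (x, y) pairs or (x, y, v) triples.

    `stride` is a hypothetical parameter: 2 forces pairs, 3 forces triples,
    and None prefers pairs whenever the length allows them.
    """
    n = len(nums)
    if stride is None:
        # Ambiguous lengths (6, 12, ...) resolve to pairs; triples must be requested.
        stride = 2 if n % 2 == 0 else 3
    if stride not in (2, 3):
        raise ValueError(f"stride must be 2 or 3, got {stride}")
    if n % stride != 0:
        raise ValueError(f"{n} values cannot be split into groups of {stride}")
    out = []
    for i in range(0, n, stride):
        kp = {"x": float(nums[i]), "y": float(nums[i + 1])}
        if stride == 3:
            kp["v"] = nums[i + 2]
        out.append(kp)
    return out
```

With this shape, a six-element list is read as three pairs by default, and a caller who knows the data carries visibility flags passes `stride=3`. Compared with the value-range heuristic, it never guesses from the data, at the cost of requiring callers to state the layout.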

Comment on lines +4 to +7


def _is_number(value: Any) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)

Copilot AI Feb 9, 2026

_is_number() rejects common numeric scalar types like numpy.float32 / numpy.int64 (and other numbers.Real implementers). That makes normalize_keypoints() unnecessarily brittle for data coming from NumPy/Torch conversions. Consider switching to isinstance(value, numbers.Real) (still excluding bool) to accept non-builtin numeric scalars without adding a NumPy dependency.

Suggested change
-def _is_number(value: Any) -> bool:
-    return isinstance(value, (int, float)) and not isinstance(value, bool)
+import numbers
+def _is_number(value: Any) -> bool:
+    return isinstance(value, numbers.Real) and not isinstance(value, bool)

Comment on lines +116 to +124
    if mask is not None:
        # Apply mask: compute loss only on masked locations
        mask_expanded = mask.unsqueeze(0).unsqueeze(0) if mask.ndim == 2 else mask.unsqueeze(1)
        mask_expanded = mask_expanded.expand_as(diff).to(dtype=torch.bool)

        if not mask_expanded.any():
            return diff.sum() * 0.0

        return diff[mask_expanded].mean()

Copilot AI Feb 9, 2026

When mask is on a different device than diff (e.g., CPU mask with CUDA features), diff[mask_expanded] will raise a device mismatch error. Fix by moving the expanded mask to diff.device (and converting to bool) before indexing.
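
The fix described above could be sketched as a standalone helper. `masked_mean` is a hypothetical name, and the surrounding loss function is only partially visible in the hunk, so the signature and the `(B, C, H, W)` layout of `diff` are assumptions.

```python
from typing import Optional

import torch


def masked_mean(diff: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
    """Mean of `diff` over masked locations; `diff` is assumed to be (B, C, H, W)."""
    if mask is None:
        return diff.mean()
    # (H, W) -> (1, 1, H, W); (B, H, W) -> (B, 1, H, W)
    mask_expanded = mask.unsqueeze(0).unsqueeze(0) if mask.ndim == 2 else mask.unsqueeze(1)
    # Move to diff's device and cast to bool before indexing, so a CPU mask
    # combined with CUDA features no longer raises a device mismatch.
    mask_expanded = mask_expanded.to(device=diff.device, dtype=torch.bool).expand_as(diff)
    if not mask_expanded.any():
        # Keep the graph connected while contributing zero loss.
        return diff.sum() * 0.0
    return diff[mask_expanded].mean()
```

Doing the device move before `expand_as` keeps the transfer small, since only the unexpanded mask is copied.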

Comment on lines +42 to +46
"bbox": {"cx": 0.5, "cy": 0.5, "w": 0.2, "h": 0.2},
# Target a depth value to force offsets refinement along +x (depth(u)=u).
"log_z": math.log(4.0),
"rot6d": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
"offsets": [0.0, 0.0],

Copilot AI Feb 9, 2026

This test hunk appears to introduce duplicated keys in the same detection dict (log_z, rot6d, offsets were already present earlier in the same literal). In Python, later duplicates silently override earlier ones, which can hide test intent and make the payload inconsistent. Remove the duplicated entries (keep a single definition of each key) so the fixture is unambiguous.
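
Duplicate keys of this kind are easy to miss in review because Python accepts them silently. A small check built on the standard `ast` module can flag them; `find_duplicate_dict_keys` is a hypothetical helper written for illustration and is not part of the PR.

```python
import ast


def find_duplicate_dict_keys(source: str) -> list:
    """Return constant keys that occur more than once within any single dict literal."""
    duplicates = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # key is None for `**mapping` unpacking; non-constant keys are skipped.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    duplicates.append(key.value)
                seen.add(key.value)
    return duplicates
```

Each dict literal is checked on its own, so a key reused in a nested dict is not reported. Linters can catch the same pattern; this sketch only shows the mechanism.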

Suggested change
-    "bbox": {"cx": 0.5, "cy": 0.5, "w": 0.2, "h": 0.2},
-    # Target a depth value to force offsets refinement along +x (depth(u)=u).
-    "log_z": math.log(4.0),
-    "rot6d": [1.0, 0.0, 0.0, 0.0, 1.0, 0.0],
-    "offsets": [0.0, 0.0],

Comment on lines +15 to +20
INSECURE_FLAG=""
if [[ "${YOLOZU_INSECURE_SSL:-}" == "1" || "${CI:-}" == "true" ]]; then
  INSECURE_FLAG="--insecure"
fi

python3 "$REPO_ROOT/tools/fetch_coco128_official.py" --out "$OUT_DIR" $INSECURE_FLAG

Copilot AI Feb 9, 2026

This enables --insecure (disables SSL verification) automatically when CI=true. That weakens transport security by default in CI and could mask real certificate/URL issues. Consider requiring an explicit opt-in (only YOLOZU_INSECURE_SSL=1) and/or printing a prominent warning when insecure mode is enabled.
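
The explicit opt-in with a warning could be expressed as follows. The sketch is in Python rather than shell and uses a hypothetical helper name, `insecure_ssl_requested`; only the `YOLOZU_INSECURE_SSL` variable comes from the hunk above.

```python
import os
import sys
from typing import Mapping, Optional


def insecure_ssl_requested(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when YOLOZU_INSECURE_SSL=1 is set; CI=true alone is ignored."""
    env = os.environ if env is None else env
    enabled = env.get("YOLOZU_INSECURE_SSL", "") == "1"
    if enabled:
        # Make the weakened transport security visible in logs.
        print(
            "WARNING: SSL certificate verification is disabled (YOLOZU_INSECURE_SSL=1).",
            file=sys.stderr,
        )
    return enabled
```

A caller would append `--insecure` to the fetch command only when this returns True, so CI runs keep certificate verification unless someone deliberately turns it off.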

        run: |
          python -m unittest

      - name: Scenario smoke run

Copilot AI Feb 9, 2026

CI now depends on a network download of coco128 at runtime, which can introduce flakiness, slowdowns, and failures in restricted environments. Consider caching the dataset artifact, vendoring a tiny deterministic test dataset, or making this step conditional (e.g., scheduled/nightly or continue-on-error) so core CI remains reliable.

Suggested change
-      - name: Scenario smoke run
+      - name: Scenario smoke run
+        continue-on-error: true

Comment on lines 9 to +17
    if not enabled:
        return True
    return score_tmp_sym >= tau


def passes_low_fp_gate(score_tmp_sym, enabled, tau):
    if not enabled:
        return True
    return score_tmp_sym >= tau

Copilot AI Feb 9, 2026

passes_low_fp_gate() is currently identical to passes_template_gate(), which duplicates logic and increases maintenance cost. Consider implementing it as an alias/wrapper (or a single shared gate helper) and add a short docstring clarifying the semantic difference (template-vs-low-FP) if both names are kept for readability.

Suggested change
-    if not enabled:
-        return True
-    return score_tmp_sym >= tau
-def passes_low_fp_gate(score_tmp_sym, enabled, tau):
-    if not enabled:
-        return True
-    return score_tmp_sym >= tau
+    """Return True if the template-based score passes the configured threshold gate."""
+    if not enabled:
+        return True
+    return score_tmp_sym >= tau
+def passes_low_fp_gate(score_tmp_sym, enabled, tau):
+    """Return True if the low-false-positive gate is passed.
+    Currently uses the same logic as passes_template_gate, but kept as a separate
+    entry point to distinguish template-based gating from low-FP gating.
+    """
+    return passes_template_gate(score_tmp_sym, enabled, tau)

Comment on lines +28 to +30
def _build_args_from_config(cfg: dict) -> list[str]:
    args: list[str] = []
    for key, value in cfg.items():

Copilot AI Feb 9, 2026

The new module CLI config path (_load_config + _build_args_from_config) introduces non-trivial behavior (YAML/JSON fallback, bool/list handling, key normalization). There doesn’t appear to be direct unit coverage for these helpers; consider adding focused tests (e.g., bool flags, list args, invalid top-level config type) to prevent regressions.
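
The kind of focused tests meant here might look like the sketch below. Because the real helpers are only partially visible in this PR view, `load_config_text` and `build_args_from_config` are hypothetical stand-ins, and the flag conventions they implement (underscores to hyphens, `True` as a bare flag, lists as repeated values) are assumptions rather than the PR's actual behavior.

```python
import json
import unittest


def load_config_text(text: str) -> dict:
    """Hypothetical stand-in for _load_config: JSON only, top level must be an object."""
    data = json.loads(text) if text.strip() else None
    if data is None:
        return {}
    if not isinstance(data, dict):
        raise ValueError(f"top-level config must be a mapping, got {type(data).__name__}")
    return data


def build_args_from_config(cfg: dict) -> list[str]:
    """Hypothetical stand-in for _build_args_from_config (conventions are assumed)."""
    args: list[str] = []
    for key, value in cfg.items():
        flag = "--" + str(key).replace("_", "-")
        if isinstance(value, bool):
            if value:
                args.append(flag)  # True becomes a bare flag, False is omitted
        elif isinstance(value, (list, tuple)):
            args.append(flag)
            args.extend(str(v) for v in value)
        elif value is not None:
            args.extend([flag, str(value)])
    return args


class ConfigHelpersTest(unittest.TestCase):
    def test_bool_flags(self):
        self.assertEqual(
            build_args_from_config({"use_amp": True, "dry_run": False}), ["--use-amp"]
        )

    def test_list_args(self):
        self.assertEqual(
            build_args_from_config({"image_size": [640, 480]}),
            ["--image-size", "640", "480"],
        )

    def test_invalid_top_level_type(self):
        with self.assertRaises(ValueError):
            load_config_text("[1, 2, 3]")

    def test_empty_config(self):
        self.assertEqual(load_config_text(""), {})
```

In the repository, the same test cases would import the real helpers from `yolozu/cli.py` instead of the stand-ins and assert whatever conventions that module actually follows.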

3 participants