diff --git a/README.ko.md b/README.ko.md index 100791d..3372795 100644 --- a/README.ko.md +++ b/README.ko.md @@ -75,7 +75,7 @@ Runtime은 Forge `agent_manifest.json`을 선택적으로 읽어 기존 Lab-comp 이 기능은 reliable edge agent runtime 방향의 첫 Runtime-side contract입니다. `agent_id`, `task_id`, `agent_type`, priority, latency budget, queue wait, fallback usage, telemetry context를 기록하지만 기존 `result.json`의 top-level compare/report 필드는 변경하지 않습니다. -Runtime result JSON에는 `runtime_health_snapshot`, `runtime_error_classification`, `runtime_events`, `runtime_operation_summary`도 additive evidence로 기록됩니다. 이제 health snapshot은 backend availability, latency budget/deadline observation, tegrastats evidence availability와 `health_reason`을 함께 남기고, runtime events는 sequential `event_index`를 가진 lifecycle trace로 기록됩니다. `runtime_operation_summary`는 Lab/Orchestrator/AIGuard handoff용 compact index로 `risk_labels`, `evidence_gaps`, retryability, conservative `recommended_action`을 남기되 `decision_owner: lab`, `scheduler_owner: orchestrator`, `production_cancellation: false`를 유지합니다. `runtime_telemetry.coverage`는 expected / observed / missing telemetry fields를 기록하고 `comparability_owner: edgeenv`, `missing_telemetry_is_failure: false`를 명시합니다. `--timeout-ms`는 latency timeout 관측 기준을 남기는 옵션이며, production request cancellation을 의미하지 않습니다. 실행이 `skipped`로 끝나면 Runtime은 `runtime_execution_skipped`, `retryable: true`, `retry_hint: check_backend_availability`를 남겨 Lab/Orchestrator가 failure handling evidence로 해석할 수 있게 합니다. +Runtime result JSON에는 `runtime_health_snapshot`, `runtime_error_classification`, `runtime_events`, `runtime_operation_summary`도 additive evidence로 기록됩니다. 이제 health snapshot은 backend availability, latency budget/deadline observation, tegrastats evidence availability와 `health_reason`을 함께 남기고, runtime events는 sequential `event_index`를 가진 lifecycle trace로 기록됩니다. 
`runtime_operation_summary`는 Lab/Orchestrator/AIGuard handoff용 compact index로 `risk_labels`, `evidence_gaps`, retryability, conservative `recommended_action`을 남기되 `decision_owner: lab`, `scheduler_owner: orchestrator`, `production_cancellation: false`를 유지합니다. `runtime_telemetry.coverage`는 expected / observed / missing telemetry fields를 기록하고 `comparability_owner: edgeenv`, `missing_telemetry_is_failure: false`를 명시합니다. `runtime_telemetry.history_seed`는 `registry_owner: edgeenv`, `decision_owner: lab`, `production_monitoring: false`를 유지하며 EdgeEnv telemetry history accumulation으로 넘길 수 있는 single-result replay point를 제공합니다. `--timeout-ms`는 latency timeout 관측 기준을 남기는 옵션이며, production request cancellation을 의미하지 않습니다. 실행이 `skipped`로 끝나면 Runtime은 `runtime_execution_skipped`, `retryable: true`, `retry_hint: check_backend_availability`를 남겨 Lab/Orchestrator가 failure handling evidence로 해석할 수 있게 합니다. 예시: diff --git a/README.md b/README.md index c12cf0c..ef10d61 100644 --- a/README.md +++ b/README.md @@ -488,7 +488,7 @@ This is the first bridge toward the reliable edge agent runtime direction. It re Runtime result JSON also includes additive operation evidence blocks: - `runtime_health_snapshot`: execution health, backend/device context, backend availability, run count, latency/FPS summary, latency-budget/deadline observation, tegrastats evidence availability, and explicit timeout observation status. `--timeout-ms` records an observation threshold; it does not claim production request cancellation. -- `runtime_telemetry`: single-result telemetry seed for Runtime Intelligence history/replay. It records timestamp, execution sequence id, latency rolling seed values, power mode, tegrastats-derived resource evidence when available, operation flags, and explicit `missing_fields` for telemetry that the current device/run did not provide. This is local-first evidence, not a production monitoring stream. 
+- `runtime_telemetry`: single-result telemetry seed for Runtime Intelligence history/replay. It records timestamp, execution sequence id, latency rolling seed values, power mode, tegrastats-derived resource evidence when available, operation flags, and explicit `missing_fields` for telemetry that the current device/run did not provide. The additive `history_seed` block packages the same single-result evidence as a one-point replay seed for EdgeEnv telemetry history accumulation. This is local-first evidence, not a production monitoring stream. - `runtime_error_classification`: structured success/error category, severity, retryability, retry hint, observed mean latency, and timeout budget for downstream report context. Skipped execution is recorded as `runtime_execution_skipped` with `retry_hint: check_backend_availability` so Lab/Orchestrator can explain runtime failure handling without treating Runtime as a worker daemon. - `runtime_events`: compact indexed lifecycle event log for configuration, benchmark completion, error classification, optional agent context, telemetry recording, operation summary, and tegrastats parsing. - `runtime_operation_summary`: compact handoff index for Lab/Orchestrator/AIGuard with `health_reason`, `risk_labels`, `evidence_gaps`, retryability, and a conservative `recommended_action`. It keeps `decision_owner: lab`, `scheduler_owner: orchestrator`, and `production_cancellation: false`. @@ -501,6 +501,7 @@ Runtime Intelligence boundary: - `collection_mode` starts as `single_result_export`; EdgeEnv owns telemetry history accumulation and comparability-first regression. - Missing device telemetry remains explicit in `missing_fields` instead of being fabricated. - `runtime_telemetry.coverage` records expected / observed / missing telemetry fields, with `comparability_owner: edgeenv` and `missing_telemetry_is_failure: false`. 
+- `runtime_telemetry.history_seed` uses `inferedge-runtime-telemetry-history-seed-v1`, keeps `registry_owner: edgeenv`, `decision_owner: lab`, `production_monitoring: false`, and exposes a single replay point that EdgeEnv can later accumulate into a local telemetry history. - Runtime exports telemetry evidence only. AIGuard may turn it into deterministic anomaly evidence, and Lab remains the deployment decision owner. The committed fixture diff --git a/docs/agent_runtime_result_contract.md b/docs/agent_runtime_result_contract.md index 61227cc..0888450 100644 --- a/docs/agent_runtime_result_contract.md +++ b/docs/agent_runtime_result_contract.md @@ -257,6 +257,7 @@ When provided, Runtime appends: - `runtime_operation_summary` is an additive handoff index for Lab/Orchestrator/AIGuard. It repeats the health reason, retryability, risk labels, evidence gaps, and a conservative `recommended_action` without making the deployment decision itself. - `runtime_operation_summary.decision_owner` must remain `lab`, and `scheduler_owner` must remain `orchestrator`. - `runtime_operation_summary.production_cancellation` is always `false`; Runtime records observations only. +- `runtime_telemetry.history_seed` is an additive `inferedge-runtime-telemetry-history-seed-v1` block for EdgeEnv telemetry history/replay. It keeps `registry_owner: edgeenv`, `decision_owner: lab`, `production_monitoring: false`, and one single-result telemetry point so downstream tools can accumulate history without Runtime becoming a telemetry store. - Runtime does not claim production request cancellation. `--timeout-ms` is an observation threshold: if a successful benchmark mean latency exceeds the configured threshold, Runtime records `timeout_observed: true`, `runtime_error_classification.category: runtime_timeout_observed`, and `retryable: true` for downstream reliability reporting. 
- If execution is skipped because Runtime cannot complete the configured benchmark, Runtime records `runtime_error_classification.category: runtime_execution_skipped`, `severity: warning`, `retryable: true`, and `retry_hint: check_backend_availability`. This is failure-handling evidence for Lab/Orchestrator reporting, not a production worker retry loop. - Without `--timeout-ms`, results record `timeout_policy: not_configured`, `timeout_budget_ms: null`, and `timeout_observed: false`. diff --git a/scripts/smoke_default.sh b/scripts/smoke_default.sh index 305ae24..afa13e0 100755 --- a/scripts/smoke_default.sh +++ b/scripts/smoke_default.sh @@ -106,6 +106,22 @@ assert "queue_depth" in coverage["expected_fields"], coverage assert "queue_depth" in coverage["missing_fields"], coverage assert "telemetry_timestamp" in coverage["observed_fields"], coverage assert coverage["missing_fields"] == telemetry["missing_fields"], coverage +history_seed = telemetry["history_seed"] +assert history_seed["schema_version"] == "inferedge-runtime-telemetry-history-seed-v1", history_seed +assert history_seed["registry_owner"] == "edgeenv", history_seed +assert history_seed["decision_owner"] == "lab", history_seed +assert history_seed["source_telemetry_schema_version"] == telemetry["schema_version"], history_seed +assert history_seed["production_monitoring"] is False, history_seed +assert history_seed["missing_telemetry_is_failure"] is False, history_seed +assert history_seed["replay_ready"] is True, history_seed +assert "compare_key" in history_seed["recommended_registry_key_fields"], history_seed +assert "latency.mean_ms" in history_seed["time_series_fields"], history_seed +assert history_seed["source_result"]["compare_key"] == data["compare_key"], history_seed +assert history_seed["source_result"]["backend_key"] == data["backend_key"], history_seed +assert history_seed["points"][0]["telemetry_timestamp"] == telemetry["telemetry_timestamp"], history_seed +assert 
history_seed["points"][0]["execution_sequence_id"] == telemetry["execution_sequence_id"], history_seed +assert history_seed["points"][0]["mean_ms"] == telemetry["latency"]["mean_ms"], history_seed +assert history_seed["points"][0]["timeout_observed"] == telemetry["operation"]["timeout_observed"], history_seed assert events["runtime_telemetry_recorded"]["observed_field_count"] == coverage["observed_field_count"] assert events["runtime_telemetry_recorded"]["missing_field_count"] == coverage["missing_field_count"] assert events["runtime_telemetry_recorded"]["schema"] == "inferedge-runtime-telemetry-v1" @@ -178,6 +194,12 @@ coverage = telemetry["coverage"] assert coverage["schema_version"] == "inferedge-runtime-telemetry-coverage-v1", coverage assert coverage["comparability_owner"] == "edgeenv", coverage assert coverage["missing_fields"] == telemetry["missing_fields"], coverage +history_seed = telemetry["history_seed"] +assert history_seed["registry_owner"] == "edgeenv", history_seed +assert history_seed["decision_owner"] == "lab", history_seed +assert history_seed["source_result"]["compare_key"] == data["compare_key"], history_seed +assert history_seed["points"][0]["p99_ms"] == telemetry["latency"]["p99_ms"], history_seed +assert history_seed["points"][0]["deadline_missed"] == telemetry["operation"]["deadline_missed"], history_seed assert "runtime_telemetry_recorded" in events, events assert data["extra"]["agent_manifest_recorded"] is True PY diff --git a/src/result_writer.cpp b/src/result_writer.cpp index 79400df..2fe1386 100644 --- a/src/result_writer.cpp +++ b/src/result_writer.cpp @@ -560,6 +560,122 @@ void write_runtime_telemetry_coverage_json( << indent << "}"; } +void write_runtime_telemetry_history_seed_json( + std::ostream& output, + const RuntimeConfig& config, + const EngineMetadata& engine_metadata, + const BenchmarkResult& benchmark_result, + const TegrastatsSummary& tegrastats_summary, + const std::string& timestamp, + int indent_spaces) { + const 
std::string indent(static_cast<std::size_t>(indent_spaces), ' '); + const bool has_tegrastats = tegrastats_summary.status == "parsed"; + const std::string precision = config.manifest_precision.empty() ? "fp32" : config.manifest_precision; + output + << "{\n" + << indent << " \"schema_version\": \"inferedge-runtime-telemetry-history-seed-v1\",\n" + << indent << " \"evidence_role\": \"runtime_telemetry_history_seed\",\n" + << indent << " \"registry_owner\": \"edgeenv\",\n" + << indent << " \"decision_owner\": \"lab\",\n" + << indent << " \"source_result_schema_version\": \"inferedge-runtime-result-v1\",\n" + << indent << " \"source_telemetry_schema_version\": \"inferedge-runtime-telemetry-v1\",\n" + << indent << " \"replay_scope\": \"single_result_to_history\",\n" + << indent << " \"replay_ready\": true,\n" + << indent << " \"production_monitoring\": false,\n" + << indent << " \"missing_telemetry_is_failure\": false,\n" + << indent << " \"source_result\": {\n" + << indent << " \"compare_key\": " << json_string(make_compare_key(config)) << ",\n" + << indent << " \"backend_key\": " << json_string(make_backend_key(engine_metadata, config)) << ",\n" + << indent << " \"engine_backend\": " << json_string(engine_metadata.backend) << ",\n" + << indent << " \"device\": " << json_string(config.device) << ",\n" + << indent << " \"precision\": " << json_string(precision) << ",\n" + << indent << " \"power_mode\": " << json_string(config.power_mode) << "\n" + << indent << " },\n" + << indent << " \"recommended_registry_key_fields\": "; + write_string_array_json(output, { + "compare_key", + "backend_key", + "device", + "precision", + "power_mode", + "run_config", + }); + output + << ",\n" + << indent << " \"time_series_fields\": "; + write_string_array_json(output, { + "telemetry_timestamp", + "execution_sequence_id", + "latency.mean_ms", + "latency.p95_ms", + "latency.p99_ms", + "latency.fps", + "latency.inference_interval_ms", + "latency.rolling_latency_mean_ms", + 
"latency.rolling_latency_std_ms", + "resource.ram_used_mb", + "resource.max_temperature_c", + "resource.vdd_in_mw_avg", + "operation.queue_depth", + "operation.runtime_uptime_sec", + "operation.timeout_observed", + "operation.latency_budget_exceeded", + "operation.deadline_missed", + }); + output + << ",\n" + << indent << " \"points\": [\n" + << indent << " {\n" + << indent << " \"execution_sequence_id\": 0,\n" + << indent << " \"telemetry_timestamp\": " << json_string(timestamp) << ",\n" + << indent << " \"mean_ms\": " << benchmark_result.mean_ms << ",\n" + << indent << " \"p95_ms\": " << benchmark_result.p95_ms << ",\n" + << indent << " \"p99_ms\": " << benchmark_result.p99_ms << ",\n" + << indent << " \"fps\": " << benchmark_result.fps << ",\n" + << indent << " \"inference_interval_ms\": " << benchmark_result.mean_ms << ",\n" + << indent << " \"rolling_latency_mean_ms\": " << benchmark_result.mean_ms << ",\n" + << indent << " \"rolling_latency_std_ms\": " << benchmark_result.std_ms << ",\n" + << indent << " \"ram_used_mb\": "; + if (has_tegrastats) { + output << tegrastats_summary.ram_used_mb_max; + } else { + output << "null"; + } + output + << ",\n" + << indent << " \"max_temperature_c\": "; + if (has_tegrastats) { + output << tegrastats_summary.max_temp_c; + } else { + output << "null"; + } + output + << ",\n" + << indent << " \"vdd_in_mw_avg\": "; + if (has_tegrastats) { + output << tegrastats_summary.vdd_in_mw_avg; + } else { + output << "null"; + } + output + << ",\n" + << indent << " \"queue_depth\": null,\n" + << indent << " \"runtime_uptime_sec\": null,\n" + << indent << " \"timeout_observed\": " + << (timeout_observed(config, benchmark_result) ? "true" : "false") << ",\n" + << indent << " \"latency_budget_exceeded\": " + << (latency_budget_exceeded(config, benchmark_result) ? "true" : "false") << ",\n" + << indent << " \"deadline_missed\": " + << (should_mark_deadline_missed(config, benchmark_result) ? 
"true" : "false") << ",\n" + << indent << " \"power_mode\": " << json_string(config.power_mode) << ",\n" + << indent << " \"telemetry_source\": " + << json_string(has_tegrastats ? "tegrastats" : "not_available") << ",\n" + << indent << " \"tegrastats_status\": " << json_string(tegrastats_summary.status) << "\n" + << indent << " }\n" + << indent << " ]\n" + << indent << "}"; +} + std::string runtime_operation_recommended_action( const RuntimeConfig& config, const EngineMetadata& engine_metadata, @@ -788,6 +904,17 @@ void write_runtime_telemetry_json( << ",\n" << indent << " \"coverage\": "; write_runtime_telemetry_coverage_json(output, tegrastats_summary, indent_spaces + 2); + output + << ",\n" + << indent << " \"history_seed\": "; + write_runtime_telemetry_history_seed_json( + output, + config, + engine_metadata, + benchmark_result, + tegrastats_summary, + timestamp, + indent_spaces + 2); output << ",\n" << indent << " \"production_monitoring\": false\n" diff --git a/tests/test_agent_runtime_result_contract.py b/tests/test_agent_runtime_result_contract.py index 8e41fdb..3229938 100644 --- a/tests/test_agent_runtime_result_contract.py +++ b/tests/test_agent_runtime_result_contract.py @@ -203,6 +203,39 @@ def test_runtime_output_records_optional_agent_block_when_manifest_is_provided(s coverage["missing_field_count"], len(coverage["missing_fields"]), ) + history_seed = telemetry["history_seed"] + self.assertEqual( + history_seed["schema_version"], + "inferedge-runtime-telemetry-history-seed-v1", + ) + self.assertEqual(history_seed["registry_owner"], "edgeenv") + self.assertEqual(history_seed["decision_owner"], "lab") + self.assertEqual( + history_seed["source_telemetry_schema_version"], + telemetry["schema_version"], + ) + self.assertFalse(history_seed["production_monitoring"]) + self.assertFalse(history_seed["missing_telemetry_is_failure"]) + self.assertTrue(history_seed["replay_ready"]) + self.assertIn("compare_key", history_seed["recommended_registry_key_fields"]) 
+ self.assertIn("latency.mean_ms", history_seed["time_series_fields"]) + self.assertEqual( + history_seed["source_result"]["compare_key"], + result["compare_key"], + ) + self.assertEqual( + history_seed["source_result"]["backend_key"], + result["backend_key"], + ) + self.assertEqual(history_seed["source_result"]["precision"], result["precision"]) + self.assertEqual(history_seed["source_result"]["power_mode"], result["run_config"]["power_mode"]) + point = history_seed["points"][0] + self.assertEqual(point["execution_sequence_id"], telemetry["execution_sequence_id"]) + self.assertEqual(point["telemetry_timestamp"], telemetry["telemetry_timestamp"]) + self.assertEqual(point["mean_ms"], telemetry["latency"]["mean_ms"]) + self.assertEqual(point["p99_ms"], telemetry["latency"]["p99_ms"]) + self.assertEqual(point["timeout_observed"], telemetry["operation"]["timeout_observed"]) + self.assertEqual(point["deadline_missed"], telemetry["operation"]["deadline_missed"]) extra = result["extra"] self.assertTrue(extra["agent_manifest_recorded"]) diff --git a/tests/test_lab_result_schema.py b/tests/test_lab_result_schema.py index 4ed2a8b..c3f08a0 100644 --- a/tests/test_lab_result_schema.py +++ b/tests/test_lab_result_schema.py @@ -375,6 +375,88 @@ def validate_optional_runtime_telemetry(result: dict) -> None: if expected not in coverage["expected_fields"]: raise AssertionError(f"runtime_telemetry.coverage.expected_fields missing {expected}") + history_seed = telemetry.get("history_seed") + if history_seed is not None: + validate_runtime_telemetry_history_seed(history_seed, telemetry) + + +def validate_runtime_telemetry_history_seed(history_seed: dict, telemetry: dict) -> None: + if not isinstance(history_seed, dict): + raise AssertionError("runtime_telemetry.history_seed must be an object when present") + for field in ( + "schema_version", + "evidence_role", + "registry_owner", + "decision_owner", + "source_result_schema_version", + "source_telemetry_schema_version", + 
"replay_scope", + ): + if not isinstance(history_seed.get(field), str): + raise AssertionError(f"runtime_telemetry.history_seed.{field} must be a string") + if history_seed["schema_version"] != "inferedge-runtime-telemetry-history-seed-v1": + raise AssertionError("runtime_telemetry.history_seed.schema_version is invalid") + if history_seed["registry_owner"] != "edgeenv": + raise AssertionError("runtime_telemetry.history_seed.registry_owner must be edgeenv") + if history_seed["decision_owner"] != "lab": + raise AssertionError("runtime_telemetry.history_seed.decision_owner must be lab") + if history_seed["source_telemetry_schema_version"] != telemetry["schema_version"]: + raise AssertionError("runtime_telemetry.history_seed source telemetry schema mismatch") + for field in ("replay_ready", "production_monitoring", "missing_telemetry_is_failure"): + if not isinstance(history_seed.get(field), bool): + raise AssertionError(f"runtime_telemetry.history_seed.{field} must be a boolean") + if history_seed["production_monitoring"] is not False: + raise AssertionError("runtime_telemetry.history_seed.production_monitoring must be false") + if history_seed["missing_telemetry_is_failure"] is not False: + raise AssertionError("runtime_telemetry.history_seed.missing_telemetry_is_failure must be false") + for field in ("recommended_registry_key_fields", "time_series_fields"): + values = history_seed.get(field) + if not isinstance(values, list) or not all(isinstance(item, str) for item in values): + raise AssertionError(f"runtime_telemetry.history_seed.{field} must be a string array") + for expected in ("compare_key", "backend_key", "precision", "power_mode"): + if expected not in history_seed["recommended_registry_key_fields"]: + raise AssertionError(f"runtime_telemetry.history_seed registry key fields missing {expected}") + for expected in ( + "telemetry_timestamp", + "execution_sequence_id", + "latency.mean_ms", + "operation.timeout_observed", + ): + if expected not in 
history_seed["time_series_fields"]: + raise AssertionError(f"runtime_telemetry.history_seed time series fields missing {expected}") + + source_result = history_seed.get("source_result") + if not isinstance(source_result, dict): + raise AssertionError("runtime_telemetry.history_seed.source_result must be an object") + for field in ("compare_key", "backend_key", "engine_backend", "device", "precision", "power_mode"): + if not isinstance(source_result.get(field), str): + raise AssertionError(f"runtime_telemetry.history_seed.source_result.{field} must be a string") + + points = history_seed.get("points") + if not isinstance(points, list) or not points: + raise AssertionError("runtime_telemetry.history_seed.points must be a non-empty array") + first_point = points[0] + if not isinstance(first_point, dict): + raise AssertionError("runtime_telemetry.history_seed.points[] must be an object") + if first_point.get("execution_sequence_id") != telemetry["execution_sequence_id"]: + raise AssertionError("runtime_telemetry.history_seed point sequence id mismatch") + if first_point.get("telemetry_timestamp") != telemetry["telemetry_timestamp"]: + raise AssertionError("runtime_telemetry.history_seed point timestamp mismatch") + for point_field, telemetry_value in ( + ("mean_ms", telemetry["latency"]["mean_ms"]), + ("p95_ms", telemetry["latency"]["p95_ms"]), + ("p99_ms", telemetry["latency"]["p99_ms"]), + ("fps", telemetry["latency"]["fps"]), + ("inference_interval_ms", telemetry["latency"]["inference_interval_ms"]), + ("rolling_latency_mean_ms", telemetry["latency"]["rolling_latency_mean_ms"]), + ("rolling_latency_std_ms", telemetry["latency"]["rolling_latency_std_ms"]), + ): + if first_point.get(point_field) != telemetry_value: + raise AssertionError(f"runtime_telemetry.history_seed point {point_field} mismatch") + for field in ("timeout_observed", "latency_budget_exceeded", "deadline_missed"): + if first_point.get(field) != telemetry["operation"][field]: + raise 
AssertionError(f"runtime_telemetry.history_seed point {field} mismatch") + class JetsonEvidenceContractTest(unittest.TestCase): def test_runtime_binary_parses_tegrastats_log_when_available(self):